Predictor importance

Some types of machine learning models, such as Random Trees, include methods for producing predictor importance measures. For others, such as linear and logistic regression, predictor importance measures are not built into the algorithms. The IBM SPSS Spark Machine Learning Library features a separate PredictorImportance option that can be applied after fitting these models. To use it, replace the code in Step 2 of the example with the following code (shown for the linear regression example, where the model is named linearRegressionModel and the data frame is named data):


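// Export the fitted linear regression model as PMML
val linearRegressionPMML = linearRegressionModel.toPMML()

// Compute predictor importance from the model PMML and the data frame
import com.ibm.spss.ml.utils.PredictorImportance
val pi = PredictorImportance(linearRegressionPMML)
val piModel = pi.fit(data)
val piPMML = piModel.toPMML()

// Build the model viewer HTML and display it in the notebook
import com.ibm.spss.scala.ModelViewer
val html = ModelViewer.toHTML(pc, piPMML, Option(linearRegressionModel.statXML))
kernel.magics.html(html)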

The three sections of code perform the following steps:

  1. The first line creates a PMML object linearRegressionPMML containing the PMML output from the linear regression model.
  2. The middle block of code imports PredictorImportance, applies it to the existing PMML object linearRegressionPMML and the data frame data, and produces a new PMML object, piPMML, that contains the predictor importance values in addition to the information from the linear regression.
  3. The last section calls the ModelViewer.toHTML method, passing the new PMML object and the original statXML that was produced automatically when the linear regression model was run, and then passes the resulting HTML to kernel.magics.html for display.
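
For readers who want an intuition for what a predictor importance value can represent, the following is a minimal, self-contained sketch of one generic approach: scramble one predictor at a time, measure how much the prediction error grows, and normalize the results so that they sum to 1. This is an assumption made purely for illustration. It is not the algorithm used by PredictorImportance, whose computation is not described here, and every name in it (ImportanceSketch, predict, mse, importance) is hypothetical and not part of the IBM SPSS Spark Machine Learning Library.

```scala
// Illustrative only: a generic permutation-style importance measure.
// NOT the method used by com.ibm.spss.ml.utils.PredictorImportance.
object ImportanceSketch {

  // Hypothetical fitted linear model: y = intercept + sum(coefs(j) * row(j))
  def predict(intercept: Double, coefs: Array[Double], row: Array[Double]): Double =
    intercept + coefs.zip(row).map { case (c, x) => c * x }.sum

  // Mean squared error of the model over a data set
  def mse(intercept: Double, coefs: Array[Double],
          xs: Array[Array[Double]], ys: Array[Double]): Double =
    xs.zip(ys).map { case (row, y) =>
      val e = predict(intercept, coefs, row) - y
      e * e
    }.sum / ys.length

  // For each predictor j, scramble column j (by a fixed one-row rotation, so
  // the result is deterministic), record the increase in error over the
  // baseline, and normalize the increases so that they sum to 1.
  def importance(intercept: Double, coefs: Array[Double],
                 xs: Array[Array[Double]], ys: Array[Double]): Array[Double] = {
    val base = mse(intercept, coefs, xs, ys)
    val n = xs.length
    val raw = coefs.indices.map { j =>
      val scrambled = xs.indices.map { i =>
        xs(i).updated(j, xs((i + 1) % n)(j))
      }.toArray
      math.max(0.0, mse(intercept, coefs, scrambled, ys) - base)
    }.toArray
    val total = raw.sum
    if (total == 0.0) raw else raw.map(_ / total)
  }

  def main(args: Array[String]): Unit = {
    // Made-up data: two predictors, with the target generated by the model itself
    val xs = Array(Array(1.0, 5.0), Array(2.0, 3.0), Array(3.0, 8.0), Array(4.0, 1.0))
    val coefs = Array(2.0, 0.1)
    val ys = xs.map(row => predict(1.0, coefs, row))
    importance(1.0, coefs, xs, ys).zipWithIndex.foreach { case (v, j) =>
      println(f"predictor $j: $v%.3f")
    }
  }
}
```

In this made-up data the first predictor has a much larger coefficient than the second, so scrambling it degrades the predictions far more and it receives most of the normalized importance. Values reported by PredictorImportance may be computed differently and should be read from the model viewer output.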