Native Python APIs (SPSS Modeler) | IBM Data Product Exchange

Last updated: Oct 09, 2024

Native Python APIs (SPSS Modeler)

You can invoke native Python APIs from your scripts to interact with SPSS Modeler.

The following APIs are supported.

To see an example, you can download the stream available here and import it into SPSS Modeler (from your project, click New asset, select SPSS Modeler, then select Local file). Then open the Extension node properties in the flow to see example syntax.

APIs for data models

modelerpy.isComputeDataModelOnly()
Use this API to check whether the current run must compute the output data or only the output data model. When it returns true, your script must not perform any task that depends on input or output data; otherwise the run will fail.
modelerpy.getDataModel()
This API contacts SPSS Modeler to get the data model for an input dataset. The return value is an instance of class DataModel which describes metadata of the input dataset, including field count, field name, field storage type, etc.
modelerpy.setOutputDataModel(dataModel)
This API sends an instance of class DataModel back to SPSS Modeler, and must be invoked before your script passes a dataset to SPSS Modeler. SPSS Modeler will use the metadata described in this DataModel instance to handle your data on the SPSS Modeler side.
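
The ordering described above can be sketched as follows. This is a minimal illustration, not an official template: the function name run and its mp parameter (standing in for the modelerpy module) are hypothetical, and the script simply passes the input schema and data through unchanged. The guarded import only lets the file load outside SPSS Modeler, where modelerpy is not available.

```python
try:
    import modelerpy  # provided inside SPSS Modeler; absent elsewhere
except ImportError:
    modelerpy = None


def run(mp):
    """Declare the output data model first, then touch data only on a full run.

    `mp` is the modelerpy module (a hypothetical parameter name used here so
    the logic can also be exercised with a stand-in object).
    """
    data_model = mp.getDataModel()        # metadata of the input dataset
    mp.setOutputDataModel(data_model)     # must happen before any data is passed back

    if mp.isComputeDataModelOnly():
        return                            # metadata-only run: do not read or write data

    df = mp.readPandasDataframe()         # full run: fetch the input rows
    mp.writePandasDataframe(df)           # pass them back unchanged


if modelerpy is not None:
    run(modelerpy)
```

Keeping every data-dependent call behind the isComputeDataModelOnly() check means the same script serves both kinds of run.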

APIs for modeling

modelerpy.saveModel(model, name='model', compress=False)
This API transforms a Python model into an SPSS Modeler model, which is then saved by SPSS Modeler. Invoke this API from a modeling node after the Python model is built. After the call, the saved model is copied to a generated model nugget.
modelerpy.loadModel(name='model')
This API loads an SPSS Modeler saved model and creates a Python object for the saved model. Invoke this API from the model nugget to load the saved model for further processing, such as scoring.
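
A rough sketch of how the two calls might pair up across a modeling node and its model nugget is shown below. Several things here are assumptions rather than documented behavior: the function names build and score, the mp parameter (standing in for the modelerpy module), the fit callable, the output field name prediction, and the idea that the loaded model exposes a predict method that accepts the DataFrame. Anything else a modeling node may require is omitted.

```python
PREDICTION_FIELD = 'prediction'  # hypothetical name for the scored output field


def build(mp, fit):
    """Modeling node: train a Python model and hand it to SPSS Modeler.

    `mp` is the modelerpy module; `fit` is any callable (hypothetical) that
    takes the input DataFrame and returns a trained Python model.
    """
    if mp.isComputeDataModelOnly():
        return                      # no data available on a metadata-only run
    df = mp.readPandasDataframe()
    model = fit(df)
    mp.saveModel(model, name='model', compress=False)


def score(mp):
    """Model nugget: declare the extra output field, then score the data."""
    out_model = mp.getDataModel()
    out_model.addField(mp.Field(PREDICTION_FIELD, 'real', measure='continuous'))
    mp.setOutputDataModel(out_model)

    if mp.isComputeDataModelOnly():
        return
    model = mp.loadModel(name='model')          # same name that build() saved under
    df = mp.readPandasDataframe()
    df[PREDICTION_FIELD] = model.predict(df)    # assumes a predict() method exists
    mp.writePandasDataframe(df)
```

In the modeling node the script would call build(modelerpy, fit), and in the nugget score(modelerpy). The name argument is what ties the two together.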

APIs for input/output datasets

modelerpy.readPandasDataframe()
This API reads a dataset from SPSS Modeler to Python. The return value is a Python Pandas DataFrame (a two-dimensional data structure, like a two-dimensional array, or a table with rows and columns).
modelerpy.writePandasDataframe(df)
This API writes a Python Pandas DataFrame from Python to SPSS Modeler.

APIs for packages

modelerpy.installPackage(package)
This API pulls a package from pypi.org and installs it.
modelerpy.uninstallPackage(package)
This API uninstalls an installed package.
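
One possible way to use installPackage is to install a package only when importing it fails, as sketched below. The helper ensure_package and its parameters are hypothetical, and the sketch assumes that a package installed this way becomes importable within the same run, which the description above does not state.

```python
import importlib


def ensure_package(mp, package, module_name=None):
    """Return the imported module, installing `package` first if it is missing.

    `mp` is the modelerpy module. `module_name` covers packages whose import
    name differs from their pypi.org name; it defaults to `package`.
    """
    module_name = module_name or package
    try:
        return importlib.import_module(module_name)
    except ImportError:
        mp.installPackage(package)        # pulls the package from pypi.org
        importlib.invalidate_caches()     # let the import system see new files
        return importlib.import_module(module_name)
```

A script might then call, for example, ensure_package(modelerpy, 'scikit-learn', 'sklearn') before using the library.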

APIs for metadata

The following metadata-related classes should be used with modelerpy.getDataModel and modelerpy.setOutputDataModel.

modelerpy.DataModel
This class is the main entry point for the metadata. It contains an array of instances of class Field and includes the following methods:
- modelerpy.DataModel.getFields
  This method returns the array of class Field instances.
- modelerpy.DataModel.addField
  This method adds an instance of Field to the metadata array.
modelerpy.Field
The Field class is where the actual metadata is stored, including the field name, storage, and measurement. It includes the following methods:
- modelerpy.Field.getName
  This method returns the name of the field.
- modelerpy.Field.getStorage
  This method returns the storage of the field. Valid storage includes: integer, real, string, date, time, and timestamp.
- modelerpy.Field.getMeasure
  This method returns the measurement of the field. Valid measurements include: discrete, flag, nominal, ordinal, and continuous.
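
The accessor methods listed above can be combined to inspect a data model, as in the following sketch. The helper names describe_fields and fields_with_measure are hypothetical, and the code assumes that getStorage and getMeasure return the same plain strings listed above (for example 'real' or 'continuous').

```python
def describe_fields(data_model):
    """Return a (name, storage, measure) tuple for every field in a DataModel."""
    return [
        (field.getName(), field.getStorage(), field.getMeasure())
        for field in data_model.getFields()
    ]


def fields_with_measure(data_model, measure):
    """Return the names of all fields whose measurement equals `measure`."""
    return [
        name
        for name, _storage, field_measure in describe_fields(data_model)
        if field_measure == measure
    ]
```

For instance, fields_with_measure(modelerpy.getDataModel(), 'continuous') would list the continuous input fields.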

The following example code constructs a DataModel object by invoking the modelerpy.DataModel constructor with an array of modelerpy.Field instances. The modelerpy.Field constructor accepts the field name, field storage, and field measurement as its input parameters (field name and field storage are required; field measurement is optional).

import modelerpy

dataModel = modelerpy.DataModel([
    #               %FieldName%,     %StorageType%, %MeasurementType%
    modelerpy.Field('StringField',   'string',      'nominal'),
    modelerpy.Field('FloatField',    'real',        'continuous'),
    modelerpy.Field('IntegerField',  'integer',     'ordinal'),
    modelerpy.Field('BooleanField',  'integer',     'flag'),
    modelerpy.Field('DatetimeField', 'timestamp',   'continuous'),
    modelerpy.Field('TimeField',     'time',        'continuous'),
    modelerpy.Field('DateField',     'date',        'continuous'),
])
# StorageType can be: integer, real, string, date, time, timestamp
# MeasurementType can be: discrete, flag, nominal, ordinal, continuous


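The following example extends an existing data model with two output fields by calling addField. Here modelerDataModel is assumed to be the input data model returned by modelerpy.getDataModel(), and field_outlier and field_dist_hp are string variables, defined elsewhere in the script, that hold the names of the new fields.

modelerDataModel = modelerpy.getDataModel()  # assumed source of the input data model
outputDataModel = modelerDataModel
outputDataModel.addField(modelerpy.Field(field_outlier, "real", measure="flag"))
outputDataModel.addField(modelerpy.Field(field_dist_hp, "real", measure="continuous"))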