
Batch deployment input details for SPSS models

Last updated: Oct 09, 2024

Follow these rules when you are specifying input details for batch deployments of SPSS models.

Data type summary table:

  Data           Description
  Type           inline, data references
  File formats   CSV

Data Sources

Input/output data references:

Notes:

If you are specifying input/output data references programmatically:

  • Data source reference type depends on the asset type. Refer to the Data source reference types section in Adding data assets to a deployment space.

  • SPSS jobs support multiple data source inputs and a single output. If the schema is not provided in the model metadata when the model is saved, you must enter each id manually and select a data asset for each connection. If the schema is provided in the model metadata, the id names are auto-populated from the metadata; you only select the data asset for the corresponding ids in Watson Studio. For details, refer to Using multiple data sources for an SPSS job.

  • To create a local or managed asset as an output data reference, the name field must be specified for output_data_reference so that a data asset will be created with the specified name. Specifying an href that refers to an existing local data asset is not supported. Note that connected data assets that refer to supported databases can be created in the output_data_references only when the input_data_references also refers to one of these sources.

  • Table names that are provided in input and output data references are ignored. The table names that are referenced in the SPSS model stream are used during the batch deployment.

  • SQL Pushback generates SQL statements for native IBM SPSS Modeler operations that can be "pushed back" to (that is, run in) the database to improve performance. SQL Pushback is supported only with:

    • Db2
    • SQL Server
    • Netezza Performance Server
  • If you are creating a job by using the Python client, provide the connection name that is referenced in the data nodes of the SPSS model stream in the id field, and the data asset href in location.href for the input/output data references of the deployment jobs payload. For example, you can construct the jobs payload like this:

    job_payload_ref = {
        client.deployments.ScoringMetaNames.INPUT_DATA_REFERENCES: [{
            "id": "DB2Connection",
            "name": "drug_ref_input1",
            "type": "data_asset",
            "connection": {},
            "location": {
                "href": <input_asset_href1>
            }
        }, {
            "id": "Db2 WarehouseConn",
            "name": "drug_ref_input2",
            "type": "data_asset",
            "connection": {},
            "location": {
                "href": <input_asset_href2>
            }
        }],
        client.deployments.ScoringMetaNames.OUTPUT_DATA_REFERENCE: {
            "type": "data_asset",
            "connection": {},
            "location": {
                "href": <output_asset_href>
            }
        }
    }
    
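Following the rule above that an output data asset is created by name rather than referenced by href, a minimal output_data_reference for a local or managed asset might look like the sketch below. The file name is a placeholder, and the exact location fields may vary by service version:

```python
# Sketch of an output_data_reference that asks the service to create a new
# data asset. Per the notes above, an href to an existing local asset is not
# supported for output; instead a "name" is specified and an asset is created
# with that name when the job runs. The file name below is a placeholder.
output_data_reference = {
    "type": "data_asset",
    "connection": {},
    "location": {
        "name": "spss_batch_output.csv"  # asset to be created in the space
    }
}

# The reference carries no href; the service resolves "name" at job run time.
assert "href" not in output_data_reference["location"]
```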

Using connected data for an SPSS Modeler flow job

An SPSS Modeler flow can have a number of input and output data nodes. When you are connecting to a supported database as an input and output data source, note that the connection details are selected from the input and output data reference, but the input and output table names are selected from the SPSS model stream file.

To perform batch deployment of an SPSS model that uses a database connection, make sure that the modeler stream Input and Output nodes are Data Asset nodes. In SPSS Modeler, the Data Asset nodes must be configured with the table names that will be used later for job predictions. Set the nodes and table names before you save the model to Watson Machine Learning. When you are configuring the Data Asset nodes, choose the table name from the Connections; choosing a Data Asset that is created in your project is currently not supported.

When you are creating the deployment job for an SPSS model, make sure that the input and output data sources are of the same type. The table names that are configured in the model stream are passed to the batch deployment; any input/output table names that are provided in the connected data are ignored.

To perform batch deployment of an SPSS model that uses a Cloud Object Storage (COS) connection, make sure that the SPSS model stream has a single input and a single output data asset node.

Supported combinations of input and output sources

You must specify compatible sources for the SPSS Modeler flow input, the batch job input, and the output. If you specify an incompatible combination of data source types, you get an error when you try to run the batch job.

These combinations are supported for batch jobs:

  • File input/output in the SPSS model stream: the batch deployment job input can be a local/managed or referenced data asset or a connection asset (file); the job output can be a remote data asset, a connection asset (file), or a name.

  • Database input/output in the SPSS model stream: the batch deployment job input must be a remote data asset or a connection asset (database); the job output must likewise be a remote data asset or a connection asset (database).
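Because an incompatible combination only surfaces as an error at run time, a simple client-side pre-flight check can catch a type mismatch before the job is submitted. The helper below is a hypothetical sketch that just compares the type fields of the references:

```python
def check_source_types(input_refs, output_ref):
    """Raise if input and output data references use different source types.

    Hypothetical pre-flight helper; the service itself reports the error
    only when the batch job runs.
    """
    types = {ref["type"] for ref in input_refs} | {output_ref["type"]}
    if len(types) > 1:
        raise ValueError(f"Incompatible data source types: {sorted(types)}")

# A database-backed job must use database-type references on both sides:
check_source_types(
    [{"type": "connection_asset"}],
    {"type": "connection_asset"},
)  # no error: types match
```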

Specifying multiple inputs

If you are specifying multiple inputs for an SPSS model stream deployment with no schema, specify an ID for each element in input_data_references.

For details, see Using multiple data sources for an SPSS job.

In this example, when you create the job, provide three input entries with the IDs sample_db2_conn, sample_teradata_conn, and sample_googlequery_conn, and select the required connected data for each input.

{
  "deployment": {
    "href": "/v4/deployments/<deploymentID>"
  },
  "scoring": {
    "input_data_references": [{
      "id": "sample_db2_conn",
      "name": "DB2 connection",
      "type": "data_asset",
      "connection": {},
      "location": {
        "href": "/v2/assets/<asset_id>?space_id=<space_id>"
      }
    },
    {
      "id": "sample_teradata_conn",
      "name": "Teradata connection",
      "type": "data_asset",
      "connection": {},
      "location": {
        "href": "/v2/assets/<asset_id>?space_id=<space_id>"
      }
    },
    {
      "id": "sample_googlequery_conn",
      "name": "Google bigquery connection",
      "type": "data_asset",
      "connection": {},
      "location": {
        "href": "/v2/assets/<asset_id>?space_id=<space_id>"
      }
    }],
    "output_data_references": {
      "id": "sample_db2_conn",
      "type": "data_asset",
      "connection": {},
      "location": {
        "href": "/v2/assets/<asset_id>?space_id=<space_id>"
      }
    }
  }
}
Note: The environment variables parameter of deployment jobs is not applicable.
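The same payload can also be assembled programmatically before being passed to the jobs API. The sketch below only builds the dictionary; the hrefs are placeholders, the helper name is illustrative, and each id must match a connection name that is referenced in the model stream:

```python
def make_input_ref(ref_id, name, asset_href):
    """Build one input_data_references entry (helper name is illustrative)."""
    return {
        "id": ref_id,  # must match a connection name in the SPSS model stream
        "name": name,
        "type": "data_asset",
        "connection": {},
        "location": {"href": asset_href},
    }

# Placeholder href; substitute real asset and space IDs.
PLACEHOLDER_HREF = "/v2/assets/<asset_id>?space_id=<space_id>"

job_payload = {
    "deployment": {"href": "/v4/deployments/<deploymentID>"},  # placeholder
    "scoring": {
        "input_data_references": [
            make_input_ref("sample_db2_conn", "DB2 connection", PLACEHOLDER_HREF),
            make_input_ref("sample_teradata_conn", "Teradata connection", PLACEHOLDER_HREF),
            make_input_ref("sample_googlequery_conn", "Google bigquery connection", PLACEHOLDER_HREF),
        ],
        "output_data_references": {
            "id": "sample_db2_conn",
            "type": "data_asset",
            "connection": {},
            "location": {"href": PLACEHOLDER_HREF},
        },
    },
}
```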

Parent topic: Batch deployment input details by framework