Batch deployment input details for SPSS models

Follow these rules when you are specifying input details for batch deployments of SPSS models.

Data type summary table:

Data	Description
Type	data references, inline
File formats	CSV

Data sources

Input or output data references:

Local or managed assets from the space
Connected (remote) assets from these sources:

Notes:

For connections of type Cloud Object Storage or Cloud Object Storage (infrastructure), you must configure Access key and Secret key, also known as HMAC credentials.
For SPSS deployments, these data sources are not compliant with Federal Information Processing Standard (FIPS):
- Cloud Object Storage
- Cloud Object Storage (infrastructure)
- Storage volumes
Table names that are provided in input and output data references are ignored. Table names that are referred to in the SPSS model are used during the batch deployment.
Use SQL Pushback to generate SQL statements for IBM SPSS Modeler operations that can be “pushed back” to, or run in, the database to improve performance. SQL Pushback is supported only by:
- Db2
- SQL Server
- Netezza Performance Server

Using connected data for a batch deployment

An SPSS Modeler flow can have a number of import and export nodes for data. If the nodes use database connections, they must be configured with the table names in the data sources and targets. These table names are used later for batch jobs. Use Data Asset nodes for importing data and Data Asset Export nodes for exporting data. When you are configuring the nodes, choose the table name from Connections; don't choose a data asset in your project. Set the nodes and table names before you save and deploy the model to Watson Machine Learning.

When you deploy the model to a deployment space, check that the nodes connect to a supported database in the deployment space. In a batch deployment of the model, the connection details are taken from the input and output data references, but the input and output table names are taken from the SPSS Modeler model. Any input and output table names that are provided in the connected data references are ignored.
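The split between connection details and table names can be pictured with a small sketch. It is illustrative only: `effective_source`, the `model_node` dictionary, and the `table_name` keys are hypothetical names chosen for this example, not part of any Watson Machine Learning API.

```python
def effective_source(model_node: dict, data_reference: dict) -> dict:
    """Combine a model node and a job data reference (hypothetical helper).

    Connection details come from the job's data reference; the table name
    comes from the node saved in the SPSS Modeler model.
    """
    return {
        "connection_href": data_reference["location"]["href"],
        # A table name carried by the data reference is deliberately ignored.
        "table_name": model_node["table_name"],
    }


# Assumed example values; the table names are made up for illustration.
node = {"id": "sample_db2_conn", "table_name": "SCHEMA1.CUSTOMERS"}
ref = {
    "id": "sample_db2_conn",
    "type": "data_asset",
    "connection": {},
    "location": {
        "href": "/v2/assets/<asset_id>?space_id=<space_id>",
        "table_name": "IGNORED_TABLE",
    },
}

print(effective_source(node, ref))
```

The result keeps the href from the data reference and the table name from the model node, which is why the table names must be set in the flow before you save and deploy it.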

For batch deployment of an SPSS model that uses a Cloud Object Storage connection, make sure that the SPSS model has a single input data asset node and a single output data asset node.

Supported combinations of input and output sources

You must specify compatible data sources and targets for the batch job input and output. If you specify incompatible data sources and targets, you get an error when you try to run the batch job.

These combinations are supported for batch jobs:

SPSS model input/output	Batch deployment job input	Batch deployment job output
File	Local, managed, or referenced data asset or connection asset (file)	Remote data asset or connection asset (file) or name
Database	Remote data asset or connection asset (database)	Remote data asset or connection asset (database)
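The table can be turned into a quick check to run before you submit a job. This is a minimal sketch: the short labels such as `local_asset` and `connection_database`, and the function `is_supported`, are this example's own shorthand for the table cells, not values that the service recognizes.

```python
# Accepted job input and output kinds for each SPSS model I/O kind,
# transcribed from the table above. All labels are hypothetical shorthand.
SUPPORTED = {
    "file": {
        "input": {"local_asset", "managed_asset", "referenced_asset",
                  "connection_file"},
        "output": {"remote_asset_file", "connection_file", "name"},
    },
    "database": {
        "input": {"remote_asset_database", "connection_database"},
        "output": {"remote_asset_database", "connection_database"},
    },
}


def is_supported(model_io: str, job_input: str, job_output: str) -> bool:
    """Return True if the input/output pairing appears in the table."""
    rule = SUPPORTED.get(model_io)
    if rule is None:
        return False
    return job_input in rule["input"] and job_output in rule["output"]


print(is_supported("file", "local_asset", "name"))
print(is_supported("database", "local_asset", "connection_database"))
```

The first call describes a file-based model that reads a local asset and writes a named output, which the table allows. The second pairs a database model with a local asset as input, which the table does not list.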

Specifying multiple inputs

If you are specifying multiple inputs for an SPSS model deployment with no schema, specify an ID for each element in input_data_references.

For more information, see Using multiple data sources for an SPSS job.

In this example, when you create the job, provide three input entries with the IDs sample_db2_conn, sample_teradata_conn, and sample_googlequery_conn, and select the required connected data for each input.

{
  "deployment": {
    "href": "/v4/deployments/<deploymentID>"
  },
  "scoring": {
    "input_data_references": [
      {
        "id": "sample_db2_conn",
        "name": "DB2 connection",
        "type": "data_asset",
        "connection": {},
        "location": {
          "href": "/v2/assets/<asset_id>?space_id=<space_id>"
        }
      },
      {
        "id": "sample_teradata_conn",
        "name": "Teradata connection",
        "type": "data_asset",
        "connection": {},
        "location": {
          "href": "/v2/assets/<asset_id>?space_id=<space_id>"
        }
      },
      {
        "id": "sample_googlequery_conn",
        "name": "Google bigquery connection",
        "type": "data_asset",
        "connection": {},
        "location": {
          "href": "/v2/assets/<asset_id>?space_id=<space_id>"
        }
      }
    ],
    "output_data_references": {
      "id": "sample_db2_conn",
      "type": "data_asset",
      "connection": {},
      "location": {
        "href": "/v2/assets/<asset_id>?space_id=<space_id>"
      }
    }
  }
}

Note: The environment variables parameter of deployment jobs is not applicable.
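Because each input entry needs an ID when the model has no schema, it can help to check the list before you submit the job. The following is a minimal sketch, not part of the Watson Machine Learning client: `check_input_ids` is a hypothetical helper, and treating duplicate IDs as a problem is an assumption of this sketch rather than a documented rule.

```python
def check_input_ids(input_data_references: list) -> list:
    """Return a list of problems; an empty list means every entry has an id.

    Duplicate ids are also reported (an assumption of this sketch).
    """
    problems = []
    seen = set()
    for index, ref in enumerate(input_data_references):
        ref_id = ref.get("id")
        if not ref_id:
            problems.append(f"entry {index}: missing id")
        elif ref_id in seen:
            problems.append(f"entry {index}: duplicate id {ref_id!r}")
        else:
            seen.add(ref_id)
    return problems


# Example input list; the hrefs are placeholders, as in the payload above.
inputs = [
    {"id": "sample_db2_conn", "type": "data_asset", "connection": {},
     "location": {"href": "/v2/assets/<asset_id>?space_id=<space_id>"}},
    {"type": "data_asset", "connection": {},
     "location": {"href": "/v2/assets/<asset_id>?space_id=<space_id>"}},
]

for problem in check_input_ids(inputs):
    print(problem)
```

Here the second entry has no `id`, so the helper reports it instead of leaving the error to surface when the job runs.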

Specifying data references programmatically

If you are specifying input and output data references programmatically:

Data source reference type depends on the asset type. Refer to the Data source reference types section in Adding data assets to a deployment space.
SPSS jobs support multiple data source inputs and a single output. If the schema was not included in the model metadata when you saved the model, you must enter the id manually and select a data asset for each connection. If the schema is provided in the model metadata, id names are populated automatically from the metadata, and you select the data asset for each corresponding id in Watson Studio. For more information, see Using multiple data sources for an SPSS job.
To create a local or managed asset as an output data reference, the name field must be specified for output_data_reference so that a data asset is created with the specified name. You cannot specify an href that refers to an existing local data asset.
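As a minimal sketch of the last rule, the helper below builds an output reference that asks for a new asset by name and carries no href. `new_asset_output_reference` is a hypothetical helper, and placing `name` under `location` is an assumption of this sketch; the rule itself only says that a name must be specified, so check the payload shape against the API reference for your release.

```python
def new_asset_output_reference(name: str) -> dict:
    """Build an output reference that creates a new local or managed asset.

    Hypothetical helper. The position of "name" inside "location" is an
    assumption; no href is set, because an href to an existing local data
    asset is not allowed for this case.
    """
    if not name or not name.strip():
        raise ValueError("a non-empty name is required for a new output asset")
    return {
        "type": "data_asset",
        "connection": {},
        "location": {"name": name},
    }


# "spss_scoring_output.csv" is an example file name, not a required value.
print(new_asset_output_reference("spss_scoring_output.csv"))
```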

Note:

Connected data assets that refer to supported databases can be created in the output_data_references only when the input_data_references also refers to one of these sources.

If you are creating a job by using the Python client, you must provide the connection name that is referred to in the data nodes of the SPSS model in the id field, and the data asset href in location.href, for the input and output data references of the deployment job payload. For example, you can construct the job payload like this:

# `client` is assumed to be an authenticated Watson Machine Learning API client.
# Replace the <...> placeholders with the hrefs of your own data assets.
job_payload_ref = {
    client.deployments.ScoringMetaNames.INPUT_DATA_REFERENCES: [
        {
            # id must match the connection name used in the SPSS model data node
            "id": "DB2Connection",
            "name": "drug_ref_input1",
            "type": "data_asset",
            "connection": {},
            "location": {
                "href": "<input_asset_href1>"
            }
        },
        {
            "id": "Db2 WarehouseConn",
            "name": "drug_ref_input2",
            "type": "data_asset",
            "connection": {},
            "location": {
                "href": "<input_asset_href2>"
            }
        }
    ],
    client.deployments.ScoringMetaNames.OUTPUT_DATA_REFERENCE: {
        "type": "data_asset",
        "connection": {},
        "location": {
            "href": "<output_asset_href>"
        }
    }
}
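Once the payload is built, you can pass it to the client's `deployments.create_job` method. The following sketch wraps that call in a small guard. `submit_batch_job` is a hypothetical helper, and `fake_client` is a stand-in that exists only so the sketch runs without credentials; its meta name strings and return value are assumptions and do not reflect the real response of the service.

```python
from types import SimpleNamespace


def submit_batch_job(client, deployment_id: str, payload: dict):
    """Check that the payload has input references, then create the job."""
    names = client.deployments.ScoringMetaNames
    if not payload.get(names.INPUT_DATA_REFERENCES):
        raise ValueError("payload has no input data references")
    return client.deployments.create_job(deployment_id, meta_props=payload)


# Stand-in client for a dry run (hypothetical; replace with a real client).
fake_client = SimpleNamespace(
    deployments=SimpleNamespace(
        ScoringMetaNames=SimpleNamespace(
            INPUT_DATA_REFERENCES="input_data_references",
            OUTPUT_DATA_REFERENCE="output_data_reference",
        ),
        # Echoes its arguments instead of contacting the service.
        create_job=lambda deployment_id, meta_props: {
            "deployment_id": deployment_id,
            "scoring": meta_props,
        },
    )
)

names = fake_client.deployments.ScoringMetaNames
payload = {
    names.INPUT_DATA_REFERENCES: [{
        "id": "DB2Connection",
        "type": "data_asset",
        "connection": {},
        "location": {"href": "<input_asset_href1>"},
    }],
    names.OUTPUT_DATA_REFERENCE: {
        "type": "data_asset",
        "connection": {},
        "location": {"href": "<output_asset_href>"},
    },
}

job = submit_batch_job(fake_client, "<deployment_id>", payload)
print(job["deployment_id"])
```

With a real client, pass the authenticated client object and your deployment ID in place of the stand-in values.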
