Batch deployment input details for SPSS models

Follow these rules when you are specifying input details for batch deployments of SPSS models.

Data type summary table:

Data         | Description
Type         | inline, data references
File formats | CSV

Data sources

Input or output data references:

Notes:

If you are specifying input/output data references programmatically:

  • Data source reference type depends on the asset type. Refer to the Data source reference types section in Adding data assets to a deployment space.
  • SPSS jobs support multiple data source inputs and a single output. If the schema is not provided in the model metadata when the model is saved, you must enter each id manually and select a data asset for each connection. If the schema is provided in the model metadata, the id names are populated automatically from that metadata, and you select the data asset for each corresponding id in Watson Studio. For more information, see Using multiple data sources for an SPSS job.
  • To create a local or managed asset as an output data reference, the name field must be specified for output_data_reference so that a data asset is created with the specified name. Specifying an href that refers to an existing local data asset is not supported.
Note:

Connected data assets that refer to supported databases can be created in the output_data_reference only when the input_data_references also refer to one of these sources.

  • Table names that are provided in input and output data references are ignored. The table names that are referenced in the SPSS model stream are used during the batch deployment.

  • Use SQL Pushback to generate SQL statements for IBM SPSS Modeler operations that can be “pushed back” to the database and run there, which improves performance. SQL Pushback is supported only by:

    • Db2
    • SQL Server
    • Netezza Performance Server
  • If you are creating a job by using the Python client, you must provide the connection name that is referenced in the data nodes of the SPSS model stream in the id field, and the data asset href in location.href, for the input/output data references of the deployment job payload. For example, you can construct the job payload like this:

    # Assumes `client` is an already authenticated Watson Machine Learning client.
    # Replace the "<...>" placeholder strings with real data asset hrefs.
    job_payload_ref = {
        client.deployments.ScoringMetaNames.INPUT_DATA_REFERENCES: [{
            "id": "DB2Connection",  # connection name used in the stream's data node
            "name": "drug_ref_input1",
            "type": "data_asset",
            "connection": {},
            "location": {
                "href": "<input_asset_href1>"
            }
        }, {
            "id": "Db2 WarehouseConn",  # connection name used in the stream's data node
            "name": "drug_ref_input2",
            "type": "data_asset",
            "connection": {},
            "location": {
                "href": "<input_asset_href2>"
            }
        }],
        client.deployments.ScoringMetaNames.OUTPUT_DATA_REFERENCE: {
            "type": "data_asset",
            "connection": {},
            "location": {
                "href": "<output_asset_href>"
            }
        }
    }
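
As a rough sketch of the rules in this list, the input references can be generated from a mapping of stream connection names to asset hrefs, and an output reference that creates a new data asset can carry a name instead of an href. The helper names `build_input_refs` and `named_output_ref`, the sample hrefs, the asset name, and the placement of `name` under `location` are assumptions made for illustration; they are not part of the Python client.

```python
# Hypothetical helpers: they build plain dictionaries and make no client calls.

def build_input_refs(assets_by_connection):
    """Return one input data reference per stream connection name."""
    return [
        {
            "id": connection_name,  # must match the name used in the stream's data node
            "type": "data_asset",
            "connection": {},
            "location": {"href": href},
        }
        for connection_name, href in assets_by_connection.items()
    ]


def named_output_ref(asset_name):
    """Return an output reference that asks for a new data asset by name.

    Assumption: the name is placed under "location". No href is given,
    because pointing at an existing local data asset is not supported.
    """
    return {
        "type": "data_asset",
        "connection": {},
        "location": {"name": asset_name},
    }


# Placeholder hrefs; substitute the hrefs of real data assets.
input_refs = build_input_refs({
    "DB2Connection": "/v2/assets/<asset_id_1>?space_id=<space_id>",
    "Db2 WarehouseConn": "/v2/assets/<asset_id_2>?space_id=<space_id>",
})

# Shown separately: a named output applies when the stream writes to a file.
output_ref = named_output_ref("drug_scores.csv")
```

Keeping the id-to-href mapping in one place makes it easier to compare the ids against the connection names in the stream before the job is submitted.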
    

Using connected data for an SPSS Modeler flow job

An SPSS Modeler flow can have a number of input and output data nodes. When you connect to a supported database as an input and output data source, the connection details are selected from the input and output data reference, but the input and output table names are selected from the SPSS model stream file.

For batch deployment of an SPSS model that uses a database connection, make sure that the modeler stream Input and Output nodes are Data Asset nodes. In SPSS Modeler, the Data Asset nodes must be configured with the table names that are used later for job predictions. Set the nodes and table names before you save the model to Watson Machine Learning. When you are configuring the Data Asset nodes, choose the table name from the Connections; choosing a Data Asset that is created in your project is not supported.

When you are creating the deployment job for an SPSS model, make sure that the types of data sources are the same for input and output. The configured table names from the model stream are passed to the batch deployment and the input/output table names that are provided in the connected data are ignored.

For batch deployment of an SPSS model that uses a Cloud Object Storage connection, make sure that the SPSS model stream has a single input data asset node and a single output data asset node.

Supported combinations of input and output sources

You must specify compatible sources for the SPSS Modeler flow input, the batch job input, and the output. If you specify an incompatible combination of types of data sources, you get an error when you try to run the batch job.

These combinations are supported for batch jobs:

SPSS model stream input/output | Batch deployment job input | Batch deployment job output
File | Local, managed, or referenced data asset or connection asset (file) | Remote data asset or connection asset (file) or name
Database | Remote data asset or connection asset (database) | Remote data asset or connection asset (database)
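
The table can be read as a lookup from the stream's source kind to the job input and output kinds that are allowed. The sketch below encodes it that way so that a combination can be checked before a job is submitted. The labels "file", "database", and "name", and the function `is_supported`, are simplifications assumed for illustration; the service itself reports incompatible combinations as an error at run time.

```python
# Assumed, simplified labels for the kinds of sources in the table:
#   "file"     - a file-based data asset or connection asset
#   "database" - a database-backed data asset or connection asset
#   "name"     - an output given only as the name of a new asset
SUPPORTED = {
    "file": {"input": {"file"}, "output": {"file", "name"}},
    "database": {"input": {"database"}, "output": {"database"}},
}


def is_supported(stream_kind, job_input_kind, job_output_kind):
    """Return True if the combination appears in the table, else False."""
    rule = SUPPORTED.get(stream_kind)
    if rule is None:
        return False
    return job_input_kind in rule["input"] and job_output_kind in rule["output"]


print(is_supported("file", "file", "name"))          # file stream, named output
print(is_supported("database", "file", "database"))  # file input for a database stream
```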

Specifying multiple inputs

If you are specifying multiple inputs for an SPSS model stream deployment with no schema, specify an ID for each element in input_data_references.

For more information, see Using multiple data sources for an SPSS job.

In this example, when you create the job, provide three input entries with the IDs sample_db2_conn, sample_teradata_conn, and sample_googlequery_conn, and select the required connected data for each input.

{
  "deployment": {
    "href": "/v4/deployments/<deploymentID>"
  },
  "scoring": {
    "input_data_references": [
      {
        "id": "sample_db2_conn",
        "name": "DB2 connection",
        "type": "data_asset",
        "connection": {},
        "location": {
          "href": "/v2/assets/<asset_id>?space_id=<space_id>"
        }
      },
      {
        "id": "sample_teradata_conn",
        "name": "Teradata connection",
        "type": "data_asset",
        "connection": {},
        "location": {
          "href": "/v2/assets/<asset_id>?space_id=<space_id>"
        }
      },
      {
        "id": "sample_googlequery_conn",
        "name": "Google BigQuery connection",
        "type": "data_asset",
        "connection": {},
        "location": {
          "href": "/v2/assets/<asset_id>?space_id=<space_id>"
        }
      }
    ],
    "output_data_reference": {
      "id": "sample_db2_conn",
      "type": "data_asset",
      "connection": {},
      "location": {
        "href": "/v2/assets/<asset_id>?space_id=<space_id>"
      }
    }
  }
}

Note: The environment variables parameter of deployment jobs is not applicable.
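
The multi-input request body shown in this section can also be assembled in code, which makes it easy to confirm that every input carries an ID before the job is created. This is a minimal sketch: the helpers `data_asset_ref` and `build_job_body` are hypothetical, the href is the same placeholder used in the example, and the singular `output_data_reference` key follows the field name used in the rules earlier in this topic, which is an assumption for the request body.

```python
import json


def data_asset_ref(ref_id, href):
    """Hypothetical helper: one data reference keyed by a stream connection ID."""
    return {
        "id": ref_id,
        "type": "data_asset",
        "connection": {},
        "location": {"href": href},
    }


def build_job_body(deployment_id, inputs, output):
    """Assemble the job body and reject inputs that lack an ID."""
    if any(not ref.get("id") for ref in inputs):
        raise ValueError("every input data reference needs an id")
    return {
        "deployment": {"href": f"/v4/deployments/{deployment_id}"},
        "scoring": {
            "input_data_references": inputs,
            # Assumption: singular key, as in the rules earlier in this topic.
            "output_data_reference": output,
        },
    }


# Placeholder href; each input would normally point at its own asset.
HREF = "/v2/assets/<asset_id>?space_id=<space_id>"
IDS = ("sample_db2_conn", "sample_teradata_conn", "sample_googlequery_conn")

body = build_job_body(
    "<deploymentID>",
    [data_asset_ref(ref_id, HREF) for ref_id in IDS],
    data_asset_ref("sample_db2_conn", HREF),
)
print(json.dumps(body, indent=2))
```

Serializing with `json.dumps` also guards against hand-written syntax slips such as trailing commas or unbalanced braces.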
