Creating a batch deployment

A batch deployment processes input data from a file, data connection, or connected data in a storage bucket, and writes the output to a file.

Before you begin

  1. Save a model to a deployment space.
  2. Promote or add the input file for the batch deployment to the space. For details on promoting an asset to a space, see Deployment spaces.

Structuring the input data

How you structure the input data, also known as the payload, for the batch job depends on the framework for the asset you are deploying. For supported input types by framework, see Batch deployment details.

A .csv input file, or other structured data format, should be formatted to match the schema of the asset. List the column names (fields) in the first row and the values to be scored in subsequent rows. For example:

PassengerId,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
1,3,"Braund, Mr. Owen Harris",0,22,1,0,A/5 21171,7.25,,S
4,1,"Winslet, Mr. Leo Brown",1,65,1,0,B/5 200763,7.50,,S

A JSON input file should provide the same information on fields and values, using this format:

{"input_data":[{
        "fields": [<field1>, <field2>, ...],
        "values": [[<value1>, <value2>, ...]]
}]}

For example:

{"input_data":[{
        "fields": ["PassengerId","Pclass","Name","Sex","Age","SibSp","Parch","Ticket","Fare","Cabin","Embarked"],
        "values": [[1,3,"Braund, Mr. Owen Harris",0,22,1,0,"A/5 21171",7.25,null,"S"],
                  [4,1,"Winselt, Mr. Leo Brown",1,65,1,0,"B/5 200763",7.50,null,"S"]]
}]}
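
If the data is already in a CSV file or DataFrame, a short sketch like the following (assuming pandas; the file names are illustrative) converts it into the fields/values structure shown above:

# Sketch: convert a CSV file into the {"input_data": [...]} payload shown above.
# Assumes pandas; file names are illustrative.
import json
import pandas as pd

df = pd.read_csv("batch_input.csv")
payload = {"input_data": [{
    "fields": df.columns.tolist(),
    # Replace NaN with None so missing values serialize as JSON null.
    "values": df.where(pd.notnull(df), None).values.tolist(),
}]}

with open("batch_payload.json", "w") as f:
    json.dump(payload, f)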

Creating a batch deployment job

  1. From the deployment space, click the name of the saved model you want to deploy. The model detail page opens.
  2. Click Create deployment.
  3. Choose Batch as the deployment type and enter a name for your deployment.
  4. Choose a hardware definition based on the CPU and RAM that should be allocated for this deployment.
  5. Click Create to create the deployment.
  6. When the status changes to Deployed, click the deployment name, then click Create job to configure how to run the deployment.
  7. Define the details for the job, such as a name and an optional description.
  8. (Optional) Schedule when the batch job should run. Scheduled jobs display on the Jobs tab of the deployment space. You can edit the schedule and other options from the Jobs tab.
  9. Specify the input data source or sources. Input data depends on what you are deploying:
    • Choose Inline data to enter the payload in JSON format.
    • Choose Data asset to specify an input data source. The source can be a data source file you promoted to the space, a connection to a data source, or connected data in a storage bucket.
    • Choose multiple input data sources for a model with multiple inputs, such as an SPSS modeler flow or an AutoAI data join experiment.
  10. If you specify a data asset, provide a name and an optional description for the output file that will contain the results, or choose a connected data asset where you want to write the results.
  11. (Optional) If you are deploying a Python script, you can enter environment variables to pass parameters to the job.
  12. Click Create to create the job, or Create and run to create the job and run it immediately. Results of the run are written to the specified output file and saved as a space asset. For a scripted equivalent of this flow, see the sketch after the note below.

Note: If you schedule a job to run every day of the week but exclude certain days, the job might not run when you expect. The cause is a discrepancy between the time zone of the user who creates the schedule and the time zone of the master node where the job runs. This issue occurs only when you exclude days of the week from the schedule.
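
The steps above use the deployment space UI. As a rough programmatic equivalent, the following is a minimal sketch that assumes the ibm_watson_machine_learning Python client (v4); the credentials, IDs, hardware specification name, and inline values are placeholders, and metadata names can vary by release:

# Sketch only: create a batch deployment and an inline-data job programmatically.
# Credentials and IDs are placeholders; adjust for your cluster and release.
from ibm_watson_machine_learning import APIClient

client = APIClient({"url": "https://<cluster-url>", "apikey": "<api-key>"})
client.set.default_space("<space-id>")

deployment = client.deployments.create(
    "<model-id>",
    meta_props={
        client.deployments.ConfigurationMetaNames.NAME: "titanic-batch",
        client.deployments.ConfigurationMetaNames.BATCH: {},
        client.deployments.ConfigurationMetaNames.HARDWARE_SPEC: {"name": "S"},
    },
)
deployment_id = client.deployments.get_id(deployment)

# Inline payload in the same fields/values format shown earlier.
job = client.deployments.create_job(
    deployment_id,
    meta_props={
        client.deployments.ScoringMetaNames.INPUT_DATA: [{
            "fields": ["PassengerId", "Pclass", "Name", "Sex", "Age", "SibSp",
                       "Parch", "Ticket", "Fare", "Cabin", "Embarked"],
            "values": [[1, 3, "Braund, Mr. Owen Harris", 0, 22, 1, 0,
                        "A/5 21171", 7.25, None, "S"]],
        }],
    },
)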

Batch scoring with connected data

When you create a batch deployment job programmatically, you can specify a direct connection to a data input and output source such as a DB2 database. When you create a batch deployment job from a space, however, you cannot access direct connections but you can connect to data in a storage repository such as a Cloud Object Storage bucket.
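
For example, a programmatically created job can point its input at a database connection asset and write its output to a new data asset in the space. The following sketch continues the client setup from the earlier example; the connection asset ID, schema, table, and file names are illustrative placeholders:

# Sketch only: batch job that reads from a connection asset (for example, DB2)
# and writes results to a new data asset in the space.
# Assumes the APIClient setup and deployment_id from the earlier sketch.
job = client.deployments.create_job(
    deployment_id,
    meta_props={
        client.deployments.ScoringMetaNames.INPUT_DATA_REFERENCES: [{
            "type": "connection_asset",
            "connection": {"id": "<db2-connection-asset-id>"},        # placeholder
            "location": {"schema_name": "<schema>", "table_name": "<input-table>"},
        }],
        client.deployments.ScoringMetaNames.OUTPUT_DATA_REFERENCE: {
            "type": "data_asset",
            "location": {"name": "batch_output.csv"},                 # created in the space
        },
    },
)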

Using connected data for an SPSS modeler flow job

An SPSS modeler flow can have a number of input and output data nodes. When you connect to DB2 or DashDB as an input and output data source, note that the connection details are taken from the input and output data references, but the input and output table names are taken from the SPSS model stream file.

To perform batch deployment of an SPSS model by using a DB2 or DB2 Warehouse connection, the modeler stream input and output nodes must both be Data Asset nodes. In SPSS Modeler, configure the Data Asset nodes with the table names that will later be used for job predictions. You must do this before saving the model to Watson Machine Learning.

When you create the deployment job for the SPSS model, make sure that the input and output data sources are of the same type. The configured table names from the model stream are passed to the batch deployment, and the input and output table names provided in the connected data are ignored.
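
As a sketch of that behavior (continuing the earlier client setup; the connection asset ID and table names are placeholders), the connection details come from the data references, while the table names that are actually used come from the Data Asset nodes in the stream:

# Sketch only: SPSS batch job with a DB2 connection. Connection details come from
# the connection asset; the table names that are used come from the Data Asset
# nodes configured in the model stream, so names given here are ignored.
# Assumes the APIClient setup and deployment_id from the earlier sketch.
job = client.deployments.create_job(
    deployment_id,
    meta_props={
        client.deployments.ScoringMetaNames.INPUT_DATA_REFERENCES: [{
            "type": "connection_asset",
            "connection": {"id": "<db2-connection-asset-id>"},   # placeholder
            "location": {"table_name": "<ignored-input-table>"},
        }],
        client.deployments.ScoringMetaNames.OUTPUT_DATA_REFERENCE: {
            "type": "connection_asset",
            "connection": {"id": "<db2-connection-asset-id>"},   # same source type as input
            "location": {"table_name": "<ignored-output-table>"},
        },
    },
)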

To perform batch deployment of an SPSS model by using a Cloud Object Storage (COS) connection, make sure that the SPSS model stream has a single input data asset node and a single output data asset node.