Creating a batch deployment

A batch deployment processes input data from a file, data connection, or connected data in a storage bucket, and writes the output to a file.

Before you begin

  1. Save a model to a deployment space.
  2. Promote or add the input file for the batch deployment to the space. For details on promoting an asset to a space, see Deployment spaces.

Structuring the input data

How you structure the input data, also known as the payload, for the batch job depends on the framework of the asset you are deploying. For the supported input types by framework, see Batch deployment details.

A .csv input file, or other structured data format, should match the schema of the asset. List the column names (fields) in the first row and the values to be scored in subsequent rows. For example:

PassengerId,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
1,3,"Braund, Mr. Owen Harris",0,22,1,0,A/5 21171,7.25,,S
4,1,"Winslet, Mr. Leo Brown",1,65,1,0,B/5 200763,7.50,,S

A JSON input file should provide the same fields and values, using this format:

{"input_data":[{
        "fields": [<field1>, <field2>, ...],
        "values": [[<value1>, <value2>, ...]]
}]}

For example:

{"input_data":[{
        "fields": ["PassengerId","Pclass","Name","Sex","Age","SibSp","Parch","Ticket","Fare","Cabin","Embarked"],
        "values": [[1,3,"Braund, Mr. Owen Harris",0,22,1,0,"A/5 21171",7.25,null,"S"],
                  [4,1,"Winselt, Mr. Leo Brown",1,65,1,0,"B/5 200763",7.50,null,"S"]]
}]}
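
If your scoring data is in a .csv file, you can generate the JSON payload from it. The following is a minimal sketch that uses pandas; the file names are hypothetical:

import json
import pandas as pd

# Read the scoring data; the file name is hypothetical.
df = pd.read_csv("passengers.csv")

# Build the payload in the fields/values format shown above.
# to_json(orient="values") turns missing values into null and numpy
# numbers into plain JSON numbers.
payload = {"input_data": [{
    "fields": df.columns.tolist(),
    "values": json.loads(df.to_json(orient="values"))
}]}

# Write the payload to a JSON input file for the batch job.
with open("passengers.json", "w") as f:
    json.dump(payload, f)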

Creating a batch deployment

  1. From the deployment space, click the name of the saved model you want to deploy. The model detail page opens.
  2. Click Create deployment.
  3. Choose Batch as the deployment type and enter a name for your deployment.
  4. Choose a hardware specification based on the CPU and RAM to allocate for this deployment.
  5. Click Create to create the deployment.
  6. When the status changes to Deployed, the deployment is complete.
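
You can also create the deployment programmatically. The following is a minimal sketch that uses the Watson Machine Learning Python client (ibm_watson_machine_learning); the credentials, space ID, model ID, and hardware specification name are hypothetical:

from ibm_watson_machine_learning import APIClient

# Hypothetical credentials and IDs; replace with your own.
client = APIClient({
    "url": "https://us-south.ml.cloud.ibm.com",
    "apikey": "<api-key>"
})
client.set.default_space("<space-id>")

# Deployment metadata: a name, the batch marker, and a hardware
# specification that sets the CPU and RAM allocation.
meta_props = {
    client.deployments.ConfigurationMetaNames.NAME: "Titanic batch deployment",
    client.deployments.ConfigurationMetaNames.BATCH: {},
    client.deployments.ConfigurationMetaNames.HARDWARE_SPEC: {"name": "S"}
}

deployment = client.deployments.create("<model-id>", meta_props=meta_props)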

Viewing deployment details

Click the name of a deployment to view the details.

You can view the configuration details, such as hardware and software specifications. You can also get the deployment ID, which you can use in API calls to the deployment endpoint. For details, see Looking up a deployment endpoint.
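
With the Python client, a minimal sketch for looking up the same details, assuming the client and deployment from the earlier example:

# List the deployments in the space, then fetch the ID and details of
# one deployment. Older client versions name get_id as get_uid.
client.deployments.list()

deployment_id = client.deployments.get_id(deployment)
details = client.deployments.get_details(deployment_id)

The returned details include the configuration that you set at creation time, such as the hardware specification.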

Creating a batch job from a deployment

  1. Click the name of a deployment, then click Create job to configure how to run the deployment.
  2. Define the details for the job, such as a name and an optional description.
  3. (Optional) Schedule when the batch job should run. Scheduled jobs display on the Jobs tab of the deployment space. You can edit the schedule and other options from the Jobs tab.
  4. Specify the input data source or sources. Input data depends on what you are deploying:
    • Choose Inline data to enter the payload in JSON format.
    • Choose Data asset to specify an input data source. The source can be a data file that you promoted to the space, a connection to a data source, or connected data in a storage bucket.
    • Choose multiple input data sources to match a model that has multiple inputs, such as an SPSS Modeler flow or an AutoAI data join experiment.
  5. If you specify a data asset, provide a name and an optional description for the output file that will contain the results, or choose a connected data asset where you want to write the results.
  6. (Optional) If you are deploying a Python script, you can enter environment variables to pass parameters to the job.
  7. Click Create to create the job, or Create and run to create the job and run it immediately. Results of the run are written to the specified output file and saved as a space asset.
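
To create and run the job programmatically with inline data, the following is a minimal sketch that continues from the earlier examples; the payload uses the fields/values format described above:

# Inline scoring payload in the fields/values format.
job_payload = {
    client.deployments.ScoringMetaNames.INPUT_DATA: [{
        "fields": ["PassengerId", "Pclass", "Name", "Sex", "Age", "SibSp",
                   "Parch", "Ticket", "Fare", "Cabin", "Embarked"],
        "values": [[1, 3, "Braund, Mr. Owen Harris", 0, 22, 1, 0,
                    "A/5 21171", 7.25, None, "S"]]
    }]
}

# Create the job against the deployment and check its status; older
# client versions name get_job_id as get_job_uid.
job = client.deployments.create_job(deployment_id, meta_props=job_payload)
job_id = client.deployments.get_job_id(job)
print(client.deployments.get_job_status(job_id))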

Batch scoring with connected data

When you create a batch deployment job programmatically, you can specify a direct connection to an input and output data source, such as a DB2 database. When you create a batch deployment job from a space, however, you cannot access direct connections, but you can connect to data in a storage repository, such as a Cloud Object Storage bucket or a storage volume.
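
As a hedged illustration, a programmatic job that reads from and writes to connected data uses data references instead of inline data. In this sketch, the asset ID, space ID, and output file name are hypothetical:

# Reference a data asset in the space as input, and name an output
# asset for the results.
job_payload = {
    client.deployments.ScoringMetaNames.INPUT_DATA_REFERENCES: [{
        "type": "data_asset",
        "location": {"href": "/v2/assets/<input-asset-id>?space_id=<space-id>"}
    }],
    client.deployments.ScoringMetaNames.OUTPUT_DATA_REFERENCE: {
        "type": "data_asset",
        "location": {"name": "batch_output.csv"}
    }
}

job = client.deployments.create_job(deployment_id, meta_props=job_payload)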

Using connected data for an SPSS Modeler flow job

An SPSS Modeler flow can have multiple input and output data nodes. When you connect to a supported database as the input and output data source, the connection details are taken from the input and output data references, but the input and output table names are taken from the SPSS model stream file.

To perform batch deployment of an SPSS model by using a database connection, make sure that the input and output nodes in the modeler stream are Data Asset nodes. In SPSS Modeler, the Data Asset nodes must be configured with the table names that are used later for job predictions. Set the nodes and table names before you save the model to Watson Machine Learning. When you configure the Data Asset nodes, choose the table name from the Connections list; choosing a Data Asset that is created in your project is currently not supported.

When you create the deployment job for the SPSS model, make sure that the input and output data source types are the same. The table names configured in the model stream are passed to the batch deployment, and the input and output table names provided in the connected data are ignored.

To perform batch deployment of an SPSS model by using a Cloud Object Storage (COS) connection, make sure that the SPSS model stream has a single input data asset node and a single output data asset node.
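
In a programmatic job, a COS connection is referenced as a connection_asset. The following is a hedged sketch of the input and output references; the connection ID, bucket, and file names are hypothetical:

# Read the input file from a COS bucket through a connection asset, and
# write the results back to the same bucket.
job_payload = {
    client.deployments.ScoringMetaNames.INPUT_DATA_REFERENCES: [{
        "type": "connection_asset",
        "connection": {"id": "<connection-id>"},
        "location": {"bucket": "<bucket-name>", "file_name": "input.csv"}
    }],
    client.deployments.ScoringMetaNames.OUTPUT_DATA_REFERENCE: {
        "type": "connection_asset",
        "connection": {"id": "<connection-id>"},
        "location": {"bucket": "<bucket-name>", "file_name": "output.csv"}
    }
}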

Creating batch deployments programmatically

See Machine Learning samples and examples for links to sample notebooks that demonstrate how to create batch deployments by using the Watson Machine Learning REST API and the Watson Machine Learning Python client library.