Data sources for scoring batch deployments

You can supply input data for a batch deployment job in several ways, including directly uploading a file or providing a link to database tables. The types of allowable input data vary according to the type of deployment job you are creating.

For supported input types by framework, refer to Batch deployment input details by framework.

Input data can be supplied to a batch job as inline data or as a data reference.

Inline data description

Inline type input data for batch processing is specified in the batch deployment job's payload. For example, you can pass a CSV file as the deployment input in the UI or as a value for the scoring.input_data parameter in a notebook. When the batch deployment job is completed, the output is written to the corresponding job's scoring.predictions metadata parameter.
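
For example, a deployment job payload that passes inline input data might look like the following sketch. The deployment ID, fields, and values are placeholders, and the exact payload structure depends on the API or client version that you use:

{
    "deployment": {
        "id": "<deployment_id>"
    },
    "scoring": {
        "input_data": [{
            "fields": ["PassengerId", "Pclass", "Name"],
            "values": [[1, 3, "Braund, Mr. Owen Harris"]]
        }]
    }
}

When the job completes, the predictions are written to scoring.predictions in the job details, typically in the same fields and values format.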

Data reference description

Data reference type input and output data for batch processing can be stored in a remote data source, such as a Cloud Object Storage bucket or an SQL or NoSQL database, or as a local or managed data asset in a deployment space.

Details for data references include:

  • The data source reference type depends on the asset type. Refer to the Data source reference types section in Adding data assets to a deployment space.

  • For the data_asset type, the reference to the input data must be specified as a /v2/assets href in the input_data_references.location.href parameter in the deployment job's payload. The data asset that is specified here can be a reference to a local or a connected data asset. Similarly, if the batch deployment job's output data must be persisted in a remote data source, the reference to the output data must be specified as a /v2/assets href in the output_data_reference.location.href parameter in the deployment job's payload.

  • Any input and output data_asset references must be in the same space ID as the batch deployment.

  • If the batch deployment job's output data must be persisted in a deployment space as a local asset, output_data_reference.location.name must be specified. When the batch deployment job completes successfully, an asset with the specified name is created in the space.

  • If the output data reference points to a data asset in a remote database, you can specify whether the batch output is appended to the table or whether the table is truncated and the output data inserted. Use the output_data_references.location.write_mode parameter to specify the value truncate or append (see the sketch after this list). Note the following:

    • Specifying truncate truncates the table and inserts the batch output data.
    • Specifying append appends the batch output data to the remote database table.
    • write_mode is applicable only for the output_data_references parameter.
    • write_mode is applicable only to remote database-related data assets. The parameter does not apply to a local data asset or to a Cloud Object Storage-based data asset.
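
The following sketches illustrate the two output cases that are described in this list. The fragments are for illustration only: the table_name property is a placeholder, and the exact location properties depend on the connection type. To persist the output as a local data asset in the space, specify a name:

"output_data_reference": {
    "type": "data_asset",
    "connection": {
    },
    "location": {
        "name": "<output asset name>"
    }
}

To write the output to a remote database table and control how the table is updated, set write_mode:

"output_data_reference": {
    "type": "connection_asset",
    "connection": {
        "id": "<connection_guid>"
    },
    "location": {
        "table_name": "<table name>",
        "write_mode": "append"
    }
}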

Structuring the input data

How you structure the input data, also known as the payload, for the batch job depends on the framework for the asset you are deploying.

A .csv input file, or input in another structured data format, should be formatted to match the schema of the asset. List the column names (fields) in the first row and the values to be scored in subsequent rows. For example:

PassengerId, Pclass, Name, Sex, Age, SibSp, Parch, Ticket, Fare, Cabin, Embarked
1,3,"Braund, Mr. Owen Harris",0,22,1,0,A/5 21171,7.25,,S
4,1,"Winslet, Mr. Leo Brown",1,65,1,0,B/5 200763,7.50,,S

A JSON input file should provide the same information on fields and values, using this format:

{"input_data":[{
        "fields": [<field1>, <field2>, ...],
        "values": [[<value1>, <value2>, ...]]
}]}

For example:

{"input_data":[{
        "fields": ["PassengerId","Pclass","Name","Sex","Age","SibSp","Parch","Ticket","Fare","Cabin","Embarked"],
        "values": [[1,3,"Braund, Mr. Owen Harris",0,22,1,0,"A/5 21171",7.25,null,"S"],
                  [4,1,"Winslet, Mr. Leo Brown",1,65,1,0,"B/5 200763",7.50,null,"S"]]
}]}

Example data_asset payload

"input_data_references": [{
    "type": "data_asset",
    "connection": {
    },
    "location": {
        "href": "/v2/assets/<asset_id>?space_id=<space_id>"
    }
}]

Example connection_asset payload

"input_data_references": [{
    "type": "connection_asset",
    "connection": {
        "id": "<connection_guid>"
    },
    "location": {
        "bucket": "<bucket name>",
        "file_name": "<directory_name>/<file name>"
    }
    <other wdp-properties supported by runtimes>
}]

Parent topic: Creating a batch deployment