You can supply input data for a batch deployment job in several ways, including directly uploading a file or providing a link to database tables. The types of allowable input data vary according to the type of deployment job that you are creating.
Available input types for batch deployments by framework and asset type
Framework               Batch deployment type
Decision Optimization   Inline and Reference
Python function         Inline
PyTorch                 Inline and Reference
TensorFlow              Inline and Reference
Scikit-learn            Inline and Reference
Python scripts          Reference
Spark MLlib             Inline and Reference
SPSS                    Inline and Reference
XGBoost                 Inline and Reference
Inline data description
Input data of type inline for batch processing is specified in the batch deployment job's payload. For example, you can pass a CSV file as the deployment input in the UI, or as the value of the scoring.input_data parameter in a notebook.
When the batch deployment job is completed, the output is written to the corresponding job's scoring.predictions metadata parameter.
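For example, here is a minimal sketch of creating a batch deployment job with inline input data by using the ibm-watson-machine-learning Python client. The endpoint, API key, space ID, and deployment ID are placeholders, and the fields match the CSV example later in this section:
from ibm_watson_machine_learning import APIClient

# Placeholders: supply your own endpoint, API key, space ID, and deployment ID
client = APIClient({"url": "<endpoint_url>", "apikey": "<api_key>"})
client.set_default_space("<space_id>")

inline_payload = {
    client.deployments.ScoringMetaNames.INPUT_DATA: [{
        "fields": ["PassengerId", "Pclass", "Name", "Sex", "Age",
                   "SibSp", "Parch", "Ticket", "Fare", "Cabin", "Embarked"],
        "values": [[1, 3, "Braund, Mr. Owen Harris", 0, 22, 1, 0,
                    "A/5 21171", 7.25, None, "S"]]
    }]
}

# Submit the job; when it completes, the output is written to the
# job's scoring.predictions metadata
job = client.deployments.create_job("<deployment_id>", meta_props=inline_payload)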
Data reference description
Input and output data of type data reference that is used for batch processing can be stored:
In a remote data source, like a Cloud Object Storage bucket or an SQL or NoSQL database.
As a local or managed data asset in a deployment space.
For the data_asset type, the reference to input data must be specified as a /v2/assets href in the input_data_references.location.href parameter in the deployment job's payload. The specified data asset is a reference to a local or a connected data asset. Similarly, if the batch deployment job's output data must be persisted in a remote data source, the reference to output data must be specified as a /v2/assets href in the output_data_reference.location.href parameter in the deployment job's payload.
Any input and output data_asset references must belong to the same space as the batch deployment.
If the batch deployment job's output data must be persisted in a deployment space as a local asset, output_data_reference.location.name must be specified. When the batch deployment job completes successfully, an asset with the specified name is created in the space.
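For example, a minimal sketch of the corresponding meta properties with the ibm-watson-machine-learning Python client, assuming client is an authenticated APIClient; the asset ID, space ID, deployment ID, and output asset name are placeholders:
reference_payload = {
    # Input: a /v2/assets href that points to a local or connected data asset
    client.deployments.ScoringMetaNames.INPUT_DATA_REFERENCES: [{
        "type": "data_asset",
        "location": {"href": "/v2/assets/<input_asset_id>?space_id=<space_id>"}
    }],
    # Output: a named local asset that is created in the space on success
    client.deployments.ScoringMetaNames.OUTPUT_DATA_REFERENCE: {
        "type": "data_asset",
        "location": {"name": "batch_output.csv"}
    }
}
job = client.deployments.create_job("<deployment_id>", meta_props=reference_payload)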
Output data references can point to a table in a remote database. In this situation, you can specify whether to append the batch output to the table or to truncate the table and rewrite it with the output data. Use the output_data_references.location.write_mode parameter to specify the value truncate or append.
Specifying truncate truncates the table and then inserts the batch output data.
Specifying append appends the batch output data to the remote database table.
write_mode is applicable only to the output_data_references parameter.
write_mode is applicable only to remote database-related data assets. This parameter does not apply to a local data asset or to a Cloud Object Storage based data asset.
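For illustration, a hedged sketch of an output data reference that appends batch output to a remote database table; write_mode is the parameter described above, while the connection ID and the other location fields are placeholders whose exact names depend on your data source:
output_reference = {
    "type": "connection_asset",                     # reference to a remote database connection
    "connection": {"id": "<connection_asset_id>"},  # placeholder connection asset ID
    "location": {
        "table_name": "<table_name>",               # placeholder; location fields vary by data source
        "write_mode": "append"                      # or "truncate" to rewrite the table
    }
}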
How you structure the input data, also known as the payload, for the batch job depends on the framework for the asset you are deploying.
A .csv input file, or input in another structured data format, must match the schema of the asset. List the column names (fields) in the first row and the values to be scored in subsequent rows. For example, see the following snippet:
PassengerId, Pclass, Name, Sex, Age, SibSp, Parch, Ticket, Fare, Cabin, Embarked
1,3,"Braund, Mr. Owen Harris",0,22,1,0,A/5 21171,7.25,,S
4,1,"Winslet, Mr. Leo Brown",1,65,1,0,B/5 200763,7.50,,S
A JSON input file must provide the same field names and values, using this format:
{"input_data":[{"fields":["PassengerId","Pclass","Name","Sex","Age","SibSp","Parch","Ticket","Fare","Cabin","Embarked"],"values":[[1,3,"Braund, Mr. Owen Harris",0,22,1,0,"A/5 21171",7.25,null,"S"],[4,1,"Winselt, Mr. Leo Brown",1,65,1,0,"B/5 200763",7.50,null,"S"]]}]}
Preparing a payload that matches the schema of an existing model
To build a payload that matches the schema of a deployed model, retrieve the schema from the model details and select the scoring columns in the same order, as in this sample code:
model_details = client.repository.get_details("<model_id>")  # retrieves the model details, including the input schema
columns_in_schema = []
for field in model_details['entity']['schemas']['input'][0].get('fields'):
    columns_in_schema.append(field['name'])

X = X[columns_in_schema]  # X is a pandas DataFrame of values to be scored, reordered to match the schema
scoring_values = X.values.tolist()
array_of_input_fields = X.columns.tolist()
payload_scoring = {"input_data": [{"fields": array_of_input_fields, "values": scoring_values}]}
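You can then supply payload_scoring as the inline input data when you create the batch deployment job, for example by passing payload_scoring["input_data"] as the value of client.deployments.ScoringMetaNames.INPUT_DATA in client.deployments.create_job, as in the inline sketch earlier in this section.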