Batch deployment input details for Python scripts
Last updated: Oct 09, 2024
Follow these rules when specifying input details for batch deployments of Python scripts.
Data type summary table:
Data | Description |
---|---|
Type | data references |
File formats | any |
Data Sources
Input/output data references:
- Local/managed assets from the space
- Connected (remote) assets: Cloud Object Storage
Notes:
- For connections of type Cloud Object Storage or Cloud Object Storage (infrastructure), you must configure an Access key and a Secret key, also known as HMAC credentials.
If you are specifying input/output data references programmatically:
- The data source reference `type` depends on the asset type. Refer to the Data source reference types section in Adding data assets to a deployment space.
- You can specify the environment variables that the Python script requires as `'key': 'value'` pairs in `scoring.environment_variables`. The `key` must be the name of an environment variable and the `value` must be the corresponding value of that environment variable.
- The deployment job's payload is saved as a JSON file in the deployment container where the Python script runs. The Python script can get the full path of the JSON file from the `JOBS_PAYLOAD_FILE` environment variable.
- If input data is referenced as a local or managed data asset, the deployment service downloads the input data and places it in the deployment container where the Python script runs. You can get the location (path) of the downloaded input data from the `BATCH_INPUT_DIR` environment variable.
- For remote input data references (connected data asset or connection asset), the Python script must handle downloading the data. If a connected data asset or a connection asset is present in the deployment job's payload, you can access it through the `JOBS_PAYLOAD_FILE` environment variable, which contains the full path to the deployment job's payload saved as a JSON file.
- If output data must be persisted as a local or managed data asset in a space, you can specify the name of the asset to be created in `scoring.output_data_reference.location.name`. The Python script can place output data in the path specified by the `BATCH_OUTPUT_DIR` environment variable. The deployment service compresses the data to ZIP format and uploads it in the location specified in `BATCH_OUTPUT_DIR`.
- These environment variables are set internally. If you try to set them manually, your values are overridden:
  - `BATCH_INPUT_DIR`
  - `BATCH_OUTPUT_DIR`
  - `JOBS_PAYLOAD_FILE`
- If output data must be saved in a remote data store, you must specify the output data reference (for example, a data asset or a connected data asset) in `output_data_reference.location.href`. The Python script must upload the output data to the remote data source. If a connected data asset or a connection asset reference is present in the deployment job's payload, you can access it through the `JOBS_PAYLOAD_FILE` environment variable, which contains the full path to the deployment job's payload saved as a JSON file.
- If the Python script does not require any input or output data references in the deployment job payload, do not provide the `scoring.input_data_references` and `scoring.output_data_reference` objects in the payload.
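The rules above can be illustrated from the script's side. The following is a minimal sketch, not an official template: it assumes the payload JSON file keeps references under a top-level `scoring` object, and the helper names, the `summary.json` file name, and the `MODEL_THRESHOLD` variable (standing in for a value passed through `scoring.environment_variables`) are all hypothetical.

```python
import json
import os
from pathlib import Path


def load_job_payload(env=os.environ):
    """Return the job payload as a dict, or {} if JOBS_PAYLOAD_FILE is not set."""
    payload_path = env.get("JOBS_PAYLOAD_FILE")
    if not payload_path:
        return {}
    with open(payload_path, encoding="utf-8") as fh:
        return json.load(fh)


def list_input_files(env=os.environ):
    """List files that were downloaded into BATCH_INPUT_DIR (local/managed assets)."""
    input_dir = env.get("BATCH_INPUT_DIR")
    if not input_dir or not os.path.isdir(input_dir):
        return []
    return sorted(p for p in Path(input_dir).rglob("*") if p.is_file())


def write_output(filename, text, env=os.environ):
    """Write a result file into BATCH_OUTPUT_DIR for the service to collect."""
    output_dir = Path(env["BATCH_OUTPUT_DIR"])
    output_dir.mkdir(parents=True, exist_ok=True)
    target = output_dir / filename
    target.write_text(text, encoding="utf-8")
    return target


def main(env=os.environ):
    payload = load_job_payload(env)
    # Assumption: references sit under "scoring" in the saved payload file.
    scoring = payload.get("scoring", {})
    # Remote references are only described here; the script would still
    # have to download that data itself.
    remote_refs = scoring.get("input_data_references", [])
    summary = {
        "local_input_files": [p.name for p in list_input_files(env)],
        "remote_reference_count": len(remote_refs),
        # Hypothetical variable supplied via scoring.environment_variables.
        "threshold": env.get("MODEL_THRESHOLD", "0.5"),
    }
    return write_output("summary.json", json.dumps(summary, indent=2), env)


# Run only when the batch environment variables are actually present.
if __name__ == "__main__" and "BATCH_OUTPUT_DIR" in os.environ:
    print(main())
```

Passing the environment as a parameter keeps the functions easy to try outside a deployment container: call `main()` with a plain dictionary that points the three variables at temporary paths.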
Learn more
- For general information on deploying scripts, refer to Deploying scripts in Watson Machine Learning.
Parent topic: Batch deployment input details by framework