Adding data assets to a deployment space

Last updated: Nov 21, 2024

Learn about the ways to add and promote data assets to a space, and about the data types that are used in deployments.

Data can be:

A data file, such as a .csv file.
A connection to data that is located in a repository such as a database.
Connected data that is located in a storage bucket. For more information, see Using data from the Cloud Object Storage service.

Notes:

For definitions of data-related terms, refer to Asset types and properties.
You can use catalogs in IBM Knowledge Catalog as a feature store to access data assets that can be shared across an organization. Data assets include metadata about where they are used in models. Catalogs control access at the catalog and the data asset level.

This topic describes two ways to add data to a space: by using the UI and programmatically.

Data added to a space is managed in a similar way to data added to a project. For example:

Adding data to a space creates a new copy of the asset and its attachments within the space, maintaining a reference back to the project asset. If an asset such as a data connection requires access credentials, they persist and are the same whether you are accessing the data from a project or from a space.
As with a data connection in a project, you can edit data connection details from the space.
Data assets are stored in a space in the same way that they are stored in a project. They use the same file structure for the space as the structure used for the project.

Adding data and connections to a space by using the UI

To add data or connections to a space by using the UI:

From the Assets tab of your deployment space, click Import assets.
Choose whether to add a connected data asset, a catalog asset, or a project file:
- If you want to add a connected data asset, select Connected data and choose a connection.
- If you want to add a catalog asset, select Catalog asset and choose a catalog.
- If you want to add a project file, select Project files and choose your project file.
Click Import.

The data asset appears in the space and is available for use as an input data source in a deployment job.

Note: Some types of connections allow you to use your personal platform credentials. If you add a connection or connected data that uses your personal platform credentials, select the Use my platform login credentials checkbox.

Adding data to a space programmatically

If you are using APIs to create, update, or delete watsonx.ai Runtime assets, make sure that you use the Data and AI Common Core API.

For an example of how to add assets programmatically, refer to this sample notebook: Use SPSS and batch deployment with Db2 to predict customer churn

Data source reference types in watsonx.ai Runtime

Data source reference types are referenced in watsonx.ai Runtime requests to represent input data and results locations. Use data_asset and connection_asset for these types of data sources:

Cloud Object Storage
Db2
Database data

Note: For Decision Optimization, the reference type is url.

Example data_asset payload

{"input_data_references": [{
    "type": "data_asset",
    "connection": {
    },
    "location": {
        "href": "/v2/assets/<asset_id>?space_id=<space_id>"
    }
}]}
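
The data_asset payload can also be assembled in code. The following Python sketch builds one such reference from an asset ID and a space ID. The function name build_data_asset_reference and the sample IDs are hypothetical; only the field names and the href pattern are taken from the example payload.

```python
import json


def build_data_asset_reference(asset_id: str, space_id: str) -> dict:
    """Return one input_data_references entry of type data_asset."""
    return {
        "type": "data_asset",
        # The example payload leaves the connection object empty.
        "connection": {},
        "location": {
            "href": f"/v2/assets/{asset_id}?space_id={space_id}"
        },
    }


# Placeholder IDs, for illustration only.
payload = {
    "input_data_references": [
        build_data_asset_reference("my-asset-id", "my-space-id")
    ]
}
print(json.dumps(payload, indent=4))
```

Keeping the href construction in one helper avoids hand-editing the asset and space IDs in several places when a job references more than one data asset.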

Example connection_asset payload

"input_data_references": [{
    "type": "connection_asset",
    "connection": {
        "id": "<connection_guid>"
    },
    "location": {
        "bucket": "<bucket_name>",
        "file_name": "<directory_name>/<file_name>"
    }
    <other wdp-properties supported by runtimes>
}]
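
A similar sketch for the connection_asset reference follows. The function name build_connection_asset_reference and the sample values are hypothetical; the field names come from the example payload. Any other properties that a runtime supports are not modeled here.

```python
import json


def build_connection_asset_reference(
    connection_id: str, bucket: str, file_name: str
) -> dict:
    """Return one input_data_references entry of type connection_asset."""
    return {
        "type": "connection_asset",
        "connection": {"id": connection_id},
        "location": {
            "bucket": bucket,
            # file_name may include a directory prefix, as in the example.
            "file_name": file_name,
        },
    }


# Placeholder values, for illustration only.
reference = build_connection_asset_reference(
    "my-connection-guid", "my-bucket", "my-directory/input.csv"
)
print(json.dumps({"input_data_references": [reference]}, indent=4))
```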

For more information, see:

watsonx.ai Runtime REST API

Using data from the Cloud Object Storage service

You can use the Cloud Object Storage service with deployment jobs through a connected data asset or a connection asset. To use data from the Cloud Object Storage service:

Create a connection to IBM Cloud Object Storage by adding a Connection to your project or space and selecting Cloud Object Storage (infrastructure) or Cloud Object Storage as the connector. Provide the secret key, access key, and login URL.

Note: When you create a connection to Cloud Object Storage or Cloud Object Storage (infrastructure), you must specify both access_key and secret_key. If access_key and secret_key are not specified, downloading the data from that connection does not work in a batch deployment job. For reference, see IBM Cloud Object Storage connection and IBM Cloud Object Storage (infrastructure) connection.

Add input and output files to the deployment space as connected data by using the Cloud Object Storage connection that you created.
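
Because a missing access_key or secret_key only surfaces when a batch deployment job tries to download data, it can help to check the connection properties beforehand. The following Python sketch is one way to do that. It assumes that the connection properties are available as a plain dictionary; the function name missing_cos_credentials and the sample values are hypothetical, and the sketch does not call any IBM API.

```python
REQUIRED_COS_KEYS = ("access_key", "secret_key")


def missing_cos_credentials(properties: dict) -> list:
    """Return the required credential names that are absent or empty."""
    return [key for key in REQUIRED_COS_KEYS if not properties.get(key)]


# Placeholder properties, for illustration only: secret_key is left empty.
connection_properties = {"access_key": "<access_key>", "secret_key": ""}

missing = missing_cos_credentials(connection_properties)
if missing:
    print("Missing credentials:", ", ".join(missing))
else:
    print("Both access_key and secret_key are specified.")
```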
