Assets in Cloud Pak for Data

Cloud Pak for Data and its services provide a platform with collaborative workspaces and tools. You provide the content to the platform, in the form of assets. An asset is an item that contains information about data, other valuable information, or code that works with data.

You add assets by importing them or creating them with tools. You work with assets in collaborative workspaces. The workspace you use depends on your tasks.

Projects: Where you collaborate with others to work with data and create assets. Most tools are in projects, and you run assets that contain code in projects. For example, you can import data, prepare data, analyze data, or create models in projects. See Projects.

Catalogs: Where you store assets to share with your organization, or where you go to find assets that you need to work with. You can copy assets from catalogs into projects, or publish assets from projects into a catalog. You can edit asset properties and metadata in a catalog, but you can't run assets. See Catalogs.

Deployment spaces: Where you deploy and run assets that are ready for testing or production. You move assets from projects into deployment spaces and then create deployments from those assets. You monitor and update deployments as necessary. See Deployment spaces.

The following graphic illustrates how you can move assets around the platform.

Assets move between projects and catalogs and from projects to deployment spaces.
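The movement rules described above can be sketched as a small lookup. This is only an illustrative model of the text, not platform code: the workspace labels and the `can_move` helper are hypothetical names chosen for this example.

```python
# Allowed asset movements, as described in the text above. The workspace
# labels are illustrative, not identifiers from any Cloud Pak for Data API.
ALLOWED_MOVES = {
    ("project", "catalog"),           # publish from a project into a catalog
    ("catalog", "project"),           # copy from a catalog into a project
    ("project", "deployment_space"),  # move from a project into a space
}


def can_move(source: str, target: str) -> bool:
    """Return True if this simplified model allows the movement."""
    return (source, target) in ALLOWED_MOVES


print(can_move("project", "catalog"))           # True
print(can_move("catalog", "deployment_space"))  # False
```

In this model, an asset in a catalog reaches a deployment space only by way of a project.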

You can find any asset in any of the workspaces for which you are a collaborator by searching for it from the global search bar. See Searching for assets across the platform.

You can create many different types of assets, but all assets have some common properties.

Asset types

To create most types of assets, you must use a specific tool. Most tools are provided by one or more services. The tools to create data assets and connection assets are provided by the platform and do not require any specific services.

To see which services you need for which tools, open the tools and services map.

The following table lists the types of assets that you can create, the tools you need to create them, and the workspaces where you can add them.

Asset type | Description | Tools to create it | Workspaces
AutoAI experiment | Automatically generates candidate predictive model pipelines. | AutoAI | Projects
COBOL copybook | Displays the map metadata for connected data assets from z/OS mainframe computers. | Metadata import tool | Catalogs
Connected data asset | Represents data that is accessed through a connection to a remote data source. | Connected data tool, Metadata import tool | Projects, Catalogs, Spaces
Connection asset | Contains the information to connect to a data source. | Connection tool | Projects, Catalogs, Spaces
Dashboard | Visualizes data in interactive graphs without code. | Dashboard editor | Projects, Catalogs
Data asset from a file | Represents a file that you uploaded from your local system. | Upload pane | Projects, Catalogs, Spaces
Data definition | Defines a reusable column metadata component for DataStage flow jobs. | DataStage component editor | Projects
Data Refinery flow | Prepares data. | Data Refinery | Projects, Spaces
Data replication flow | Replicates data. | Data Replication | Projects
Data quality definition | Defines a reusable rule logic component for data quality rules. | Data quality definition editor | Projects, Catalogs
Data quality rule | Evaluates data quality for specific conditions. | Data quality rule editor | Projects
DataStage flow | Transforms and integrates data. | DataStage flow editor | Projects
Decision Optimization experiment | Solves optimization problems. | Decision Optimization model builder | Projects
Federated learning experiment | Trains a common model on a set of remote data sources. | Federated learning | Projects
Folder asset | Represents a folder in IBM Cloud Object Storage. | Connected data tool | Projects, Catalogs, Spaces
Jupyter notebook | Runs Python or R code to analyze data or build models. | Jupyter notebook editor, RStudio | Projects, Catalogs
Masking flow | Creates masked copies of data assets. | Masking flow | Projects
Master data configuration | Configures Match 360. | Match 360 | Projects
Metadata enrichment | Enriches imported asset metadata. | Metadata enrichment tool | Projects
Metadata import | Imports asset metadata from a connection. | Metadata import tool | Projects
Model | Contains information about a saved or imported model. | Various tools that run experiments or train models | Projects, Catalogs, Spaces
Model use case | Tracks the lifecycle of a model from request to production. | AI Factsheets | Catalogs
Parameter set | Collects a reusable set of job parameters for DataStage jobs. | Parameter set editor | Projects
Pipeline | Automates the model lifecycle. | Watson Pipelines | Projects
Python function | Contains Python code to support a model in production. | Jupyter notebook editor | Projects, Spaces
Schema library | Imports a reusable set of resources for DataStage flows. | DataStage component editor | Projects
Script | Contains a Python or R script to support a model in production. | Jupyter notebook editor, RStudio | Projects, Spaces
SPSS Modeler flow | Runs a flow to prepare data and build a model. | SPSS Modeler | Projects
Standardization rule | Defines a reusable rule component to format data in DataStage flows. | DataStage component editor | Projects
Subflow | Defines a reusable set of stages and connectors for DataStage flows. | DataStage component editor | Projects
Visualization | Shows visualizations from a data asset. | Visualization page in data assets | Projects


Common properties for assets

Assets accumulate information in properties when you create them, use them, or when they are updated by automated processes. Some properties are provided by users and can be edited by users. Other properties are automatically provided by the system. Most system-provided properties can't be edited by users.

Common properties for assets everywhere

Most types of assets have the properties listed in the following table in all the workspaces where those asset types exist.

Property | Description | Editable?
Name | The asset name. Can contain up to 100 characters. Supports multibyte characters. Cannot be empty, contain Unicode control characters, or contain only blank spaces. Asset names do not need to be unique within a project or deployment space. Whether asset names must be unique in a catalog depends on the duplicate handling method set for the catalog. | Yes
Description | Optional. Can contain up to 245 characters, not including blank spaces. Supports multibyte characters and hyperlinks. | Yes
Creation date | The timestamp of when the asset was created or imported. | No
Creator or Owner | The user name or email address of the person who created or imported the asset. | No
Last modified date | The timestamp of when the asset was last modified. | No
Last editor | The user name or email address of the person who last modified the asset. | No
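The Name and Description constraints can be expressed as a pre-check that you might run before you submit an asset. The following is a minimal sketch under stated assumptions, not the platform's own validation: the function names are hypothetical, length is counted in Unicode code points, "control characters" is taken to mean the Unicode category `Cc`, and "blank spaces" is taken to mean any whitespace.

```python
import unicodedata

MAX_NAME_LENGTH = 100
MAX_DESCRIPTION_LENGTH = 245  # counted without blank spaces


def is_valid_name(name: str) -> bool:
    """Check a name against the documented rules (assumed interpretation)."""
    # Cannot be empty or contain only blank spaces.
    if name.strip() == "":
        return False
    # Up to 100 characters; multibyte characters count as one character each
    # here, which is an assumption about how the platform counts length.
    if len(name) > MAX_NAME_LENGTH:
        return False
    # "Cc" is the Unicode general category for control characters.
    return not any(unicodedata.category(ch) == "Cc" for ch in name)


def is_valid_description(description: str) -> bool:
    """The description is optional; only non-blank characters are counted."""
    non_blank = sum(1 for ch in description if not ch.isspace())
    return non_blank <= MAX_DESCRIPTION_LENGTH


print(is_valid_name("Sales data 2024"))  # True
print(is_valid_name("   "))              # False
```

Uniqueness is deliberately not checked, because names do not need to be unique in projects or deployment spaces and the catalog behavior depends on its duplicate handling method.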


Common properties for assets in catalogs

In addition to the common properties that all assets have, assets in catalogs have the properties and pages that are listed in the following table.

Property or page | Description | Editable?
Asset page | A view of the contents of the asset. | No
Privacy | Set to public by default. This setting can restrict access to an asset in a catalog when it's set to private. Only the owner and members of the asset can view and use private assets. | Yes
Access page | The owner and members of the asset. By default, the asset owner is the user who added the asset to the catalog. The asset members can view and use the asset when it is marked private. | Yes
Ratings page | Optional. Catalog collaborators can rate and review assets. | Yes
Tags | Optional. Text labels that catalog collaborators create to simplify searching. A tag consists of one string of up to 255 characters. It can contain spaces, letters, numbers, underscores, dashes, and the symbols # and @. | Yes
Relationships | Optional. Can be between assets in the same workspace or different workspaces. For example, you can add a relationship between an asset in a catalog and an asset in a project. Administrators can create custom relationships for assets. See Adding asset relationships. | Yes
Governance artifacts | Optional. The business terms and classification that users assigned to the asset. | Yes
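Two of the rules in this table, the tag format and the privacy setting, lend themselves to a short illustration. The sketch below is an assumed model rather than platform code: `CatalogAsset`, `is_valid_tag`, and `can_view` are hypothetical names, "letters" is interpreted as any Unicode letter, and the user is assumed to already be a catalog collaborator.

```python
import re
from dataclasses import dataclass, field

# \w covers letters, digits, and underscores; the class also allows spaces,
# dashes, '#', and '@'. A tag is one string of 1 to 255 characters.
TAG_PATTERN = re.compile(r"[\w \-#@]{1,255}")


def is_valid_tag(tag: str) -> bool:
    """Check a tag against the documented character set and length."""
    return TAG_PATTERN.fullmatch(tag) is not None


@dataclass
class CatalogAsset:
    """Hypothetical, simplified view of an asset's access settings."""
    name: str
    owner: str
    members: set = field(default_factory=set)
    private: bool = False  # assets are public by default


def can_view(asset: CatalogAsset, user: str) -> bool:
    """Private assets are visible only to the owner and the asset members."""
    if not asset.private:
        return True
    return user == asset.owner or user in asset.members


report = CatalogAsset("Q3 report", owner="alice", members={"bob"}, private=True)
print(can_view(report, "bob"))    # True
print(can_view(report, "carol"))  # False
print(is_valid_tag("customer-data #2024 @eu"))  # True
```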



You can create custom properties for assets. Custom properties are shown in the Details section on the asset's Overview tab in the catalog. See Custom properties and relationships.

To edit asset properties, you must have the required permissions. See Editing assets in a catalog.

Common properties for assets that run in tools

Some assets are associated with running a tool. For example, an AutoAI experiment asset runs in the AutoAI tool. Assets that run in tools are also known as operational assets. Every time you run assets in tools, you start a job. You can monitor and schedule jobs. Jobs use compute resources. Compute resources are measured in capacity unit hours (CUH) and are tracked. Depending on the plans for your services, you can have a limited amount of CUH per month, or pay for the CUH that you use every month.

For many assets that run in tools, you have a choice of the compute environment configuration to use. Typically, larger and faster environment configurations consume compute resources faster.
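As a rough illustration of why larger environments use up an allowance faster, capacity unit hours can be treated as a rate multiplied by a run time. The rates in this sketch are made-up values, not published rates for any environment, and `cuh_consumed` is a hypothetical helper.

```python
def cuh_consumed(capacity_units_per_hour: float, runtime_hours: float) -> float:
    """Capacity unit hours = capacity units per hour x hours of run time."""
    if capacity_units_per_hour < 0 or runtime_hours < 0:
        raise ValueError("rate and run time must be non-negative")
    return capacity_units_per_hour * runtime_hours


# Hypothetical rates for a small and a large environment configuration.
small_job = cuh_consumed(capacity_units_per_hour=0.5, runtime_hours=2.0)
large_job = cuh_consumed(capacity_units_per_hour=2.0, runtime_hours=2.0)

print(small_job)  # 1.0
print(large_job)  # 4.0
```

With these assumed rates, the same two-hour job consumes four times as many CUH in the larger environment.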

In addition to basic properties, most assets that run in tools contain the following types of information in projects:

Properties | Description | Editable? | Workspaces
Environment definition | The environment template, hardware specification, and software specification for running the asset. See Environments. | Yes | Projects, Spaces
Settings | Information that defines how the asset is run. Specific to each type of asset. | Yes | Projects
Associated data assets | The data that the asset is working on. | Yes | Projects
Jobs | Information about how to run the asset, including the environment definition, schedule, and notification options. See Jobs. | Yes | Projects, Spaces


Data asset types and their properties

Data asset types contain metadata and other information about data, including how to access the data.

How you create a data asset depends on where your data is:

  • If your data is in a file, you upload the file from your local system to a project, catalog, or deployment space.
  • If your data is in a remote data source, you first create a connection asset that defines the connection to that data source. Then you create a data asset by selecting the connection, the path or other structure, and the table or file that contains the data. This type of data asset is called a connected data asset.
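The two cases above can be modeled with a few small classes. This is a conceptual sketch only: the class and field names are hypothetical and do not correspond to a Cloud Pak for Data API. It shows that a connected data asset always carries a reference to a connection asset, whereas a data asset from a file refers directly to an uploaded file.

```python
from dataclasses import dataclass


@dataclass
class ConnectionAsset:
    """Holds the information needed to reach a remote data source."""
    name: str
    host: str  # hypothetical field; real connections vary by data source


@dataclass
class FileDataAsset:
    """Represents a file uploaded to the workspace's storage."""
    name: str
    file_name: str


@dataclass
class ConnectedDataAsset:
    """Represents a table, file, or folder reached through a connection."""
    name: str
    connection: ConnectionAsset  # required: created before the data asset
    path: str


def describe_source(asset) -> str:
    """Say where the data behind an asset lives."""
    if isinstance(asset, ConnectedDataAsset):
        return f"{asset.path} via connection '{asset.connection.name}'"
    return f"uploaded file '{asset.file_name}'"


warehouse = ConnectionAsset(name="warehouse", host="db.example.com")
orders = ConnectedDataAsset(name="Orders", connection=warehouse, path="SALES.ORDERS")
print(describe_source(orders))
# SALES.ORDERS via connection 'warehouse'
```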

The following graphic illustrates how data assets from files point to uploaded files in IBM Cloud and connected data assets require a connection asset and point to data in a remote data source.

You can create the following types of data assets:

Data asset from a file: Represents a file that you uploaded from your local system. The file is stored in the object storage container on the IBM Cloud Object Storage instance that is associated with the project, catalog, or deployment space. The contents of the file can include structured data, unstructured textual data, images, and other types of data. You can create a data asset with a file of any format. However, you can do more actions on CSV files than on other file types. See Properties of data assets. You can create a data asset from a file by uploading a file in a project, catalog, or deployment space. You can also create data files with tools and convert them to assets. For example, you can create data assets from files with the Data Refinery, Jupyter notebook, and RStudio tools.

Connected data asset: Represents a table, file, or folder that is accessed through a connection to a remote data source. The connection is defined in the connection asset that is associated with the connected data asset. You can create a connected data asset for every supported connection. When you access a connected data asset, the data is dynamically retrieved from the data source. See Properties of data assets. You can import connected data assets from a data source with the connected data tool in a project, catalog, or deployment space. If you want to rerun the import on a schedule, use the metadata import tool in projects. You can create virtual tables that compile data from multiple data sources with Watson Query in the Data virtualization workspace.

Folder asset: Represents a folder in IBM Cloud Object Storage. A folder data asset is a special case of a connected data asset. You create a folder data asset by specifying the path to the folder and the IBM Cloud Object Storage connection asset. You can view the files and subfolders that share the path with the folder data asset. The files that you can view within the folder data asset are not themselves data assets. For example, you can create a folder data asset for a path that contains news feeds that are continuously updated. See Properties of data assets. You can import folder assets from IBM Cloud Object Storage with the connected data tool in a project, catalog, or deployment space.

Connection asset: Contains the information necessary to create a connection to a data source. See Properties of connection assets. You can create connections with the connection tool in a project, catalog, or deployment space.

Properties of data assets from files and connected data assets

In addition to basic properties and common catalog properties, data assets from files and connected data assets have the properties or pages that are listed in the following table.

Property or page | Description | Editable? | Workspaces
Preview asset or Asset page | A preview of the data that includes a limited set of columns and rows from the original data source. See Previews. | No | Projects, Catalogs, Spaces
Profile page | Metadata and statistics about the content of the data. See Profile. | Yes | Projects, Catalogs
Activities pane | The history of actions performed on the asset in all workspaces. See Activities. | No | Catalogs
Visualizations page | Charts and graphs that users create to understand the data. See Visualizations. | Yes | Projects
Tags | Optional. Text labels that users create to simplify searching. A tag consists of one string of up to 255 characters. It can contain spaces, letters, numbers, underscores, dashes, and the symbols # and @. | Yes | Projects, Catalogs
Format | The MIME type of a file. Automatically detected. | Yes | Projects, Catalogs, Spaces
Source | Information about the data file in storage or the data source and connection. | No | Projects, Catalogs, Spaces
Asset details | Information about the size of the data, the number of columns and rows, and the asset version. | No | Projects, Catalogs, Spaces
Feature group | Information about which columns in the data asset are used as features in models. | Yes | Projects, Catalogs, Spaces
Columns | A summary of the properties of the columns in the data asset. | No | Catalogs


Properties of connection assets

The properties of connection assets depend on the data source that you select when you create a connection. See Connection types. Connection assets for most data sources have the properties that are listed in the following table.

Properties | Description | Editable? | Workspaces
Connection details | The information that identifies the data source. For example, the database name, hostname, IP address, port, instance ID, bucket, endpoint URL, and so on. | Yes | Projects, Catalogs, Spaces
Credential setting | Whether the credentials are shared across the platform (default) or each user must enter their personal credentials. Not all data sources support personal credentials. | Yes | Projects, Catalogs, Spaces
Authentication method | The format of the credentials information. For example, an API key or a username and password. | Yes | Projects, Catalogs, Spaces
Credentials | The username and password, API key, or other credentials, as required by the data source and the specified authentication method. | Yes | Projects, Catalogs, Spaces
Certificates | Whether the data source port is configured to accept SSL connections and other information about the SSL certificate. | Yes | Projects, Catalogs, Spaces
Private connectivity | The method to connect to a database that is not externalized to the internet. See Securing connections. | Yes | Projects, Catalogs, Spaces
Location and sovereignty | The physical location of the data center where the data is stored and the sovereign entity that has jurisdiction over the data. | Yes | Projects, Catalogs, Spaces
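The difference between shared and personal credentials can be illustrated with a small resolver. This is a sketch of the idea only, with hypothetical class and function names and a username and password as the assumed authentication method. It does not reflect how the platform stores or looks up credentials.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Credentials:
    username: str
    password: str


@dataclass
class Connection:
    """Hypothetical, simplified connection asset."""
    name: str
    credential_setting: str = "shared"  # "shared" (default) or "personal"
    shared_credentials: Optional[Credentials] = None


def resolve_credentials(
    connection: Connection, personal: Optional[Credentials] = None
) -> Credentials:
    """Pick the credentials a user would connect with in this model."""
    if connection.credential_setting == "shared":
        if connection.shared_credentials is None:
            raise ValueError("shared connection has no stored credentials")
        return connection.shared_credentials
    if connection.credential_setting == "personal":
        # Each user must supply their own credentials.
        if personal is None:
            raise PermissionError("personal credentials are required")
        return personal
    raise ValueError(f"unknown credential setting: {connection.credential_setting}")


shared = Connection("warehouse", shared_credentials=Credentials("svc_user", "secret"))
print(resolve_credentials(shared).username)  # svc_user
```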

