Asset types and properties
An asset is an item that contains metadata about data, other types of information, or operational code. You work with assets throughout the Cloud Pak for Data platform, including the main workspaces: projects, catalogs, and deployment spaces.
You can create these main types of assets:
- Data assets contain metadata about data, including how to access the data.
- Operational assets run code to work with data or other types of information.
- Configuration assets contain reusable templates.
To understand assets, you must know the different types of assets, their properties, and where you can find them:
- Workspaces for assets
- Data assets
- Operational assets
- Configuration assets
- Business intelligence assets
Workspaces for assets
You can find any asset in any of the workspaces for which you are a collaborator by searching for it from the global search bar. See Searching for assets across the platform.
What you can do with assets depends on the type of asset and the type of workspace.
Projects: Where you collaborate with others to work with data. For example, you can prepare data, analyze data, or create models in projects. You can create all types of assets in projects and you can run operational assets. See Projects.
Catalogs: Where you store assets to share with your organization. You can copy assets from catalogs into projects to work with them, or publish assets from projects into the catalog. You can publish all types of data assets and some types of operational assets into a catalog. You can edit asset metadata in a catalog, but you can't run operational assets. See Catalogs.
Deployment spaces: Where you deploy models or other assets into production. You copy deployable assets from projects into deployment spaces and then create deployments from those assets. See Deployment spaces.
Data virtualization: Where you create virtual tables by combining or segmenting one or more tables. You publish virtual tables as data assets into a catalog. See Virtualizing data.
Data assets
A data asset points to data.
How you create a data asset depends on where your data is:
- If your data is in a file, you upload the file from your local system to a project, catalog, or deployment space.
- If your data is in a remote data source, you first create a connection asset that defines the connection to that data source. Then you create a data asset by selecting the connection, the path or other structure, and the table or file that contains the data. This type of data asset is called a connected data asset.
Data asset from a file: A data asset from a file points to a file that you uploaded from your local system. The file is stored in the object storage container on the IBM Cloud Object Storage instance that is associated with the project, catalog, or deployment space. The contents of the file can include structured data, unstructured textual data, images, and other types of data. You can create a data asset with a file of any format. However, you can do more actions on CSV files than on other file types.
Connected data asset: A connected data asset points to a table, file, or folder that is accessed through a connection to a remote data source. The connection is defined in the connection asset that is associated with the connected data asset. When you access a connected data asset, the data is dynamically retrieved from the data source.
A folder data asset is a special case of a connected data asset. It points to a folder in IBM Cloud Object Storage. You create a folder data asset by specifying the path to the folder and the IBM Cloud Object Storage connection asset. You can view the files and subfolders that share the path with the folder data asset. The files that you can view within the folder data asset are not themselves data assets. For example, you can create a folder data asset for a path that contains news feeds that are continuously updated.
Connection asset: A connection asset is considered a type of data asset. It contains the information necessary to create a connection to a data source. You can choose to provide shared credentials for all users who have access to the connection asset, or you can specify that each user must enter their personal credentials when they use the connection. Projects and catalogs support many connection types to both IBM and third-party data sources.
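The relationship between the data asset types described above can be sketched as a small data model. This is only an illustration, not the platform's API: the class names, field names, and the sample connection and paths are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ConnectionAsset:
    """Hypothetical stand-in for a connection asset."""
    name: str
    data_source: str
    # True: shared credentials for all users; False: each user enters their own.
    shared_credentials: bool


@dataclass(frozen=True)
class DataAsset:
    """Hypothetical stand-in for a data asset (metadata only, no data)."""
    name: str
    path: str  # file location, or schema/table path, or folder path
    # None means the asset points to an uploaded file in workspace storage.
    connection: Optional[ConnectionAsset] = None

    @property
    def is_connected(self) -> bool:
        return self.connection is not None


def describe(asset: DataAsset) -> str:
    """Return a short label for the kind of data asset."""
    if asset.is_connected:
        return f"connected data asset via {asset.connection.name}: {asset.path}"
    return f"data asset from a file: {asset.path}"


# Illustrative values only.
sales_db = ConnectionAsset("sales-db", "remote database", shared_credentials=True)
orders = DataAsset("orders", "/SALES/ORDERS", connection=sales_db)
upload = DataAsset("orders export", "orders.csv")
```

In this sketch a folder data asset would simply be a DataAsset whose connection is an IBM Cloud Object Storage connection and whose path names a folder.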
Operational assets
Operational assets are how you work with data in projects by using tools that prepare data, analyze data, or build models. You create operational assets with tools in projects. For example, a Jupyter notebook is an operational asset that you can create with the notebook editor tool to analyze data.
Running operational assets
When you run operational assets, you use compute resources. Compute resources are measured in capacity unit hours (CUH) and are tracked. Depending on the plans for your services, you can have a limited amount of CUH per month, or incur extra fees if you exceed a set amount of CUH per month.
For many operational assets, you have a choice of the compute environment configuration to use. Typically, larger and faster environment configurations consume compute resources faster. See Environments.
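As a rough illustration of why larger environment configurations use up a monthly allowance faster, the following sketch assumes that consumption equals an environment's hourly CUH rate multiplied by its runtime. The rates, runtimes, and monthly limit are made-up numbers, not values from any service plan.

```python
def cuh_consumed(rate_per_hour: float, runtime_minutes: float) -> float:
    """CUH used by one run, assuming consumption = hourly rate x hours."""
    return rate_per_hour * runtime_minutes / 60.0


def cuh_remaining(monthly_limit: float, runs) -> float:
    """CUH left after a list of (rate_per_hour, runtime_minutes) runs."""
    used = sum(cuh_consumed(rate, minutes) for rate, minutes in runs)
    return monthly_limit - used


# Hypothetical example: a small environment for 60 minutes and a
# larger one for 30 minutes, against a hypothetical 10 CUH limit.
runs = [(0.5, 60), (2.0, 30)]
left = cuh_remaining(10.0, runs)
```

A negative result from cuh_remaining would correspond to exceeding the set amount for the month.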
Every time you run an operational asset, it's considered a job. You can monitor and schedule jobs. See Jobs.
Types of operational assets
Many operational assets are provided by the core services. However, some operational assets require other services.
With the Watson Studio, Watson Machine Learning, and Watson Knowledge Catalog services, you can create these types of operational assets without additional services:
- Data Refinery flows to refine data with the Data Refinery tool.
- Jupyter notebooks to analyze data or build models. You use the Jupyter notebook editor.
- SPSS Modeler flows to automate the flow of data through a model with SPSS algorithms in SPSS Modeler.
- Decision Optimization models to solve scenarios in the Decision Optimization model builder.
- AutoAI experiments to build a model without coding in the AutoAI tool.
- Deep learning experiments to train deep learning models in the Experiment builder.
- Metadata imports to import asset metadata into a project or a catalog.
- Metadata enrichments to enrich data assets in a project with results from profiling and data quality analysis and with business terms.
These operational assets require other services. You can provision each service when you create the first asset that needs it:
- DataStage flows to create data transformation jobs with the DataStage tool. Requires the DataStage service.
- Dashboards to visualize data without code in the Dashboard editor. Requires the Cognos Dashboards service.
If you have the Watson Knowledge Catalog service without Watson Studio, you can create Data Refinery flows, metadata import assets, and metadata enrichment assets.
Configuration assets
Configuration assets are reusable templates that you use in projects to configure other assets or jobs.
With the DataStage service, you can create these types of configuration assets:
- DataStage subflows to collect a set of stages and connectors to reuse in DataStage flows.
- Data definitions to specify the column metadata of a data asset to reuse in DataStage flow jobs.
- Parameter sets to collect multiple job parameters with specified values to reuse in jobs.
Asset properties, metadata, and relationships
All assets have common metadata that is visible everywhere. Other asset properties vary by the type of asset and where the asset is.
All assets have common properties that are visible and editable in projects, catalogs, and deployment spaces.
Name: Can contain up to 100 characters. Supports multibyte characters. Cannot be empty, contain Unicode control characters, or contain only blank spaces. Asset names do not need to be unique within a project or deployment space. Whether asset names must be unique in a catalog depends on the duplicate handling method set for the catalog.
Description: Optional. Can contain up to 245 characters, not including blank spaces. Supports multibyte characters and hyperlinks.
Tags: Ungoverned metadata that makes searching for the asset easier. Tags can contain only blank spaces, letters, multibyte characters, numbers, underscores, dashes, and the symbols # and @. Project, catalog, or deployment space collaborators with the admin or editor role can create tags and add them to assets.
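The rules for names, descriptions, and tags can be expressed as simple checks. The sketch below is one interpretation of those rules, not the platform's own validation: the function names are hypothetical, "multibyte characters" is approximated as any non-ASCII character, and "Unicode control characters" is taken to mean the Unicode category Cc.

```python
import unicodedata

NAME_MAX = 100          # maximum characters in a name
DESCRIPTION_MAX = 245   # maximum characters in a description, spaces excluded
TAG_SYMBOLS = set(" _-#@")


def is_valid_name(name: str) -> bool:
    """Non-empty, at most 100 characters, not only blanks, no control characters."""
    if not name or len(name) > NAME_MAX:
        return False
    if name.strip() == "":
        return False
    return not any(unicodedata.category(ch) == "Cc" for ch in name)


def is_valid_description(description: str) -> bool:
    """At most 245 characters, not counting blank spaces. Empty is allowed."""
    return len(description.replace(" ", "")) <= DESCRIPTION_MAX


def is_valid_tag(tag: str) -> bool:
    """Only spaces, letters, numbers, non-ASCII characters, and _ - # @."""
    return all(
        ord(ch) > 127 or ch.isalnum() or ch in TAG_SYMBOLS
        for ch in tag
    )
```

For example, is_valid_tag("finance_2024 #q1") passes, while is_valid_tag("bad!tag") fails because of the exclamation mark.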
Automatically generated or detected metadata can include other information, depending on the asset type, such as date added, size, created by, last editor, last modified, scheduled, shared, language, model type, and status.
More information in catalogs
Assets in catalogs can have more properties, relationships, and metadata.
Asset privacy: Set to public by default. This setting can restrict access to an asset in a catalog when it's set to private. Only the owner and members of the asset can view and use private assets.
Asset owner and asset members: By default, the asset owner is the user who added the asset to the catalog. The asset members can view and use the asset when it's marked private.
Governance artifacts: Can be assigned automatically, by the asset owner, or by data stewards. Governance artifacts can add metadata and relationships to assets, or mask sensitive data within data assets.
Custom attributes: Optional. You can create custom attributes for assets with APIs.
Reviews and ratings: All catalog collaborators can rate and review assets.
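The effect of the asset privacy setting can be sketched as a small access check. This is a simplified model with hypothetical names: it assumes the user is already a collaborator in the catalog and ignores any other rules, such as governance artifacts, that might further restrict access.

```python
from dataclasses import dataclass, field


@dataclass
class CatalogAsset:
    """Hypothetical model of the privacy-related properties of a catalog asset."""
    owner: str
    members: set = field(default_factory=set)
    private: bool = False  # assets are public by default


def can_view(asset: CatalogAsset, user: str) -> bool:
    """Public assets are visible; private ones only to the owner and members."""
    if not asset.private:
        return True
    return user == asset.owner or user in asset.members


# Illustrative users only.
report = CatalogAsset(owner="alice", members={"bob"}, private=True)
```

With this model, can_view(report, "bob") is true and can_view(report, "carol") is false; setting private back to False makes the asset visible to both.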
More information for data assets
Depending on the format of the data in data assets, you can view more information when you open the asset.
The path to the data: The information necessary to access the data. A connected data asset for a table in a database has a reference to the connection asset for the database, the schema or other path information, and the table name. A data asset for an uploaded file has a reference to the file location in the object storage container for the project, catalog, or deployment space.
File format: The MIME type of a file. Automatically detected.
Data preview: A preview of the data, for CSV, Avro, Parquet, Microsoft Excel, PDF, text, and image files.
Data profile: A profile of the data, for CSV, Avro, Parquet, Microsoft Word, PDF, text, and HTML files.
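The two format lists overlap only partly, which a lookup makes easy to see. In this sketch the lowercase format labels and the function name are hypothetical; only the grouping of formats comes from the lists above.

```python
# Formats for which a data preview is listed as available.
PREVIEW_FORMATS = {"csv", "avro", "parquet", "excel", "pdf", "text", "image"}
# Formats for which a data profile is listed as available.
PROFILE_FORMATS = {"csv", "avro", "parquet", "word", "pdf", "text", "html"}


def available_views(file_format: str) -> set:
    """Return which of 'preview' and 'profile' apply to a format label."""
    fmt = file_format.lower()
    views = set()
    if fmt in PREVIEW_FORMATS:
        views.add("preview")
    if fmt in PROFILE_FORMATS:
        views.add("profile")
    return views


# Formats that have both a preview and a profile.
both = PREVIEW_FORMATS & PROFILE_FORMATS
```

Here both contains csv, avro, parquet, pdf, and text, while Excel and image files have only a preview and Word and HTML files have only a profile.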