Enrich data assets with information that helps users find data faster, decide whether the data is appropriate for the task at hand, determine whether they can trust the data, and understand how to work with it. Such information includes, for example, terms
that define the meaning of the data, rules that document ownership or set quality standards, and reviews.
Data stewards create asset profiles to understand the meaning of data and to assess its quality. They also add business context to data by assigning terms and identifying relationships between tables. Metadata enrichment automates this process,
increasing the data steward's productivity.
Data is useful only if its context, content, and quality are trusted. To keep it that way, data must be evaluated continuously, and appropriate remediation must be taken when required. Data stewards can configure recurring jobs that continuously track changes
to the content and structure of data and then analyze only the data that changed.
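The incremental idea behind such recurring jobs can be sketched as follows. The fingerprint helper and the asset lists are illustrative assumptions, not part of the product; they only show how "analyze only data that changed" can be decided cheaply.

```python
import hashlib

def fingerprint(columns, row_count):
    """Hash a table's column list and row count as a cheap change signal."""
    payload = "|".join(columns) + f"|{row_count}"
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()

def changed_assets(current, previous):
    """Return the names of assets whose fingerprint differs from the last run."""
    return [name for name, fp in current.items() if previous.get(name) != fp]

# Illustrative state from two enrichment runs (hypothetical table names)
previous = {"SALES": fingerprint(["ID", "AMOUNT"], 100)}
current = {
    "SALES": fingerprint(["ID", "AMOUNT", "REGION"], 120),  # column added
    "HR": fingerprint(["EMP_ID"], 50),                      # new asset
}
print(changed_assets(current, previous))  # → ['SALES', 'HR']
```

Only the assets returned here would be re-analyzed; unchanged assets keep their existing enrichment results.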
The information that is added to assets through metadata enrichment also helps to protect data because it can be used in data protection policies to mask data or to restrict access.
Required services
IBM Knowledge Catalog
DataStage for advanced key or relationship analysis and advanced profiling
Data format
Tables from relational and nonrelational data sources
Delta Lake and Iceberg tables from certain file-storage connectors.
Files uploaded from the local file system or from file-based connections to the data sources, in these formats: CSV, TSV, Avro, Parquet, and Microsoft Excel (xls, xlsm, and xlsx). For workbooks uploaded
from the local file system, only the first sheet is profiled. These structured data files are not profiled:
Files within a connected folder asset. Such files are not treated as assets and are therefore not profiled.
Files within an archive file, for example, a .zip file. The data asset references the archive file itself; the compressed files inside it are not profiled.
Data size
Any. However, data sets from file-based connections cannot have more than 4,999 columns.
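As a rough illustration of the constraints above, a pre-check for uploaded files might look like this. The helper and its parameter names are hypothetical; only the extension list and the column limit come from the documentation.

```python
# Formats listed in the documentation; the 4,999-column limit applies to
# data sets from file-based connections.
SUPPORTED = {".csv", ".tsv", ".avro", ".parquet", ".xls", ".xlsm", ".xlsx"}
MAX_COLUMNS = 4999

def is_profilable(filename, column_count, in_archive=False, in_connected_folder=False):
    """Hypothetical eligibility check mirroring the documented rules."""
    if in_archive or in_connected_folder:
        return False  # files inside archives or connected folder assets are skipped
    ext = "." + filename.rsplit(".", 1)[-1].lower() if "." in filename else ""
    return ext in SUPPORTED and column_count <= MAX_COLUMNS

print(is_profilable("orders.csv", 120))     # True
print(is_profilable("orders.zip", 120))     # False: .zip is not a supported format
print(is_profilable("wide.parquet", 5000))  # False: over the column limit
```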
Required permissions
To create, manage, and run a metadata enrichment, you must have the Admin or the Editor role in the project, and you must have at least view access to the categories that you want to use in the enrichment.
Also, you must be authorized to access the connections to the data sources of the data assets to be enriched.
If any of these connections are locked, you are asked to enter your personal credentials. This is a one-time step that permanently unlocks the connections for you.
All operations that are run as part of a metadata enrichment require credentials for secure authorization. Typically, your user API key is used to execute such long-running operations without disruption. If credentials are not available when
you create a metadata enrichment or try to run any type of enrichment, you are prompted to create an API key. That API key is then saved as your task credentials. See Managing the user API key.
You can also create, edit, run, or delete metadata enrichments with APIs instead of the user interface. The links to these APIs are listed in the Learn more section.
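As a rough sketch of API-based creation, the following builds a request for a hypothetical metadata enrichment endpoint. The path, payload fields, and objective codes are assumptions modeled on typical REST conventions; consult the IBM Knowledge Catalog API reference in the Learn more section for the actual contract.

```python
import json

def build_enrichment_request(project_id, token, asset_ids):
    """Assemble a hypothetical 'create metadata enrichment' REST request."""
    headers = {
        "Authorization": f"Bearer {token}",  # token derived from your API key
        "Content-Type": "application/json",
    }
    body = {
        "name": "nightly-enrichment",               # hypothetical asset name
        "objectives": ["profile", "assign_terms"],  # hypothetical objective codes
        "data_scope": {"asset_ids": asset_ids},
    }
    url = f"/v2/metadata_enrichments?project_id={project_id}"  # hypothetical path
    return url, headers, json.dumps(body)

url, headers, payload = build_enrichment_request("proj-123", "TOKEN", ["a1", "a2"])
print(url)  # → /v2/metadata_enrichments?project_id=proj-123
```

The same pattern, with the verbs changed, would cover editing, running, and deleting enrichments.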
Metadata enrichment overview
Enriching data assets involves the following process:
Identify the data assets that you want to enrich.
In a project, create a metadata enrichment asset to configure the enrichment details like the scope and the objective of the enrichment, and the schedule for the enrichment job.
Run the enrichment job.
For each data asset included in the enrichment, work with the results in the metadata enrichment asset:
Identify anomalies and quality issues and take appropriate measures to remediate any issues.
Review generated content such as display names or AI-generated descriptions.
Check term assignments, and evaluate and act on term suggestions.
Manage data class assignments at the column level.
Manage classifications.
Identify and set primary keys and relationships.
Detect overlapping or redundant data.
You can also access the enrichment results and work with them in the profile of each individual asset. See Asset profiles. Detailed quality information is available on an asset's Data quality tab.
Reevaluate the assets in question.
Publish the data assets with the results as required.
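The steps above can be sketched as a driver loop. The client object and all of its methods are hypothetical stand-ins, not the real IBM Knowledge Catalog API; the stub exists only so the flow is runnable.

```python
def enrich_and_publish(client, asset_ids):
    """Run enrichment, remediate and reevaluate where needed, then publish."""
    job = client.run_enrichment(asset_ids)      # run the enrichment job
    results = client.get_results(job)
    published = []
    for asset, result in results.items():       # work with per-asset results
        if result["quality_issues"]:
            client.remediate(asset, result["quality_issues"])
            result = client.reevaluate(asset)   # reevaluate the asset
        client.publish(asset)                   # publish with the results
        published.append(asset)
    return published

class StubClient:
    """Minimal stand-in for illustration; not a real API client."""
    def run_enrichment(self, assets): return {"assets": assets}
    def get_results(self, job):
        return {a: {"quality_issues": ["nulls"] if a == "SALES" else []}
                for a in job["assets"]}
    def remediate(self, asset, issues): pass
    def reevaluate(self, asset): return {"quality_issues": []}
    def publish(self, asset): pass

print(enrich_and_publish(StubClient(), ["SALES", "HR"]))  # → ['SALES', 'HR']
```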
You can perform most tasks with APIs instead of the UI. Links to IBM Knowledge Catalog API are listed for each applicable task.
While you can add individual connected assets to a metadata enrichment, metadata enrichment is intended for bulk processing of data assets that were added to the project through metadata import.
To ensure consistent use of enrichment options, you can configure default settings for all metadata enrichment assets in a project. To open the settings page, go to Manage > Metadata enrichment.
Alternatively, you can open an existing metadata enrichment asset and click Default settings.
For workload management, running metadata enrichment jobs can be restricted to job execution windows. A project administrator can define such windows in Manage > Job execution windows.
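The gating idea behind job execution windows can be illustrated with a simple time check. The window representation here (a daily start/end time pair) is an assumption for illustration; the actual windows are configured in the project UI.

```python
from datetime import time

def in_window(now, start, end):
    """True if 'now' falls inside the window, including overnight windows."""
    if start <= end:
        return start <= now <= end
    return now >= start or now <= end  # window wraps past midnight

night = (time(22, 0), time(4, 0))       # hypothetical 22:00-04:00 window
print(in_window(time(23, 30), *night))  # True: the job may run
print(in_window(time(12, 0), *night))   # False: the job is held
```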
The following overview describes the relationships between your tasks, the tools you need, the services that provide the tools, and where you use the tools.
Some tools perform the same tasks but have different features and levels of automation.
Jupyter notebook editor
Prepare data
Visualize data
Build models
Deploy assets
Create a notebook in which you run Python, R, or Scala code to prepare, visualize, and analyze data, or build a model.
AutoAI
Build models
Automatically analyze your tabular data and generate candidate model pipelines customized for your predictive modeling problem.
SPSS Modeler
Prepare data
Visualize data
Build models
Create a visual flow that uses modeling algorithms to prepare data and build and train a model, using a guided approach to machine learning that doesn’t require coding.
Decision Optimization
Build models
Visualize data
Deploy assets
Create and manage scenarios to find the best solution to your optimization problem by comparing different combinations of your model, data, and solutions.
Data Refinery
Prepare data
Visualize data
Create a flow of ordered operations to cleanse and shape data. Visualize data to identify problems and discover insights.
Orchestration Pipelines
Prepare data
Build models
Deploy assets
Automate the model lifecycle, including preparing data, training models, and creating deployments.
RStudio
Prepare data
Build models
Deploy assets
Work with R notebooks and scripts in an integrated development environment.
Federated learning
Build models
Create a federated learning experiment to train a common model on a set of remote data sources. Share training results without sharing data.
Deployments
Deploy assets
Monitor models
Deploy and run your data science and AI solutions in a test or production environment.
Catalogs
Catalog data
Governance
Find and share your data and other assets.
Metadata import
Prepare data
Catalog data
Governance
Import asset metadata from a connection into a project or a catalog.
Metadata enrichment
Prepare data
Catalog data
Governance
Enrich imported asset metadata with business context, data profiling, and quality assessment.
Data quality rules
Prepare data
Governance
Measure and monitor the quality of your data.
Masking flow
Prepare data
Create and run masking flows to prepare copies of data assets that are masked by advanced data protection rules.
Governance
Governance
Create your business vocabulary to enrich assets and rules to protect data.
Data lineage
Governance
Track data movement and usage for transparency and determining data accuracy.
AI factsheet
Governance
Monitor models
Track AI models from request to production.
DataStage flow
Prepare data
Create a flow with a set of connectors and stages to transform and integrate data. Provide enriched and tailored information for your enterprise.
Data virtualization
Prepare data
Create a virtual table to segment or combine data from one or more tables.
OpenScale
Monitor models
Measure outcomes from your AI models and help ensure the fairness, explainability, and compliance of all your models.
Data replication
Prepare data
Replicate data to target systems with low latency, transactional integrity and optimized data capture.
Master data
Prepare data
Consolidate data from the disparate sources that fuel your business and establish a single, trusted, 360-degree view of your customers.
Services you can use
Services add features and tools to the platform.
watsonx.ai Studio
Develop powerful AI solutions with an integrated collaborative studio and industry-standard APIs and SDKs. Formerly known as Watson Studio.
watsonx.ai Runtime
Quickly build, run and manage generative AI and machine learning applications with built-in performance and scalability. Formerly known as Watson Machine Learning.
IBM Knowledge Catalog
Discover, profile, catalog, and share trusted data in your organization.
DataStage
Create ETL and data pipeline services for real-time, micro-batch, and batch data orchestration.
Data Virtualization
View, access, manipulate, and analyze your data without moving it.
Watson OpenScale
Monitor your AI models for bias, fairness, and trust with added transparency on how your AI models make decisions.
Data Replication
Provide efficient change data capture and near real-time data delivery with transactional integrity.
Match360 with Watson
Improve trust in AI pipelines by identifying duplicate records and providing reliable data about your customers, suppliers, or partners.
Manta Data Lineage
Increase data pipeline transparency so you can determine data accuracy throughout your models and systems.
Where you'll work
Collaborative workspaces contain tools for specific tasks.
Project
Where you work with data.
> Projects > View all projects
Catalog
Where you find and share assets.
> Catalogs > View all catalogs
Space
Where you deploy and run assets that are ready for testing or production.
> Deployments
Categories
Where you manage governance artifacts.
> Governance > Categories
Data virtualization
Where you virtualize data.
> Data > Data virtualization
Master data
Where you consolidate data into a 360-degree view.