Managing metadata enrichment
Data assets can be enriched with information that helps users find data faster, decide whether the data is appropriate for the task at hand and whether they can trust it, and understand how to work with it. Such information includes, for example, terms that define the meaning of the data, rules that document ownership or determine quality standards, or reviews.
Data stewards create asset profiles to understand the meaning of data and to assess its quality. They also add business context to data by assigning terms. Metadata enrichment automates this process, thus increasing the data steward's productivity.
Data is useful only if its context, content, and quality are trusted. To keep it that way, data must be evaluated continuously and remediated as required. Data stewards can configure recurring jobs that continuously track changes to the content and structure of data and then analyze only the data that changed.
The information that is added to assets through metadata enrichment also helps to protect data because it can be used in data protection policies to mask data or to restrict access.
- Required services
IBM Knowledge Catalog
- Data format
Tables from relational and nonrelational data sources
Files uploaded from the local file system or from file-based connections to the data sources, in these formats: CSV, TSV, Avro, Parquet, Microsoft Excel (xls, xlsm, and xlsx). For Excel files uploaded from the local file system, only the first sheet in a workbook is profiled.
These structured data files are not profiled:
- Files within a connected folder asset. Files that are accessible from a connected folder asset are not treated as assets and are not profiled.
- Files within an archive file. The archive file is referenced by the data asset and the compressed files are not profiled.
You can enrich data assets from the data sources listed in Supported data sources for metadata import, metadata enrichment, and data quality rules.
- Data size
Any; data sets from file-based connections cannot have more than 4,999 columns
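The format and size constraints above can be checked before assets are added to an enrichment. The following is a minimal sketch, not part of the product: the function name `precheck_file`, its parameters, and the mapping from format names to file extensions are assumptions made for illustration.

```python
from pathlib import PurePosixPath

# Extensions derived from the formats listed above. The mapping from format
# name to file extension is an assumption made for this sketch.
SUPPORTED_EXTENSIONS = {".csv", ".tsv", ".avro", ".parquet", ".xls", ".xlsm", ".xlsx"}

# Column limit stated above for data sets from file-based connections.
MAX_FILE_COLUMNS = 4999


def precheck_file(path, column_count, from_file_based_connection=True,
                  in_connected_folder=False, in_archive=False):
    """Return the reasons why a file would not be profiled (empty list if none).

    Hypothetical helper: it only mirrors the documented constraints and does
    not call any product API.
    """
    reasons = []
    suffix = PurePosixPath(path).suffix.lower()
    if suffix not in SUPPORTED_EXTENSIONS:
        reasons.append(f"unsupported format: {suffix or 'no extension'}")
    if in_connected_folder:
        reasons.append("file is inside a connected folder asset")
    if in_archive:
        reasons.append("file is inside an archive file")
    if from_file_based_connection and column_count > MAX_FILE_COLUMNS:
        reasons.append(
            f"{column_count} columns exceeds the limit of {MAX_FILE_COLUMNS}"
        )
    return reasons
```

An empty result means that none of the documented restrictions apply. It does not guarantee that the data source itself is supported.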
- Required permissions
To create, manage, and run a metadata enrichment, you must have the Admin or the Editor role in the project, and you must have at least view access to the categories that you want to use in the enrichment. Also, you must be authorized to access the connections to the data sources of the data assets to be enriched.
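The three permission requirements above can be expressed as a simple check. This is a sketch for illustration only: `UserAccess` and `missing_permissions` are hypothetical names, not a product data model or API, and the role names are taken from the text.

```python
from dataclasses import dataclass, field

# Project roles that the text names as sufficient for metadata enrichment.
ENRICHMENT_ROLES = {"Admin", "Editor"}


@dataclass
class UserAccess:
    """Hypothetical snapshot of one user's access, assumed for this sketch."""
    project_role: str
    viewable_categories: set = field(default_factory=set)
    accessible_connections: set = field(default_factory=set)


def missing_permissions(user, categories, connections):
    """List what the user lacks to create, manage, and run an enrichment."""
    problems = []
    if user.project_role not in ENRICHMENT_ROLES:
        problems.append(f"project role '{user.project_role}' is not Admin or Editor")
    # At least view access is needed for every category used in the enrichment.
    for category in sorted(set(categories) - user.viewable_categories):
        problems.append(f"no view access to category '{category}'")
    # The user must be authorized for each connection behind the data assets.
    for connection in sorted(set(connections) - user.accessible_connections):
        problems.append(f"not authorized for connection '{connection}'")
    return problems
```

An empty list means that all three conditions are met for the given categories and connections.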
You can also create, edit, run, or delete metadata enrichments with APIs instead of the user interface. The links to these APIs are listed in the Learn more section.
Metadata enrichment overview
Enriching data assets involves the following process:
1. Identify the data assets that you want to enrich.
2. In a project, create a metadata enrichment asset to configure the enrichment details like the scope and the objective of the enrichment, and the schedule for the enrichment job.
3. Run the enrichment job.
4. For each data asset included in the enrichment, work with the results in the metadata enrichment asset:
   - Identify anomalies and quality issues and take appropriate measures to remediate any issues.
   - Check term assignments, and evaluate and act on term suggestions.
   - Manage data class assignments at the column level.
   You can also access the enrichment results and work with them in the profile of each individual asset. See Asset profiles.
5. Reevaluate the assets in question.
You can perform most tasks with APIs instead of the UI. Links to the Watson Data API are listed for each applicable task.
While you can add individual connected assets to a metadata enrichment, metadata enrichment is intended for bulk processing of data assets that were added to the project through metadata import.
To ensure consistent use of enrichment options, you can configure default settings for all metadata enrichment assets in a project. You must have at least one metadata enrichment asset in your project to be able to configure those settings. To open the settings page, open an existing metadata enrichment asset and click Default settings.