Profiling data assets
The profile of a data asset includes generated metadata and statistics about its content. An asset profile helps you understand what actions to take to improve the data quality. You can see the profile on the asset's Profile page in a catalog or in a project. All catalog or project members can see data asset profiles.
Unstructured data assets
Profiles for unstructured data assets are created automatically when you add such assets to a project or to a catalog, regardless of whether policies are enforced in the catalog. You cannot manually create or update a profile for an unstructured data asset. In some cases, profile creation is triggered anew if an asset was previously processed by the analysis service, but profiling did not complete or failed altogether:
- When the asset metadata is updated, the existing asset is profiled again.
- When this asset is in a project and you publish it to a catalog, the asset in the catalog is profiled.
- When this asset is in a catalog and you add it to a project, the asset in the project is profiled.
However, data assets with profiles that were created by IBM Watson Natural Language Understanding are not profiled again.
Structured data assets
Profiles for structured data assets are created automatically in governed catalogs unless you disabled automatic profiling, the asset comes from a connection that is configured to use personal credentials, or the asset was profiled through metadata enrichment before it was published. You can manually create profiles for structured data assets in these cases:
- In governed catalogs if the asset wasn't profiled before
- In ungoverned catalogs
- In projects
Profile updates can also be triggered manually.
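The conditions for automatic profiling of structured data assets can be read as a simple decision rule. The following Python sketch only restates the text above; the class and field names are hypothetical and do not correspond to any product API.

```python
from dataclasses import dataclass


@dataclass
class StructuredAsset:
    # All field names are hypothetical; they only mirror the conditions in the text.
    in_governed_catalog: bool
    auto_profiling_enabled: bool = True
    uses_personal_credentials: bool = False
    profiled_by_enrichment_before_publish: bool = False


def profiled_automatically(asset: StructuredAsset) -> bool:
    """Return True when, per the text, a profile is created without user action."""
    if not asset.in_governed_catalog:
        # Ungoverned catalogs and projects: profiles are created manually.
        return False
    return (
        asset.auto_profiling_enabled
        and not asset.uses_personal_credentials
        and not asset.profiled_by_enrichment_before_publish
    )
```

When the function returns False, the asset falls into one of the cases where a profile must be created manually or already exists from metadata enrichment.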
In projects, you can create profiles for individual assets one at a time from each asset's Profile tab. You can also create profiles for multiple data assets in parallel by creating and running a metadata enrichment asset.
If you manually update a profile of a data asset that is included in a metadata enrichment, the profile and analysis information is also reflected in the respective enrichment results.
During profiling, columns and data quality are analyzed:
Column analysis
Profile and classify data, and find inconsistencies and anomalies. Column analysis includes the following tasks:
- Compute statistics about the data of each analyzed column.
- Compute data types for columns and data types distribution.
- Compute data formats for columns and formats distribution.
- Classify the data and compute data class candidates for columns.
- Capture frequency distributions.
The profile shows the frequency of the inferred data classes and statistics about the data for each column. Data classes describe the contents of the data in the column: for example, city, account number, or credit card number. Data classes are necessary to restrict access to data or mask data with data protection rules. The data classes appear for each column on the asset's Overview page and on the Profile page.
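To make the column analysis tasks more concrete, the following Python sketch computes basic statistics, a type distribution, a format distribution, and a frequency distribution for one column of string values. It is an illustration only: the function names, the type inference rules, and the format notation (9 for digits, A for letters) are assumptions and not the product's actual algorithm. Data class assignment is left out.

```python
from collections import Counter


def infer_type(value: str) -> str:
    """Very rough type inference: int, float, or string."""
    try:
        int(value)
        return "int"
    except ValueError:
        pass
    try:
        float(value)
        return "float"
    except ValueError:
        return "string"


def infer_format(value: str) -> str:
    """Map digits to 9 and letters to A; keep other characters as they are."""
    return "".join(
        "9" if ch.isdigit() else "A" if ch.isalpha() else ch for ch in value
    )


def profile_column(values: list[str]) -> dict:
    """Compute simple per-column statistics and distributions."""
    return {
        "count": len(values),
        "distinct": len(set(values)),
        "types": Counter(infer_type(v) for v in values),
        "formats": Counter(infer_format(v) for v in values),
        "frequencies": Counter(values),
    }


# Hypothetical sample column with postal codes and one placeholder value.
profile = profile_column(["10115", "80331", "80331", "N/A"])
print(profile["types"])
print(profile["formats"])
```

In this sample, three of the four values share one type and one format, so the remaining value stands out as a possible anomaly.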
Data quality analysis
Identify the structure, content, and overall quality of your data. Data quality analysis includes the following tasks:
- Evaluate quality dimensions to identify data quality problems.
- Compute a data quality score for data assets and columns.
The profile provides an overall quality score for the data asset and a separate quality score for each column. Data quality scores for individual columns in the data asset are computed based on quality dimensions. The overall quality score for the entire data asset is the average of the scores for all columns.
To prevent records with multiple quality issues from unnecessarily weighing down the data quality score, a value that is identified with more than one issue counts against the quality score no more than a value with only one issue.
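The scoring rules can be sketched in Python as follows. The text states that the overall score is the average of the column scores and that a value with several issues counts like a value with one issue. It does not state how a column score is derived from the quality dimensions, so the sketch assumes that a column score is the share of values without any issue. The function names and the dimension names in the sample data are hypothetical.

```python
from statistics import mean


def column_score(issues_per_value: list[set[str]]) -> float:
    """Share of values without any issue (assumed scoring rule).

    Each entry holds the names of the quality dimensions a value violates.
    A value with several issues is counted once, like a value with one issue.
    """
    if not issues_per_value:
        return 1.0
    flagged = sum(1 for issues in issues_per_value if issues)
    return 1.0 - flagged / len(issues_per_value)


def asset_score(column_scores: dict[str, float]) -> float:
    """Overall score: the average of the scores for all columns."""
    return mean(list(column_scores.values()))


# Hypothetical sample: four values per column, with made-up dimension names.
scores = {
    "email": column_score([set(), {"format"}, {"format", "missing"}, set()]),
    "age": column_score([set(), set(), set(), {"range"}]),
}
print(scores)               # {'email': 0.5, 'age': 0.75}
print(asset_score(scores))  # 0.625
```

In the email column, the value with two issues lowers the score by the same amount as the value with one issue.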
You must have the Admin or Editor role in the project or catalog to create or update a profile for a structured data asset.
To manually create a profile for a structured data asset:
- Go to the asset's Profile page. If necessary, you are prompted to enter your personal credentials for the locked data connections.
- Optional. Click Select data classes, choose which data classes to include in the profile, and click Apply.
- Click Create profile.
You can update an existing profile for a structured data asset when the data changes or when you want to change the data classes to include in the profile. If you exclude a data class that was previously assigned to a column, the updated profile shows Class excluded (from profile) for the respective column unless a different data class was assigned. You will also see Class excluded (from profile) for any columns where you don't have access to the assigned data class.
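The display rule for excluded data classes can be sketched as a small Python function. This is one reading of the paragraph above, not product code: the function name and parameters are hypothetical, and `assigned` stands for the data class that is assigned to the column after the update (a newly assigned class if there is one, otherwise the previous class).

```python
from typing import Optional

EXCLUDED_LABEL = "Class excluded (from profile)"


def displayed_class(
    assigned: Optional[str],
    included: set[str],
    accessible: set[str],
) -> Optional[str]:
    """Return what the profile shows as the data class of one column."""
    if assigned is None:
        return None  # no data class is assigned to this column
    if assigned not in included or assigned not in accessible:
        # The class was excluded from the profile, or the user cannot access it.
        return EXCLUDED_LABEL
    return assigned
```

For example, `displayed_class("City", {"Email"}, {"City", "Email"})` returns the excluded label because City is not among the data classes selected for the profile.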
Note: When you publish a structured data asset from a project to a catalog, or add such an asset to a project from a catalog, and the project and the catalog belong to different accounts, the asset profile is not copied because the set of available data classes might be different. Therefore, you must create a new profile. If you publish to a governed catalog, profiling is started automatically.
- Asset profiles
- Predefined data classes
- Data quality dimensions
- Data quality score
- Managing metadata enrichment