0 / 0
Identifying primary keys

Identifying primary keys

Run a primary key analysis to identify single or compound primary keys in the data assets in a relational database. A primary key uniquely identifies each record in a data asset. Each data asset can have only one primary key.

Validate the primary keys that are already defined in your data and identify columns that are candidates for primary keys. A primary key must be unique and cannot contain null values.

This type of analysis identifies single-column primary keys for a data asset by locating a column that qualifies as a unique identifier for your data. To identify a multi-column or compound primary key, the analysis determines which combinations of columns contain unique values, making them candidates for primary keys. The values for different column combinations are evaluated and the uniqueness of the concatenated values is computed. Only adjacent columns are checked.

Only columns and combinations of columns with a confidence that exceeds the defined threshold, either in the enrichment settings or for an individual run, are suggested as primary keys. If a primary key is already defined in your data, this key is automatically assigned. This information must explicitly be included in a metadata import. You can still use the results of a primary key analysis to validate defined primary keys.

Prerequisites

The DataStage service must be deployed for running primary key analysis.

All operations that are run as part of a metadata enrichment require credentials for secure authorization. Typically, your user API key is used to execute such long-running operations without disruption. If credentials are not available when you try to run a primary key analysis, you are prompted to create an API key. That API key is then saved as your task credentials. See Managing the user API key.

If any of the connections to the data sources are locked, you are asked to enter your personal credentials. This is a one-time step that permanently unlocks the connections for you.

You can run primary key analysis without creating asset profiles first. The analysis does not rely on profile information such as the frequency distribution of column values. Instead, column data is analyzed directly.

Setting primary keys manually

If your data was already analyzed with the Profile data option, you can set a primary key without running a key or relationship analysis, or you can change an assignment. Open an asset's column view by clicking the asset name or by selecting View columns from the overflow menu.

  • To set a single-column primary key, select a column and select More > Assign as primary key from the toolbar.
  • To set a multi-column primary key, select the columns that you want to make up the key and select More > Assign as compound primary key from the toolbar.

When you replace a primary key, any relationships for the old primary key are unassigned. You must set new relationships.

Running a primary key analysis

To run a primary key analysis for one or more assets:

  1. Open the metadata enrichment asset.

  2. On the Assets tab, select assets as required.

  3. Select Enrich > Identify primary keys from the toolbar.

    By default, this analysis is run on a single column. However, a single column might not satisfy the criteria for a primary key. In this case, you can select to run the analysis on multiple columns to create a compound primary key. The analysis concatenates the values for different column combinations, computes the uniqueness of the concatenated values, and thus determines which combinations of columns contain unique values, making them primary-key candidates. Such analysis can be very time-consuming depending on the size of a data asset.

  4. Click Run. You are notified when the analysis is complete but you might need to refresh the enrichment results to see any new primary keys.

By default, all primary keys that are identified during analysis have candidate status, which means that they are displayed as suggested keys, unless your data already contains defined primary keys. In this case, these columns are assigned as primary keys during analysis.

Running another primary key analysis deletes all previously suggested primary keys that were not found during the current analysis. For example, running a single-column primary key analysis removes all compound primary key candidates that were identified in a previous analysis.

Checking the analysis results and assigning keys

Review the keys for an asset on the Keys tab in the asset's details panel. If your data already contains a primary key, it is shown here as assigned. In addition, any suggested keys are listed here. To change the assigned primary key, click the edit icon edit icon.

The complete list of keys provides the following information:

  • The name of a column that is suggested or assigned as primary key or a list of column names for a compound primary key
  • The number of columns that make up the assigned or candidate primary key
  • The data type of the column or columns that make up the assigned or candidate primary key
  • The confidence score for assigning or suggesting a key
  • The number and percentage of distinct, unique, and null values for each key
  • The status of the key, which can be Assigned or Suggested
  • The date on which the column was last modified

To assign a primary key or replace the assigned key, select Assign from the key's overflow menu. If you replace the existing primary key, any relationships for the old primary key are unassigned. You can then set new relationships.

Unassigning or removing keys

You can unassign an assigned key. In this case, any relationships for this primary key are also unassigned and the status of the key is set to suggested. You can then assign a new primary key and set corresponding relationships. This key might be suggested or assigned again when you rerun primary key analysis.

You can also entirely remove an assigned or suggested key. In this case, that column is not considered as primary key again when you rerun primary key analysis. Any relationships for this key are unassigned.

Accessing job run details

While you can access the job details for general metadata enrichment from the side panel, you must go to the project's Jobs tab to see the jobs for primary key analysis. }The job is of the type Key Analysis for Metadata Enrichment Assets. The job name follows the pattern metadata_enrichment_name (PK Detection).

Learn more

Parent topic: Enriching your data assets

Generative AI search and answer
These answers are generated by a large language model in watsonx.ai based on content from the product documentation. Learn more