Designing reference data sets (IBM Knowledge Catalog)

When you design a reference data set, you must decide which format of values to use, which code-value pairs make up the set, and whether the set should be related to any other existing sets. You can import existing reference data sets and modify them to suit your needs, or you can create a new reference data set manually.

Reference data is used to categorize other data within enterprise applications and databases. Reference data might be standardized by organizations such as ISO. Reference data can be hierarchical. There might be several reference data sets for the same domain, in which case mappings between the reference data values can be specified.

A reference data set consists of a number of reference data values, where each reference data value must at least have a code and its value defined.

You can either create a new reference data set in the UI or import its properties from a CSV file. The same holds true for reference data values: you can add them manually or import them from a CSV file. Note that reference data sets and reference data values use different CSV files.

You can use one of the following methods to import reference data sets:

  • Import one or more reference data sets from a previously exported ZIP file by using an API request. The ZIP file contains a CSV file with the properties that define one or more reference data sets, and one CSV file for each reference data set that lists its reference data values.
  • Import reference data set information from a CSV file by using the UI or an API request. You can import multiple reference data sets in one file. This method does not include reference data values; they must be imported separately.
  • Import reference data values from a CSV file for a specific set using the UI or an API request.

To learn about these import methods and differences between them, see Import methods for governance artifacts.
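If you prepare reference data values in a CSV file before importing them, a quick local check can catch obvious problems first. The following Python sketch checks only the rules stated in this topic: every value needs a code and a value, and a code can be at most 255 characters (see Properties of reference data values). The `code` and `value` header names and the `load_values` function are hypothetical; the CSV layout that IBM Knowledge Catalog actually expects may differ, so use an exported file as your template.

```python
import csv
import io
from typing import Dict, List

# Hypothetical header names; check an exported CSV file for the real layout.
REQUIRED_COLUMNS = ("code", "value")
MAX_CODE_LENGTH = 255  # documented limit for the code column


def load_values(csv_text: str) -> List[Dict[str, str]]:
    """Parse reference data values from CSV text and apply basic local checks."""
    reader = csv.DictReader(io.StringIO(csv_text))
    header = reader.fieldnames or []
    missing = [column for column in REQUIRED_COLUMNS if column not in header]
    if missing:
        raise ValueError(f"missing required columns: {missing}")

    rows = []
    for row in reader:
        # reader.line_num is the number of physical lines read so far.
        where = f"line {reader.line_num}"
        for column in REQUIRED_COLUMNS:
            if not row.get(column):
                raise ValueError(f"{where}: empty {column!r}")
        if len(row["code"]) > MAX_CODE_LENGTH:
            raise ValueError(f"{where}: code longer than {MAX_CODE_LENGTH} characters")
        rows.append(row)
    return rows


rows = load_values("code,value\nUS,United States\nCA,Canada\n")
print([row["code"] for row in rows])  # ['US', 'CA']
```

A failed check raises `ValueError` with the line number, so you can fix the file before you upload it.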

You can also use one of the predefined reference data sets. Additionally, each Knowledge Accelerator provides hundreds of reference data sets for a specific industry that you can use. See Reference data sets in Knowledge Accelerators.

The IBM Knowledge Catalog plans have limits on the number and size of reference data sets that you can create.

Properties of reference data sets

Reference data sets have these standard properties that are similar to other governance artifacts.

Property or behavior | Supports? | Explanation
Must have unique names? | Yes | Reference data set names must be unique within a category.
Description? | Yes | Optional. Include a description to help users find this reference data set.
Add relationships to other reference data sets? | Yes | See Relationships with other reference data sets.
Add relationships to other types of governance artifacts? | Yes | See Relationships with other types of governance artifacts.
Add relationship to asset? | Yes | See Asset relationships in catalogs.
Add custom attributes? | Yes (API) | See Custom attributes and relationships.
Add custom relationships? | Yes | See Custom attributes and relationships.
Organize in categories? | Yes | The primary category for the artifact determines who can view or modify the artifact. See Categories.
Import from a file? | Yes | See Importing governance artifacts.
Import from a Knowledge Accelerator? | Yes |
Export to a file? | Yes | See Exporting governance artifacts.
Managed by workflows? | Yes | See Workflows.
Specify effective start and end dates? | Yes | See Effective dates.
Assign a Steward? | Yes | See Stewards.
Add tags as properties? | Yes | See Tags.
Predefined artifacts? | Yes | Physical Locations, Sovereign Locations. See Predefined reference data sets.

When you create a new reference data set, you must decide which type to use. The type that you choose determines the format of the value column for reference data values:

  • Text: Can be a string or multi-line string.
  • Number: Does not support Boolean, binary, or hexadecimal values.
  • Date: ISO date-time format.
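To pre-check a column of values against the set type (Text, Number, or Date) before an import, you could use a rough local test like the Python sketch below. It only approximates the rules listed above. The decimal number pattern and the use of Python's `datetime.fromisoformat` to recognize ISO date-time values are assumptions of this sketch, not the product's own validator, and `fits_set_type` is a hypothetical helper.

```python
import re
from datetime import datetime

# Plain decimal or scientific notation only; Boolean, binary (0b...), and
# hexadecimal (0x...) forms do not match this pattern.
_NUMBER = re.compile(r"[+-]?(\d+(\.\d*)?|\.\d+)([eE][+-]?\d+)?")


def fits_set_type(set_type: str, value: str) -> bool:
    """Approximate check of a value against the set type (Text, Number, Date)."""
    if set_type == "Text":
        return True  # any string, including multi-line strings
    if set_type == "Number":
        return _NUMBER.fullmatch(value.strip()) is not None
    if set_type == "Date":
        try:
            # Python versions before 3.11 reject a trailing "Z"; use "+00:00" instead.
            datetime.fromisoformat(value.strip())
        except ValueError:
            return False
        return True
    raise ValueError(f"unknown set type: {set_type!r}")


for set_type, value in [("Number", "12.5"), ("Number", "0x1F"), ("Date", "2024-02-16T09:30:00")]:
    print(set_type, value, fits_set_type(set_type, value))
```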

Properties of reference data values

A reference data set includes a number of reference data values. These values consist of at least the following columns:

  • Code: A string of up to 255 characters. The code column is always of type Text.
  • Value: The format of the value is determined by the reference data set type: Text, Number, or Date.
  • Description: Optional.
  • Parent: Optional. The parent relationship points to another reference data value in the same set. By specifying parents, you can build a hierarchy tree of reference data values.
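Because each value can point to one parent in the same set, the values form a tree. The Python sketch below shows how such parent links can be turned into a parent-to-children view, for example to review a hierarchy in a downloaded file. It assumes codes are unique (no composite key) and represents an empty parent as `None`; the function name is hypothetical. The checks for unknown parents and cycles follow from the description of a tree within one set; how the product itself reports such problems is not described here.

```python
from collections import defaultdict
from typing import Dict, List, Optional


def build_tree(parents: Dict[str, Optional[str]]) -> Dict[Optional[str], List[str]]:
    """Group codes under their parent code; top-level codes are grouped under None.

    `parents` maps each code to its parent code, or None if it has no parent.
    """
    for code, parent in parents.items():
        if parent is not None and parent not in parents:
            raise ValueError(f"{code!r}: parent {parent!r} is not in the same set")

    # Follow parent links upward; reaching a code twice on one path means a cycle.
    for start in parents:
        visited = set()
        node: Optional[str] = start
        while node is not None:
            if node in visited:
                raise ValueError(f"cycle through {node!r}")
            visited.add(node)
            node = parents[node]

    children: Dict[Optional[str], List[str]] = defaultdict(list)
    for code, parent in parents.items():
        children[parent].append(code)
    return dict(children)


tree = build_tree({"EU": None, "DE": "EU", "FR": "EU", "NA": None, "US": "NA"})
print(tree[None], tree["EU"])  # ['EU', 'NA'] ['DE', 'FR']
```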

Case-sensitive code

Case-sensitive codes were introduced on February 16, 2024. For all reference data sets created after that date, the code column is case-sensitive. When you add values to a new reference data set, the code is saved exactly as you type it. The following codes are treated as three unique entries:

  • US
  • Us
  • us

Before this change, all codes were automatically converted to uppercase when saved, for example:

  • us was saved as US
  • 1pl was saved as 1PL

Reference data sets that were created before this change remain case-insensitive, and any new values that you add to them are saved in uppercase. These reference data sets are marked with a Case-insensitive tag in the UI.
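If you add values from a case-sensitive source to a set that is marked Case-insensitive, codes that differ only in case end up with the same stored code. The Python sketch below models the two behaviors described above so that you can flag such collisions before an import. The function names are hypothetical, and the sketch uses Python's `str.upper()`, which might not match the product's uppercasing for every non-ASCII character.

```python
from typing import Dict, List


def stored_code(code: str, case_sensitive: bool) -> str:
    """Return the code as it would be saved: unchanged, or uppercased for older sets."""
    return code if case_sensitive else code.upper()


def find_collisions(codes: List[str], case_sensitive: bool) -> Dict[str, List[str]]:
    """Map each stored code to the input codes that collapse onto it, if more than one."""
    groups: Dict[str, List[str]] = {}
    for code in codes:
        groups.setdefault(stored_code(code, case_sensitive), []).append(code)
    return {key: originals for key, originals in groups.items() if len(originals) > 1}


print(find_collisions(["US", "Us", "us"], case_sensitive=True))   # {}
print(find_collisions(["US", "Us", "us"], case_sensitive=False))  # {'US': ['US', 'Us', 'us']}
```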

Custom columns

You might need to capture additional information related to the code, such as translations of the value into different languages or other supporting attributes relevant to your needs. For example, you might have a reference data set of country codes and want to capture other attributes such as the country's prime minister, language, or alternative names. For such purposes, you can add custom columns in the following ways by using the UI:

  • You can manually define custom columns when creating a new reference data set. Note that custom columns cannot be modified or added manually once the reference data set is created.
  • You can use the CSV file import to create a new reference data set and map the columns from the file to new custom columns.
  • You can import or reimport values from a CSV file into an existing reference data set and use column mapping to create new custom columns.

For more information, see Importing custom columns.
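Column mapping during a CSV import decides which file columns feed the standard columns and which become new custom columns. The Python sketch below models that idea locally for the country code example. The standard column names, the header-to-target mapping format, and the `split_columns` function are hypothetical and do not reflect the product's import API; CSV headers that are not in the mapping are ignored.

```python
import csv
import io
from typing import Dict, List, Tuple

# Hypothetical names for the standard columns of a reference data value.
STANDARD_COLUMNS = {"code", "value", "description", "parent"}


def split_columns(
    csv_text: str, mapping: Dict[str, str]
) -> Tuple[List[str], List[Dict[str, Dict[str, str]]]]:
    """Apply a header -> target mapping; targets outside STANDARD_COLUMNS become custom columns."""
    custom_columns = [target for target in mapping.values() if target not in STANDARD_COLUMNS]
    records = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        record: Dict[str, Dict[str, str]] = {"standard": {}, "custom": {}}
        for header, target in mapping.items():
            part = "standard" if target in STANDARD_COLUMNS else "custom"
            record[part][target] = row.get(header) or ""
        records.append(record)
    return custom_columns, records


columns, records = split_columns(
    "iso2,name,official_language\nDE,Germany,German\nFR,France,French\n",
    {"iso2": "code", "name": "value", "official_language": "Language"},
)
print(columns)               # ['Language']
print(records[0]["custom"])  # {'Language': 'German'}
```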

Composite keys

Reference data values in a set are identified by a unique code. However, you might need to identify data by using more than one column. A composite key is a combination of the code column and up to 5 custom columns in a reference data set, and it is used to uniquely identify each reference data value. With a composite key, the values in the code column no longer need to be unique. Uniqueness is guaranteed only for the combined values of all the specified columns; the values in individual columns are not necessarily unique.

When you define a composite key for the set, each reference data value in such a set is identified in the system by a physical representation of the composite key, which is a concatenation of the code column and the composite key custom columns in the order that was specified when the set was initially created. The values are delimited by |, for example: CODE|CC1|CC2|CC3. This physical representation is used to identify reference data values in the system (for example, to track relationships) and it cannot be changed.
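The physical representation described above can be sketched in a few lines of Python, which is handy for spotting rows that would collide before you create a set with a composite key. The sketch follows the documented rules (code first, then the key columns in their defined order, joined by |, with at most 5 custom columns); the column names and function names are hypothetical. Note that if a column value itself contains |, two different combinations could produce the same string in this sketch; how the product handles that case is not described here.

```python
from typing import Dict, List, Sequence

MAX_KEY_CUSTOM_COLUMNS = 5  # the code column plus up to 5 custom columns


def physical_key(record: Dict[str, str], key_columns: Sequence[str]) -> str:
    """Concatenate the code and the composite key custom columns, in order, with |."""
    return "|".join([record["code"], *(record[column] for column in key_columns)])


def duplicate_keys(records: List[Dict[str, str]], key_columns: Sequence[str]) -> List[str]:
    """Return physical keys that occur more than once in `records`."""
    if len(key_columns) > MAX_KEY_CUSTOM_COLUMNS:
        raise ValueError("a composite key can include at most 5 custom columns")
    seen = set()
    duplicates = []
    for record in records:
        key = physical_key(record, key_columns)
        if key in seen and key not in duplicates:
            duplicates.append(key)
        seen.add(key)
    return duplicates


rows = [
    {"code": "SPRINGFIELD", "country": "US", "state": "IL"},
    {"code": "SPRINGFIELD", "country": "US", "state": "MA"},
]
print(physical_key(rows[0], ["country", "state"]))  # SPRINGFIELD|US|IL
print(duplicate_keys(rows, ["country", "state"]))   # []
print(duplicate_keys(rows, ["country"]))            # ['SPRINGFIELD|US']
```

The example shows why the key order and column choice matter: the same code is unique with both custom columns in the key but collides when only `country` is used.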

To learn how to create reference data sets with composite keys, see Creating reference data sets with composite keys.

Relationships with other types of governance artifacts

Relationships between data classes and reference data sets: Data classes can include reference data sets in data matching methods. The reference data set is then used to evaluate whether columns in data assets meet the criteria for assigning the data class. See Adding data matching to data classes.

Relationships with business terms: You can assign business terms to the reference data set or to a specific reference data value to further define the meaning of the code.

Relationships with other reference data sets

You can create hierarchical relationships for your reference data sets that establish logical connections between them. You can also create relationships between the values in reference data sets. See Relationships between reference data sets.
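When two sets describe the same domain, for example two country code standards, relationships between their values act as a mapping. The Python sketch below assumes that the related values are available locally as (source code, target code) pairs; that pair format and the function names are hypothetical. It builds a lookup, where one value can relate to several values, and lists source values that have no related value yet.

```python
from typing import Dict, Iterable, List, Tuple


def related_lookup(relations: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Index (source code, target code) pairs by source code; a code can have several targets."""
    lookup: Dict[str, List[str]] = {}
    for source, target in relations:
        lookup.setdefault(source, []).append(target)
    return lookup


def unmapped(source_codes: Iterable[str], lookup: Dict[str, List[str]]) -> List[str]:
    """List source codes that have no related value in the target set."""
    return [code for code in source_codes if code not in lookup]


alpha2_to_alpha3 = related_lookup([("DE", "DEU"), ("FR", "FRA")])
print(alpha2_to_alpha3["DE"])                          # ['DEU']
print(unmapped(["DE", "FR", "IT"], alpha2_to_alpha3))  # ['IT']
```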

Working with reference data sets

You can work with a reference data set in the following ways:

  • Click (Download icon) to download reference data values in a CSV file. You can then edit the CSV file locally, which often provides more speed and flexibility in your work.
  • Import values from a CSV file by selecting Upload file from the action menu next to the reference data name and following the instructions.
  • Create your own values by clicking (Add icon).
  • Edit a reference data value (its value, description, or parent value) by clicking (Edit icon).
  • Rearrange how columns are displayed in the reference data value view by clicking Manage columns.
  • Delete reference data values by clicking Delete value. Mark multiple values for deletion with the Ctrl key.
  • Assign Related artifacts to the reference data set, such as business terms or classifications.
  • Add related business terms to a chosen value.
  • Add related values to a chosen value. You can relate values from the same set or between different sets. You can import them from a CSV file by selecting Upload related values from the action menu next to the reference data name and following the instructions.

To learn more about the tasks common to all governance artifacts, see Managing governance artifacts.

