
Importing files for reference data sets

You can create reference data sets outside of the catalog in CSV (comma-separated value) format, then import or reimport values. You can also import value-mapping relationships between values in multiple reference data sets.

Use one of the following methods to import reference data sets:

  • Import one or more reference data sets from a previously exported ZIP file using an API request.

    When importing reference data sets in a ZIP file as described in Importing all governance artifacts from an instance with a ZIP file, you must always use merge_option=all in the API call.

  • Import reference data set information from a CSV file using the UI or an API request. Multiple reference data sets can be imported in one file. This method does not include reference data values; import them separately.

  • Import the reference data values from a CSV file for a specific set using the UI or an API request.

  • Import related reference data values from a CSV file into an existing reference data set using the UI or an API request.

Note: The maximum number of values that can be imported is 5000.
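A file can be checked against this limit before it is uploaded. The following is a minimal Python sketch, not part of the product: it assumes that each non-empty data row counts as one value and that the file may start with a header row. The names MAX_IMPORT_VALUES, count_value_rows, and within_import_limit are illustrative only.

```python
import csv
import io

MAX_IMPORT_VALUES = 5000  # limit stated in the note above


def count_value_rows(csv_text: str, has_header: bool = True) -> int:
    """Count non-empty data rows in CSV text, skipping the header if present."""
    rows = [
        row
        for row in csv.reader(io.StringIO(csv_text))
        if any(cell.strip() for cell in row)
    ]
    if has_header and rows:
        rows = rows[1:]
    return len(rows)


def within_import_limit(csv_text: str, has_header: bool = True) -> bool:
    """Return True if the number of data rows does not exceed the limit."""
    return count_value_rows(csv_text, has_header) <= MAX_IMPORT_VALUES


sample = "code,value\nA,Alpha\nB,Beta\n"
print(count_value_rows(sample))     # 2
print(within_import_limit(sample))  # True
```

A file that fails the check could be split into several smaller files before import; whether the limit applies per file or per request is not stated here, so treat that as something to verify.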

The IBM Knowledge Catalog plans have limits on the number and size of reference data sets that you can create.

When you import or export reference data sets in a ZIP file, the file contains one CSV file that defines the reference data sets included in the ZIP file and, in a separate folder, one CSV file per set with that set's reference data values. These two kinds of CSV file have different formats.

The reference data set CSV file defines the properties of each included set, such as its artifact ID, name, description, category, and reference data set type. It might have the following format:

artifact_id,Name,Artifact Type,Category,Description,Secondary Categories,Related Terms,Data Set Type
026df326-74f2-4dce-8d6b-7d2f36b09d98,Customer Non Performing Loan Status,reference_data,54b9bd8a-ddfb-4512-8d46-e26d2926981e,Distinguishes between Customers according to their number of outstanding non-performing loans.,ecf8fade-4956-4e92-9a56-308949f0cb58,a9a63e90-94df-4b00-95ec-a951189d2183,TEXT
0544a0b7-07b7-4509-8cbe-22e36caa218b,Household Life Cycle Status,reference_data,8a72919e-8c40-4a73-b190-4803deb2160d,Distinguishes between Households according to the state of existence of the Household.,ecf8fade-4956-4e92-9a56-308949f0cb58,3a6f0d98-64fc-4166-b3e4-7f2ebcbeac9f,TEXT
07cf348c-76a3-482c-9614-2b89edabbaaf,Financial Legal Status,reference_data,54b9bd8a-ddfb-4512-8d46-e26d2926981e,"Distinguishes between Individuals or Organizations according to whether they are undergoing proceedings that affect their financial standing; for example, (US) Chapter 11 status, In Liquidation, In Receivership, Bankrupt.",ecf8fade-4956-4e92-9a56-308949f0cb58,03976617-abe9-4e5a-88cf-57193b22cce1,TEXT
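The third row above has a description that contains commas, so it is enclosed in double quotes. A standard CSV parser handles this correctly. As a minimal sketch, Python's csv.DictReader reads two of the rows above as follows; the variable names are illustrative only.

```python
import csv
import io

# Header and two rows copied from the example above.
SETS_CSV = """artifact_id,Name,Artifact Type,Category,Description,Secondary Categories,Related Terms,Data Set Type
026df326-74f2-4dce-8d6b-7d2f36b09d98,Customer Non Performing Loan Status,reference_data,54b9bd8a-ddfb-4512-8d46-e26d2926981e,Distinguishes between Customers according to their number of outstanding non-performing loans.,ecf8fade-4956-4e92-9a56-308949f0cb58,a9a63e90-94df-4b00-95ec-a951189d2183,TEXT
07cf348c-76a3-482c-9614-2b89edabbaaf,Financial Legal Status,reference_data,54b9bd8a-ddfb-4512-8d46-e26d2926981e,"Distinguishes between Individuals or Organizations according to whether they are undergoing proceedings that affect their financial standing; for example, (US) Chapter 11 status, In Liquidation, In Receivership, Bankrupt.",ecf8fade-4956-4e92-9a56-308949f0cb58,03976617-abe9-4e5a-88cf-57193b22cce1,TEXT
"""

# Each row becomes a dict keyed by the header names.
sets = list(csv.DictReader(io.StringIO(SETS_CSV)))

for ref_set in sets:
    # The quoted description stays in a single field despite its commas.
    print(ref_set["Name"], "|", ref_set["Data Set Type"])
```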

At a minimum, the CSV file for reference data values consists of the following columns, which are defined for the reference data set:

  1. Code
  2. Value
  3. Description (optional)
  4. Parent (optional)

It might also contain other information, such as related reference data values, or custom columns.

For example, the first several rows of a reference data set for NAICS codes look like this:

11,"Agriculture, Forestry, Fishing and Hunting"
111,Crop Production,Crop Produ,11
1111,Oilseed and Grain Farming,,111
11111,Soybean Farming,Soybean Farming,1111
111110,Soybean Farming,Soybean Farming,1111
11112,Oilseed (except Soybean) Farming,Oilseed (except Soybean) Farming,1111
111120,Oilseed (except Soybean) Farming,Oilseed (except Soybean) Farming,1111
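A values file can be sanity-checked before import. The following minimal Python sketch uses sample rows adapted from the NAICS example, written so that every row has exactly four fields in the order Code, Value, Description, Parent. It assumes that a parent code should also appear as a code in the same file, which might not hold if the parent already exists in the target set. The function names are illustrative only, and extra columns are ignored.

```python
import csv
import io

COLUMNS = ("code", "value", "description", "parent")

# Sample rows adapted from the NAICS example; a value that contains
# commas is quoted, and empty optional fields are left blank.
SAMPLE = """11,"Agriculture, Forestry, Fishing and Hunting",,
111,Crop Production,Crop Production,11
1111,Oilseed and Grain Farming,,111
11111,Soybean Farming,Soybean Farming,1111
"""


def parse_values(csv_text):
    """Parse headerless value rows into dicts, padding missing optional fields."""
    values = []
    for row in csv.reader(io.StringIO(csv_text)):
        if not row:
            continue
        # Pad short rows and ignore any columns beyond the four defaults.
        padded = (row + [""] * len(COLUMNS))[: len(COLUMNS)]
        values.append(dict(zip(COLUMNS, (cell.strip() for cell in padded))))
    return values


def missing_parents(values):
    """Return parent codes that are referenced but not defined in the file."""
    codes = {v["code"] for v in values}
    return sorted({v["parent"] for v in values if v["parent"] and v["parent"] not in codes})


values = parse_values(SAMPLE)
print(missing_parents(values))  # []
```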

Read more about formatting the CSV files in CSV file format for importing governance artifacts. The different import methods are described in Import methods for governance artifacts.

The following sections describe how to import CSV files for reference data sets using the UI. In the UI, you can follow the import task on a progress bar and view a summary of the import, including error descriptions. When you import reference data values, the number of saved values listed in the import summary might differ from the number of values actually imported. This is because every duplicate row in the CSV file is counted as a saved value, while only one of the duplicates is eventually imported (which one depends on the duplicate handling method that you choose).
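The difference between the two counts can be illustrated with a small Python sketch. It is not the product's implementation: it assumes that duplicates are rows sharing the same code in the first column, and the keep parameter with its "first" and "last" options is a hypothetical stand-in for the duplicate handling method.

```python
import csv
import io


def summarize_import(csv_text, keep="last"):
    """Return (saved_count, imported_rows) for headerless value rows.

    saved_count counts every row, duplicates included.
    imported_rows holds one row per distinct code.
    """
    rows = [row for row in csv.reader(io.StringIO(csv_text)) if row]
    imported = {}
    for row in rows:
        code = row[0].strip()
        if keep == "first" and code in imported:
            continue  # keep the first occurrence, skip later duplicates
        imported[code] = row  # with keep="last", later duplicates overwrite
    return len(rows), imported


data = "A,Alpha\nB,Beta\nA,Alpha v2\n"
saved, imported = summarize_import(data)
print(saved, len(imported))  # 3 2
```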

Importing files with reference data values

In the selected reference data set, click Upload file to select a CSV file from which to import values:

  • Rows in the file with existing codes update existing rows.
  • Rows with new codes are added.
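These two rules amount to an update-or-insert keyed by code. As a minimal Python sketch under the assumption that an uploaded row replaces the whole existing row (the product might update only the mapped columns), with the hypothetical function name apply_upload:

```python
def apply_upload(existing, uploaded):
    """Merge uploaded rows into existing rows, keyed by code.

    Returns the merged mapping plus the lists of updated and added codes.
    """
    merged = dict(existing)
    updated, added = [], []
    for code, fields in uploaded.items():
        if code in merged:
            updated.append(code)  # existing code: the row is updated
        else:
            added.append(code)    # new code: the row is added
        merged[code] = fields
    return merged, updated, added


existing = {"ALB": {"value": "Albania"}, "DZA": {"value": "Algeria"}}
uploaded = {"ALB": {"value": "Republic of Albania"}, "ASM": {"value": "American Samoa"}}

merged, updated, added = apply_upload(existing, uploaded)
print(updated, added)  # ['ALB'] ['ASM']
```

Rows that exist in the set but not in the file, such as DZA here, are left unchanged in this sketch.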

When you import values from the CSV file, you map the columns from the file to any default or custom columns. If the columns do not exist, you can create them.

Importing custom columns

Values in a reference data set by default have the following columns: code, value, description (optional), and parent (optional). However, you might want to capture additional information related to the code in custom columns. For example, you might have a reference data set of country codes and want to capture some other attributes like prime minister of the country, language, or alternative names.

You can add custom columns during the creation of a reference data set from a CSV file where such columns are defined, or when importing or reimporting values from a CSV file into an existing reference data set.

See the following example of a portion of a CSV file with custom columns. This file can be used to import or reimport values.

code,value,description,Capital City,National Day,Official Language,Population
AFG,Afghanistan,The Islamic Republic of Afghanistan,Kabul,19/08/1919,دری,37200000
ALA,Åland Islands,Åland,Mariehamn,07/05/1920,svɛ̂nːska,28007
ALB,Albania,The Republic of Albania,Tirana,28/11/1912,Albanian,2850000
DZA,Algeria,The People's Democratic Republic of Algeria,Algiers,05/07/1962,الجزائر‎,42200000
ASM,American Samoa,The Territory of American Samoa,Pago Pago,14/06/1889,English,55465
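Before uploading such a file, it can help to list which header names are custom columns and how long their values are. The following minimal Python sketch uses two rows from the example above. It assumes that any header other than code, value, description, and parent is a custom column, and that the dates are written as day/month/year; the variable names are illustrative only.

```python
import csv
import io
from datetime import datetime

DEFAULT_COLUMNS = {"code", "value", "description", "parent"}

# Header and two rows copied from the example above.
CSV_TEXT = """code,value,description,Capital City,National Day,Official Language,Population
ALB,Albania,The Republic of Albania,Tirana,28/11/1912,Albanian,2850000
ASM,American Samoa,The Territory of American Samoa,Pago Pago,14/06/1889,English,55465
"""

reader = csv.DictReader(io.StringIO(CSV_TEXT))

# Headers that are not default columns are treated as custom columns.
custom_columns = [name for name in reader.fieldnames if name.lower() not in DEFAULT_COLUMNS]
rows = list(reader)

# Longest observed value per custom column, useful when choosing a maximum length.
longest = {name: max(len(row[name]) for row in rows) for name in custom_columns}

# Dates in the sample are assumed to be day/month/year.
national_day = datetime.strptime(rows[0]["National Day"], "%d/%m/%Y").date()

print(custom_columns)
print(longest["Capital City"])   # 9
print(national_day.isoformat())  # 1912-11-28
```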

To import custom columns into the reference data set:

  1. Open the reference data set and click Upload file. Provide the CSV file and click Next.
  2. Map the columns from your file to the default or existing columns in the reference data set. To add a new custom column, from the Target column drop-down list, select + Add custom column (Optional).
  3. Provide the column name, description, and maximum number of characters, and specify whether to validate the column values against the code of another reference data set.
  4. You can mark the custom column to be part of the composite key. Custom columns that are part of the composite key are mandatory by default.
  5. Repeat the steps for each column from the CSV file that you want to map.
  6. After you map all columns, review the information, rearrange the columns if required, and click Save.

Importing related values

You can import relationships between values in reference data sets using a CSV file. The values in the source and target reference data sets must already exist in the system; otherwise, the relationships are not imported. First import both related reference data sets, and then use the same CSV files to import relationships.

See the following example of a CSV file with value mappings.

IND,India,Asian country,"KA,AP,MP,DL,GJ",ASIA
USA,America,North American Country,"CA,FL,NY,TX",NorthAmerica
GER,Germany,European Country,,Europe

To add related values by importing a CSV file, complete the following steps:

  1. Open the reference data set for which you want to add related values.
  2. From the three-dot menu, click Upload related values.
  3. Add the file that you want to upload by dragging it to the Import related values window or by browsing for the file. Then, click Next.
  4. Choose the column from your file that has the code values that you want to map to one or more other code values.
  5. Choose the type of relationship (one-to-one or one-to-many) that you want between each value and related value.
  6. Choose the column from your file that has the related codes of the reference values that you want to map to, and which reference data set the related values belong to.
  7. Click Save.
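The column and relationship choices in these steps can be pictured with a minimal Python sketch that reads the example file shown earlier. It assumes that the fourth column holds one-to-many related codes separated by commas inside a quoted field and that the fifth column holds a single one-to-one related code; the source does not label these columns, so this reading is an interpretation. The function name read_related is illustrative only.

```python
import csv
import io

# Rows copied from the value-mapping example above.
MAPPINGS_CSV = '''IND,India,Asian country,"KA,AP,MP,DL,GJ",ASIA
USA,America,North American Country,"CA,FL,NY,TX",NorthAmerica
GER,Germany,European Country,,Europe
'''


def read_related(csv_text, source_col, related_col, one_to_many):
    """Map each source code to a list of related codes from the chosen column."""
    mapping = {}
    for row in csv.reader(io.StringIO(csv_text)):
        if not row:
            continue
        source = row[source_col].strip()
        cell = row[related_col].strip()
        if one_to_many:
            # A quoted cell can carry several related codes.
            related = [code.strip() for code in cell.split(",") if code.strip()]
        else:
            related = [cell] if cell else []
        mapping[source] = related
    return mapping


states = read_related(MAPPINGS_CSV, source_col=0, related_col=3, one_to_many=True)
regions = read_related(MAPPINGS_CSV, source_col=0, related_col=4, one_to_many=False)
print(states["IND"])   # ['KA', 'AP', 'MP', 'DL', 'GJ']
print(regions["USA"])  # ['NorthAmerica']
```

An empty cell, as in the GER row, yields no relationships for that value in this sketch.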

Note: When working with CSV files that contain reference data values defined by a composite key, ensure that the parent relationships between the values, and the single- and multi-mapped related values, are represented as a concatenation of CODE and the composite key custom column values, delimited by |. The code always goes first, followed by the composite key column values, for example: CODE|CC1|CC2|CC3.
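Building such an identifier is a simple join. The following minimal Python sketch uses the hypothetical function name composite_key. It rejects values that contain the delimiter, because how the product treats a | inside a value is not described here; that check is an assumption of the sketch.

```python
def composite_key(code, *key_columns, delimiter="|"):
    """Join the code and composite key column values, code first."""
    parts = [code, *key_columns]
    for part in parts:
        if delimiter in part:
            # Assumption: a value containing the delimiter would be ambiguous.
            raise ValueError(f"value {part!r} contains the delimiter {delimiter!r}")
    return delimiter.join(parts)


print(composite_key("CODE", "CC1", "CC2", "CC3"))  # CODE|CC1|CC2|CC3
```

The resulting string is what would go in the parent column or in a related-values column for a set with a composite key.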

