Reviewing and updating enrichment results in an external program
Last updated: Oct 23, 2024

If you prefer to work in a familiar spreadsheet environment, you can review and manage the data class and business term assignments for the data assets in the scope of a single metadata enrichment in Microsoft Excel.

Requirements and restrictions

For managing data class and term assignments in a spreadsheet, the following requirements and restrictions exist.

Prerequisite configuration

The Review metadata Office add-in must be deployed in your organization and you must have a copy of the Microsoft Excel workbook template that is provided with the add-in.

A Microsoft admin can download the manifest.xml file and the Review metadata - IBM Knowledge Catalog.xlsx workbook template from the metadata-enrichment folder in the IBM Knowledge Catalog samples GitHub repository at: https://github.com/IBM/knowledge-catalog-samples

Instructions for tailoring the manifest.xml are provided in the readme file that accompanies the manifest file and the Excel template.

The admin must deploy and publish the add-in as described in the Microsoft documentation Deploy and publish Office Add-ins.

You must activate the Review metadata Excel add-in. For information about how to do that, check the documentation that applies to your version of Excel.

Restrictions

Before you start working with the workbook and the add-in, review the information in Issues with the Microsoft Excel add-in.

What the workbook looks like

The workbook consists of 5 protected sheets:

Review metadata workbook

Data assets sheet. Columns:
• Connection
• Data path
• Data asset
• Column
• Type
• Description
• Assigned / suggested data classes
• Data class
• Assigned / suggested business terms
• Business term columns. By default, 3 columns are provided. You can add further columns. See Reviewing and updating assignments.

Business terms sheet. Columns:
• Name
• Abbr. A list of the abbreviations defined for the term.
• Category path
• Distinctive name. If multiple terms with the same name exist, the name and category path are listed here to help distinguish the terms.
• Description
• Secondary categories
• Tags
• Classifications
• Effective start
• Effective end

Data classes sheet. Columns:
• Name
• Category path
• Distinctive name. If multiple data classes with the same name exist, the name and category path are listed here to help distinguish the data classes.
• Description
• Secondary categories
• Tags
• Classifications
• Effective start
• Effective end

Categories sheet. Columns:
• Name
• Path
• Description
• Tags
• Classifications

Knowledge Catalog sheet:
• Download information
• Upload information
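The Distinctive name rule for business terms and data classes can be illustrated with a short sketch. It is not the add-in's implementation: the `Artifact` type and the `distinctive_names` function are hypothetical, and the exact-match name comparison, the empty value for unique names, and the `name (category path)` format are assumptions. The documentation only states that the name and category path are listed when names collide.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Artifact:
    """Hypothetical stand-in for one row of the Business terms or Data classes sheet."""
    name: str
    category_path: str


def distinctive_names(artifacts: list[Artifact]) -> list[str]:
    """Return the Distinctive name value for each artifact, in input order.

    Assumptions: names are compared exactly (case-sensitive), a unique name
    leaves the value empty, and a duplicated name is rendered as
    "name (category path)". The add-in's actual format may differ.
    """
    counts = Counter(a.name for a in artifacts)
    return [
        f"{a.name} ({a.category_path})" if counts[a.name] > 1 else ""
        for a in artifacts
    ]


terms = [
    Artifact("Customer ID", "Sales/Customers"),
    Artifact("Customer ID", "Support/Tickets"),
    Artifact("Revenue", "Finance"),
]
print(distinctive_names(terms))
# ['Customer ID (Sales/Customers)', 'Customer ID (Support/Tickets)', '']
```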

Retrieving data from Cloud Pak for Data

To load the data into the workbook:

  1. Create a copy of the workbook template for each metadata enrichment that you want to work on. Give each copy a meaningful name, for example, include the project name and the metadata enrichment name. That way, you can easily identify where the data belongs.

  2. Open a workbook. If you already activated the add-in, the Excel Home ribbon contains the Review metadata button (IBM Knowledge Catalog Review metadata). If you don't see that button, activate the add-in now by following the instructions that apply to your version of Excel.

    To open the add-in task pane, click the Review metadata button.

  3. Log in with your Cloud Pak for Data credentials.

  4. Retrieve governance artifacts and data assets. You download this information in 2 separate steps. You must download the governance artifacts before you download the data assets. Otherwise, the assignments can't be displayed.

    • Retrieve governance artifacts

      Adds information about all data classes and business terms that are defined in Cloud Pak for Data to the respective sheets in the workbook, together with information about the categories to which the data classes and terms belong.

    • Retrieve data assets

      Select a project and a metadata enrichment, and download the data assets that are in the scope of the selected metadata enrichment. If you don't see a newly created project in the projects list, reload the add-in.

    Important: To avoid any potential data mismatches, always use a new workbook for data retrieval even if you retrieve data from a metadata enrichment on which you worked previously.
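The required download order can be modeled with a small sketch. It rests on an assumption that the documentation does not state: each assignment refers to a governance artifact by ID and can be displayed only by looking that ID up among the artifacts already in the workbook. `ReviewWorkbook` and its methods are hypothetical names. Raising an error when the order is wrong is a design choice of the sketch, not documented add-in behavior.

```python
from dataclasses import dataclass, field


@dataclass
class ReviewWorkbook:
    """Hypothetical in-memory model of the workbook, not the add-in's code."""
    # Artifact ID -> display name, filled by "Retrieve governance artifacts".
    artifacts: dict[str, str] = field(default_factory=dict)
    # (data asset or column, names of assigned artifacts), filled by "Retrieve data assets".
    rows: list[tuple[str, list[str]]] = field(default_factory=list)

    def retrieve_governance_artifacts(self, artifacts: dict[str, str]) -> None:
        self.artifacts.update(artifacts)

    def retrieve_data_assets(self, assignments: dict[str, list[str]]) -> None:
        # Assumption: assignments reference artifacts by ID, so names can only be
        # displayed once the governance artifacts are in the workbook.
        if not self.artifacts:
            raise RuntimeError("retrieve governance artifacts before data assets")
        for item, artifact_ids in assignments.items():
            # IDs without a matching artifact are skipped, that is, not displayed.
            names = [self.artifacts[i] for i in artifact_ids if i in self.artifacts]
            self.rows.append((item, names))


wb = ReviewWorkbook()
try:
    wb.retrieve_data_assets({"CUSTOMERS.EMAIL": ["term-1"]})
except RuntimeError as err:
    print(err)  # retrieve governance artifacts before data assets
wb.retrieve_governance_artifacts({"term-1": "Email address"})
wb.retrieve_data_assets({"CUSTOMERS.EMAIL": ["term-1"]})
print(wb.rows)  # [('CUSTOMERS.EMAIL', ['Email address'])]
```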

After you successfully retrieve the information, the Knowledge Catalog sheet is populated with this information:

  • The Cloud Pak for Data hostname
  • The names of the project and the metadata enrichment from which the data was loaded. The spreadsheet always shows the display names as of the initial retrieval. They are not updated when the project or the metadata enrichment is renamed in IBM Knowledge Catalog. This does not affect uploads, because updates are made by using the resource IDs, which are immutable.
  • The date and time when the governance artifacts and the data assets were downloaded

In addition, the upload option is enabled in the add-in task pane.

The Business terms, Data classes, and Categories sheets contain the information listed in What the workbook looks like.

The Data assets sheet contains an alphabetical list of the data assets followed by an alphabetical list of all columns. The columns of the Data assets sheet are populated as follows:

  • Connection. Not editable. For data assets and asset columns: the connection name.
  • Data path. Not editable. For data assets and asset columns: the schema.
  • Data asset. Not editable. For data assets and asset columns: the asset name.
  • Column. Not editable. For asset columns: the column name.
  • Type. Not editable. Set to Dataset for data assets and to Field for asset columns.
  • Description. Editable. Any description that might be available for the data asset or the asset column.
  • Assigned / suggested data classes. Not editable. The assigned and suggested data classes of the data asset or asset column. An assigned data class is also listed in the Data class column.
  • Data class. Not editable for data assets; editable for asset columns. The assigned data class.
  • Assigned / suggested business terms. Not editable. The assigned and suggested terms of the data asset or asset column. Assigned terms are also listed in separate Business term columns.
  • Business term. Editable. An assigned term, one term per column. The number of columns can vary. The default is 3 columns. If the data asset or asset column has more terms assigned, columns are added as needed. You can add further columns as required. See Reviewing and updating assignments.
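The rule for how many Business term columns the Data assets sheet needs can be expressed as a small function. This sketch assumes the sheet simply uses the larger of the template default (3) and the highest number of terms assigned to any single data asset or asset column. `business_term_column_count` and `term_cells` are hypothetical names used only for illustration.

```python
DEFAULT_TERM_COLUMNS = 3  # Business term columns provided by the template


def business_term_column_count(assigned_terms: list[list[str]]) -> int:
    """Return how many Business term columns the Data assets sheet needs.

    Each inner list holds the terms assigned to one data asset or asset column.
    Every assigned term needs its own column, so the count is the larger of
    the default and the highest number of terms on any single row.
    """
    most_terms = max((len(terms) for terms in assigned_terms), default=0)
    return max(DEFAULT_TERM_COLUMNS, most_terms)


def term_cells(terms: list[str], width: int) -> list[str]:
    """Spread one row's terms over `width` cells, padding with empty cells."""
    return terms + [""] * (width - len(terms))


rows = [["Customer"], ["Email address", "PII", "Contact", "Customer"], []]
width = business_term_column_count(rows)
print(width)  # 4
print(term_cells(rows[0], width))  # ['Customer', '', '', '']
```

Padding each row to the same width mirrors the fact that every assigned term occupies its own column, so rows with fewer terms simply leave the remaining Business term cells empty.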

Reviewing and updating assignments

To review and update the metadata:

  1. Check the Data class and Business term columns.

  2. Leave correct assignments unchanged. Replace or remove incorrect assignments. For business terms, you can add as many as required. Each term must be in a separate column. By default, the sheet contains 3 columns for business terms. You can add extra columns as follows:

    1. Unprotect the Data assets sheet.
    2. Select the last Business term column.
    3. Right-click anywhere in that column and select Insert.
    4. Optional: Add the column header Business term.
    5. Protect the sheet again.

    You can now use this new column to assign business terms.

Uploading the reviewed results

When you have completed your review, upload the updated metadata to Cloud Pak for Data. You don't have to save the workbook before you start the upload.

The data that you upload overwrites the enrichment results in the project. All previously assigned data classes are unassigned and marked as suggestions. Then, data class and business term assignments are updated as specified in the spreadsheet. Descriptions in the spreadsheet overwrite the asset and column descriptions in the project. All columns and assets are marked as reviewed.
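The upload rules above can be modeled as a small function. This is a conceptual sketch, not the product's implementation: `ProjectMetadata`, `SheetRow`, and `apply_upload` are hypothetical names. Rows are matched by resource ID because updates on upload are made by using resource IDs rather than display names. The documentation does not say whether a newly assigned data class also stays in the suggestions, so the sketch only adds the previous assignments to the suggestions.

```python
from dataclasses import dataclass, field


@dataclass
class ProjectMetadata:
    """Hypothetical model of one data asset or asset column in the project."""
    description: str = ""
    assigned_classes: set[str] = field(default_factory=set)
    suggested_classes: set[str] = field(default_factory=set)
    assigned_terms: set[str] = field(default_factory=set)
    reviewed: bool = False


@dataclass
class SheetRow:
    """Hypothetical model of one row of the Data assets sheet."""
    description: str
    data_class: str  # empty string: no data class assigned
    business_terms: list[str]


def apply_upload(project: dict[str, ProjectMetadata], sheet: dict[str, SheetRow]) -> None:
    """Overwrite project metadata with the reviewed rows, matched by resource ID."""
    for resource_id, row in sheet.items():
        meta = project[resource_id]
        # Previously assigned data classes are unassigned and kept as suggestions.
        meta.suggested_classes |= meta.assigned_classes
        meta.assigned_classes = {row.data_class} if row.data_class else set()
        # Term assignments are replaced by the terms in the Business term columns.
        meta.assigned_terms = set(row.business_terms)
        # The spreadsheet description overwrites the project description.
        meta.description = row.description
        # Every uploaded data asset and column is marked as reviewed.
        meta.reviewed = True


project = {"col-42": ProjectMetadata(description="old", assigned_classes={"Text"})}
sheet = {"col-42": SheetRow("Customer email", "Email address", ["Contact"])}
apply_upload(project, sheet)
print(project["col-42"])
```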


