Data governance and privacy Tutorial: Trust your data

Take this tutorial to learn how to prepare trusted data with the Data governance and privacy use case of the data fabric trial. Your goal is to create trusted data assets by enriching your data and running data quality analysis.

Quick start: If you did not already create the sample project for this tutorial, access the Data governance and privacy sample project in the gallery.

The following animated image previews what you'll accomplish in this tutorial: you import metadata from an external data source, enrich that data with auto-assigned business terms, view the enriched data, and publish the enriched data to a catalog. Click the image to view a larger image.

Animated image

The story for the tutorial is that Golden Bank has several departments that need access to high-quality customer mortgage data. As a Data Steward on the governance team, you must sort and organize the company's data to provide high-quality and protected data assets that data consumers can easily find in a self-service catalog.

In this tutorial, you will complete these tasks:

  1. Create a catalog.
  2. Create a category.
  3. Add business terms.
  4. Import data into the project.
  5. Enrich the data.
  6. View the results of the metadata enrichment.
  7. Publish assets to a catalog.

If you need help with this tutorial, ask a question or find an answer in the Cloud Pak for Data Community discussion forum.

Tip: For the optimal experience completing this tutorial, open Cloud Pak for Data in one browser window, and keep this tutorial page open in another browser window to switch easily between the two applications. Consider arranging the two browser windows side-by-side to make it easier to follow along.

Side-by-side tutorial and UI

Preview the tutorial

Watch this video to preview the steps in this tutorial. There might be slight differences in the user interface shown in the video. The video is intended to be a companion to the written tutorial.

This video provides a visual method as an alternative to following the written steps in this documentation.

Tip: Start the video, then as you scroll through the tutorial, the video moves to picture-in-picture mode. Close the video table of contents for the best experience with picture-in-picture. You can use picture-in-picture mode so you can follow the video as you complete the tasks in this tutorial. Click the timestamps for each task to follow along.

Video timestamps


  • Watch this short video to see how to use the video picture-in-picture and table of contents.

Prerequisites

Sign up for Cloud Pak for Data as a Service

You must sign up for Cloud Pak for Data as a Service and provision the necessary services for the Data governance and privacy use case.

  • If you have an existing Cloud Pak for Data as a Service account, then you can get started with this tutorial. If you have a Lite plan account, only one user per account can run this tutorial.
  • If you don't have a Cloud Pak for Data as a Service account yet, then sign up for a data fabric trial.

Verify the necessary provisioned services

To preview this task, watch the video beginning at 01:05.

Follow these steps to verify or provision the necessary services:

  1. From the Cloud Pak for Data navigation menu, choose Services > Service instances.

  2. Use the Product drop-down list to determine whether a Watson Knowledge Catalog service instance exists.

  3. If you need to create a Watson Knowledge Catalog service instance, click Add service.

    1. Select Watson Knowledge Catalog.

    2. Select the Lite plan.

    3. Click Create.

  4. Repeat these steps to verify or provision the Cloud Object Storage service.

Check your progress

The following image shows the provisioned service instances:

Provisioned services

Create the sample project

To preview this task, watch the video beginning at 01:38.

If you did not already create the sample project for this tutorial, follow these steps:

  1. Access the Data governance and privacy sample project in the gallery.

  2. Click Create project.

  3. If prompted to associate the project to a Cloud Object Storage instance, select a Cloud Object Storage instance from the list.

  4. Click Create.

  5. Wait for the project import to complete, and then click View new project to verify that the project and assets were created successfully.

    Note: If this is your first time accessing a project, a guided tour prompt asks whether you want a tour of projects. For now, click Maybe later.
  6. Click the Assets tab to view the project's assets.

  7. From the Overflow menu at the end of the Banking.csv data asset row, choose Download, and save the file to your computer. You'll use that file in a later step.

Note: You might see a guided tour showing the tutorials that are included with this use case. The links in the guided tour will open these tutorial instructions.

Check your progress

The following image shows the Assets tab in the sample project. You are now ready to start the tutorial.

Sample project

Task 1: Create a catalog

To preview this task, watch the video beginning at 02:49.

Before you start working with data, create a catalog where you will publish data to share it with your organization. With the Watson Knowledge Catalog Lite plan, you can create only two catalogs. If you already have a catalog, you can skip this step. Otherwise, follow these steps to create a catalog:

Note: If this is your first time accessing a catalog, a guided tour prompt asks whether you want a tour of catalogs. For now, click Maybe later.

  1. From the Cloud Pak for Data navigation menu, choose Catalogs > View all catalogs.

  2. If you see a catalog on the Catalogs page, then skip to Task 2: Create a category. Otherwise, follow these steps to create a new catalog:

  3. Click Create Catalog.

  4. For the Name, copy and paste the catalog name exactly as shown with no leading or trailing spaces:

    Mortgage Approval Catalog
    
  5. If prompted to associate the catalog to a Cloud Object Storage instance, select a Cloud Object Storage instance from the list.

  6. Select Enforce data protection rules, confirm the selection, and accept the defaults for the other fields.

  7. Click Create.
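
Because the catalog name must match exactly, a quick check on the pasted value can catch stray whitespace before you create the catalog. The following is a minimal Python sketch, not part of Cloud Pak for Data; the function name `check_catalog_name` and its messages are hypothetical.

```python
EXPECTED_NAME = "Mortgage Approval Catalog"


def check_catalog_name(candidate: str) -> str:
    """Describe whether a candidate string matches the expected catalog name."""
    if candidate == EXPECTED_NAME:
        return "ok"
    # Same text once surrounding whitespace is removed: only spaces are wrong.
    if candidate.strip() == EXPECTED_NAME:
        return "remove leading or trailing spaces"
    return "name does not match"


print(check_catalog_name("Mortgage Approval Catalog"))
print(check_catalog_name(" Mortgage Approval Catalog "))
```

The comparison is case-sensitive on purpose, because the tutorial asks for the name exactly as shown.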

Check your progress

The following image shows your catalog. You are now ready to share assets with your organization.

Mortgage Approval Catalog

Task 2: Create a category

To preview this task, watch the video beginning at 03:13.

You need a category to contain the business terms that you’ll import in the next task. Categories act like folders to organize your governance artifacts and the people who can author and manage those artifacts. Follow these steps to create a category:

  1. From the Cloud Pak for Data navigation menu, choose Governance > Categories.

  2. Click Add category > New category.

  3. For the name, type Banking.

  4. Click Create.

Check your progress

The following image shows the Banking category. You are now ready to import business terms.

Banking category

Task 3: Add business terms

To preview this task, watch the video beginning at 03:41.

Now import business terms into the new category. You’ll use them to enrich your data assets in a later step. Business terms are standardized definitions of business concepts so that your data is described in a uniform and easily understood way across your enterprise. Follow these steps to import the business terms from a file:

  1. From the Cloud Pak for Data navigation menu, choose Governance > Business terms.

  2. Click Add business term > Import from file.

  3. Click Add file.

    1. Select the Banking.csv file that you downloaded earlier.

    2. Click Open.

  4. Click Next.

  5. Select Replace all values, and click Import.

  6. Click Go to task to see the draft business terms. If you miss the notification, then from the Cloud Pak for Data navigation menu, choose Governance > Task inbox.

  7. Select the Publish business terms checkbox, and then click Publish. Click Publish to confirm.

  8. From the Cloud Pak for Data navigation menu, choose Governance > Business terms to view the published business terms.
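
Before importing, it can help to look at what the downloaded file contains. The following is a minimal Python sketch, not part of the product: it reads CSV text and reports the header row and the number of data rows. The sample columns (`Name`, `Category`) and rows are hypothetical stand-ins; the layout of the real Banking.csv file may differ.

```python
import csv
import io


def summarize_csv(text: str) -> tuple[list[str], int]:
    """Return the header row and the number of non-empty data rows."""
    rows = list(csv.reader(io.StringIO(text)))
    if not rows:
        return [], 0
    header, data = rows[0], rows[1:]
    # csv.reader yields an empty list for blank lines; skip those.
    data = [row for row in data if row]
    return header, len(data)


# Hypothetical sample. To inspect the real file, read it instead, for example:
#   text = open("Banking.csv", newline="", encoding="utf-8").read()
sample = "Name,Category\nAddress,Banking\nSocial Security Number,Banking\n"
header, count = summarize_csv(sample)
print(header, count)
```

Comparing the reported row count with the number of draft terms in the task inbox is one way to confirm that the whole file was imported.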

Check your progress

The following image shows the imported business terms. You are now ready to import the data to a project and then enrich it with the imported business terms.

Imported business terms

Task 4: Import data to a project

To preview this task, watch the video beginning at 04:47.

The sample project includes a connection to a Db2 Warehouse instance, which contains the mortgage assets. You can import technical metadata that is associated with the data assets into a project or a catalog to inventory, evaluate, and catalog these assets. Technical metadata describes the structure of data objects. Follow these steps to import the data assets:

  1. From the Cloud Pak for Data navigation menu, choose Projects > View all projects.

  2. Click the Data governance and privacy project.

  3. Click the Assets tab.

  4. Click New asset.

  5. Select Metadata Import for the asset type.

  6. For the name, copy and paste the following text:

    Mortgage data - metadata import
    
  7. Click Next to continue.

  8. On the Select target page, select This project, and click Next to continue.

  9. On the Select scope page, click Select connection.

    1. Select the Data Fabric Trial - Db2 Warehouse connection.

    2. Select the checkbox next to the WKC_MORTGAGE schema, then click the WKC_MORTGAGE schema name.

    3. Select the following tables:

      • COMMERCIAL_CLIENT
      • CREDIT_SCORE
      • HOUSE_PRICE
      • MORTGAGE_APPLICANTS
      • MORTGAGE_APPLICATION
    4. Review the list of assets in the side panel, and then click Select.

  10. Click Next to continue to the schedule.

  11. Click Next to continue to the Advanced Options.

  12. Accept the default values on the Advanced options page, and click Next to continue to the review.

  13. Review the summary of the import, and click Create. The metadata import job starts.

  14. Click the Refresh icon to watch the status change from Queued to In progress to Imported. When the job run is complete, you see the five assets listed.
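
The last step expects exactly the five selected tables. As a sanity check, a small Python sketch like the following compares a list of asset names (typed or copied from the import results) against the expected set. The function name `compare_assets` and the example input are hypothetical, and nothing here calls Cloud Pak for Data.

```python
EXPECTED_TABLES = {
    "COMMERCIAL_CLIENT",
    "CREDIT_SCORE",
    "HOUSE_PRICE",
    "MORTGAGE_APPLICANTS",
    "MORTGAGE_APPLICATION",
}


def compare_assets(imported: list[str]) -> tuple[set[str], set[str]]:
    """Return (missing, unexpected) names relative to EXPECTED_TABLES."""
    seen = {name.strip().upper() for name in imported}
    return EXPECTED_TABLES - seen, seen - EXPECTED_TABLES


# Hypothetical result list with one table left out.
missing, unexpected = compare_assets(
    ["CREDIT_SCORE", "HOUSE_PRICE", "MORTGAGE_APPLICANTS", "MORTGAGE_APPLICATION"]
)
print(sorted(missing))     # tables that still need to be imported
print(sorted(unexpected))  # names that were not part of the selection
```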

Check your progress

The following image shows the completed metadata import. Your next task is to enrich the imported data assets with the imported business terms.

Metadata import asset

Task 5: Enrich the imported data

To preview this task, watch the video beginning at 06:07.

You can enrich data assets with information that helps users find data faster, decide whether the data is appropriate for the task at hand, judge whether they can trust the data, and understand how to work with it. Such information includes, for example, terms that define the meaning of the data, rules that document ownership or determine quality standards, and reviews. Follow these steps to enrich the imported data:

  1. Click the Data governance and privacy project name in the navigation trail.
    Navigation trail

  2. On the Assets tab, click New asset.

  3. Select Metadata Enrichment for the asset type.

  4. For the name, copy and paste the following text:

    Mortgage data - metadata enrichment
    
  5. Click Next to continue.

  6. Click Select data from project.

    1. Select Metadata import.

    2. Click the checkbox next to Mortgage data - metadata import. This asset includes the following assets:

      • COMMERCIAL_CLIENT
      • CREDIT_SCORE
      • HOUSE_PRICE
      • MORTGAGE_APPLICANTS
      • MORTGAGE_APPLICATION
    3. Click Select.

  7. Click Next to continue to the enrichment objective.

  8. Select all enrichment objectives:

    • Profile data
    • Analyze quality
    • Assign terms
  9. For Categories, click Select categories.

    1. Select only [uncategorized] and Banking.

    2. Click Select.

  10. For the Sampling, select Basic.

  11. Click Next to continue to the schedule.

  12. Click Next to continue to the review.

  13. Click Create.

  14. The metadata enrichment asset is displayed, but the job might take several minutes to complete. Click the Refresh icon to watch the status change from Not analyzed to In progress to Finished. When the job run is complete, you see the five assets listed.

Check your progress

The following image shows the completed metadata enrichment. Now you can explore the enriched data assets.

Metadata enrichment asset

Task 6: View the results of the metadata enrichment

To preview this task, watch the video beginning at 07:45.

After the metadata enrichment run is completed, follow these steps to view the enriched data:

  1. From the Mortgage data - metadata enrichment screen, click the Columns tab.

  2. In the list of Columns, locate the EMAIL_ADDRESS column for the MORTGAGE_APPLICANTS asset.

    1. Click the Overflow menu at the end of the EMAIL_ADDRESS row for the MORTGAGE_APPLICANTS asset, and choose View column details.

    2. In the side panel on the Details tab, you see profiling information such as: Format, Frequency distribution, Statistics.

    3. In the side panel, click the Governance tab. This tab includes the data classes and business terms that were auto-assigned during the metadata enrichment. You might also see suggested business terms and data classes, which you can assign manually.

    4. To review the suggested terms and manually assign them:

      1. Click Suggested business terms.

      2. For Address, click Assign.

  3. At the end of the EMAIL_ADDRESS row for the MORTGAGE_APPLICANTS asset, click the Overflow menu, and choose View data quality details.

    1. View the data quality information. Watson Knowledge Catalog automatically generates a data quality score for each column and data asset by analyzing every value in every record according to pre-built dimensions.

    2. Click the X to close the Data quality window.

  4. For the CITY column for the CREDIT_SCORE asset, click the Overflow menu, and choose Mark as reviewed.

  5. Click the Assets tab.

  6. In the list of Assets, for the MORTGAGE_APPLICANTS asset, click the Overflow menu, and choose View asset details.

    1. In the side panel, click the Governance tab to see business term auto assignment.

    2. To manually assign business terms, click the Edit icon.

    3. Search for social. If you don't see any results, then make sure that the drop-down list is set to All terms instead of Suggested terms.

    4. Select Social Security Number.

    5. Click Assign.
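
To make the idea of a column-level data quality score more concrete, here is a minimal Python sketch. It is not the algorithm that Watson Knowledge Catalog uses: the two checks (value is present, value looks like an email address) and the scoring rule (share of values that pass every check) are assumptions chosen for illustration, and the sample values are invented.

```python
import re
from typing import Callable, Optional

# A deliberately simple pattern for illustration, not a full email validator.
EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")

Check = Callable[[Optional[str]], bool]


def is_present(value: Optional[str]) -> bool:
    """Pass when the value is neither missing nor blank."""
    return value is not None and value.strip() != ""


def looks_like_email(value: Optional[str]) -> bool:
    """Pass when the value has the rough shape name@domain.tld."""
    return value is not None and EMAIL_PATTERN.match(value.strip()) is not None


def quality_score(values: list[Optional[str]], checks: list[Check]) -> float:
    """Return the percentage of values that pass every check."""
    if not values:
        return 100.0  # assumption: an empty column has no violations
    passing = sum(1 for v in values if all(check(v) for check in checks))
    return 100.0 * passing / len(values)


# Invented sample column: two valid addresses, one blank, one malformed, one missing.
emails = ["ann@example.com", "", "bob@example", None, "cy@example.org"]
score = quality_score(emails, [is_present, looks_like_email])
print(f"{score:.1f}%")
```

In this sketch each check plays the role of one quality dimension, so adding a dimension means adding one more function to the list.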

Check your progress

The following image shows the reviewed and enriched data assets. The next step is to publish the enriched data to a catalog to share with your organization.

Reviewed enriched data assets

Task 7: Publish data to a catalog

To preview this task, watch the video beginning at 09:06.

Now that you have enriched data, you want to publish those data assets to a catalog so data scientists and data analysts can use the enriched data assets. Follow these steps to store the enriched data assets in a catalog for others to have access to the trusted data:

  1. Click the Data governance and privacy project name in the navigation trail.

  2. Click the Assets tab.

  3. Select Data > Data assets.

  4. Select the COMMERCIAL_CLIENT, HOUSE_PRICE, MORTGAGE_APPLICANTS, and MORTGAGE_APPLICATION data assets from the list, and click Publish to catalog.

    1. For the Target catalog, select Mortgage Approval Catalog.

    2. For the MORTGAGE_APPLICANTS asset, click the Edit Edit icon icon, and change the name to:

      MORTGAGE_APPLICANTS_TRUST
      
    3. For the Tag, type the tag trusted, and click + (plus sign).

    4. Notice that the data asset and the connection asset will be added to the catalog. Click Publish.

  5. Clear all checked assets, then select the checkbox next to the CREDIT_SCORE asset from the list, and click Publish to catalog.

    1. For the Target catalog, select Mortgage Approval Catalog.

    2. For the Tag, type the tag confidential, and click + (plus sign).

    3. For the Tag, type the tag trusted, and click + (plus sign).

    4. Click Publish.

  6. From the Cloud Pak for Data navigation menu, choose Catalogs > View all catalogs.

  7. Click Mortgage Approval Catalog.

  8. In the Filter by > Any tag drop-down list, select trusted. Verify that the five data assets were added to the catalog.
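
The tag filter in the last step can be thought of as a simple membership test. The following Python sketch models the assets and tags from this task as plain data and shows why filtering by trusted should return all five assets, while confidential returns only one. No catalog API is involved; the dictionary layout and the function name `filter_by_tag` are assumptions for illustration.

```python
# Assets as published in this task, with the tags added in steps 4 and 5.
published_assets = {
    "COMMERCIAL_CLIENT": {"trusted"},
    "HOUSE_PRICE": {"trusted"},
    "MORTGAGE_APPLICANTS_TRUST": {"trusted"},  # renamed during publishing
    "MORTGAGE_APPLICATION": {"trusted"},
    "CREDIT_SCORE": {"confidential", "trusted"},
}


def filter_by_tag(assets: dict[str, set[str]], tag: str) -> list[str]:
    """Return the sorted names of assets that carry the given tag."""
    return sorted(name for name, tags in assets.items() if tag in tags)


print(len(filter_by_tag(published_assets, "trusted")))
print(filter_by_tag(published_assets, "confidential"))
```

If the filter in the catalog shows fewer than five assets, the likely cause is a tag that was typed but not added with the plus sign before publishing.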

Check your progress

The following image shows the enriched data assets published to a catalog. Now you have trusted data available through your company's catalog.

Published assets to the catalog

As a Data Steward on the governance team, you learned how to sort and organize the company's data to provide high-quality and protected data assets that data consumers can easily find in a self-service catalog.

Next steps

You are now ready to protect your data by creating data protection rules and masking flows to control access to your data. See the Protect your data tutorial.
