Data governance and privacy Tutorial: Trust your data

Take this tutorial to learn how to prepare trusted data with the Data governance and privacy use case of the data fabric trial. Your goal is to create trusted data assets by enriching your data and running data quality analysis.

The following animated image previews what you’ll accomplish by the end of this tutorial: you will import metadata from an external data source, enrich that data with auto-assigned business terms, view the enriched data, and publish the enriched data to a catalog.

Data governance and privacy: Trust your data preview

The story for the tutorial is that Golden Bank has several departments that need access to high-quality customer mortgage data. As a Data Steward on the governance team, you must sort and organize the company's data to provide high-quality and protected data assets that data consumers can easily find in a self-service catalog.

In this tutorial, you will complete these tasks:

  1. Create a catalog.
  2. Create a category.
  3. Add business terms.
  4. Import data into the project.
  5. Enrich the data.
  6. View the results of the metadata enrichment.
  7. Publish assets to a catalog.

If you need help with this tutorial, ask a question or find an answer in the Cloud Pak for Data Community discussion forum.

Tip: For the optimal experience completing this tutorial, open Cloud Pak for Data as a Service in one browser tab, and keep this tutorial page open in another browser tab to switch easily between the two applications.

Preview the tutorial

Watch Video Watch this video to preview the steps in this tutorial.

This video provides a visual method as an alternative to following the written steps in this documentation.

Prerequisites

Sign up for Cloud Pak for Data as a Service

You must sign up for Cloud Pak for Data as a Service and provision the necessary services for the Data governance and privacy use case. If you have a Lite plan account, only one user per account can run this tutorial.

You can sign up for Cloud Pak for Data as a Service in any of these ways:

Provision the necessary services

Watch Video To preview this task, watch the video beginning at 01:05.

Follow these steps to verify or provision the necessary services:

  1. From the Cloud Pak for Data navigation menu, choose Services > Service instances.
  2. Use the Product drop-down list to determine whether a Watson Knowledge Catalog service instance exists.
  3. If you need to create a Watson Knowledge Catalog service instance, click Add service.
    1. Select Watson Knowledge Catalog.
    2. Select the Lite plan.
    3. Click Create.
  4. Repeat these steps to verify or provision the Cloud Object Storage service.

Checkpoint Check your progress

The following image shows the provisioned service instances:

Provisioned services

Create the sample project

Watch Video To preview this task, watch the video beginning at 01:38.

If you did not already create the sample project for this tutorial, follow these steps:

  1. Access the Data governance and privacy guided tutorial sample project in the gallery.
  2. Click Create project.
  3. If prompted to associate the project to a Cloud Object Storage instance, select a Cloud Object Storage instance from the list.
  4. Click Create.
  5. Click View new project to verify that the project and assets were created successfully. Note: If this is your first time accessing a project, a guided tour asks whether you want to take a tour of projects. For now, click Maybe later.
  6. Click the Assets tab.
  7. From the Overflow menu at the end of the Banking.csv data asset row, choose Download, and save the file to your computer. You'll use that file in a later step.

Checkpoint Check your progress

The following image shows the Assets tab in the sample project. You are now ready to start the tutorial.

Sample project

Tip: If you encounter a guided tour while completing this tutorial in the Cloud Pak for Data as a Service user interface, click Maybe later.

Task 1: Create a catalog

Watch Video To preview this task, watch the video beginning at 02:33.

Before you start working with data, create a catalog where you will publish data to share it with your organization. With the Watson Knowledge Catalog Lite plan, you can create only two catalogs. If you already have a catalog, you can skip this step. Otherwise, follow these steps to create a catalog:

Note: If this is your first time accessing a catalog, a guided tour asks whether you want to take a tour of catalogs. For now, click Maybe later.

  1. From the Cloud Pak for Data navigation menu, choose Catalogs > View all catalogs.
  2. If you see a catalog on the Catalogs page, then skip to Task 2: Create a category. Otherwise, follow these steps to create a new catalog:
  3. Click Create Catalog.
  4. For the Name, copy and paste the catalog name exactly as shown with no leading or trailing spaces:
    Mortgage Approval Catalog
    
  5. If prompted to associate the catalog to a Cloud Object Storage instance, select a Cloud Object Storage instance from the list.
  6. Select Enforce data policies, confirm the selection, and accept the defaults for the other fields.
  7. Click Create.

Checkpoint Check your progress

The following image shows your catalog. You are now ready to share assets with your organization.

Mortgage Approval Catalog

Task 2: Create a category

Watch Video To preview this task, watch the video beginning at 02:59.

You need a category to contain the business terms that you’ll import in the next task. Categories act like folders that organize your governance artifacts and the people who can author and manage those artifacts. Follow these steps to create a category:

  1. From the Cloud Pak for Data as a Service navigation menu, choose Governance > Categories.
  2. Click Add category > New category.
  3. For the name, type Banking.
  4. Click Create.

Checkpoint Check your progress

The following image shows the Banking category. You are now ready to import business terms.

Banking category

Task 3: Add business terms

Watch Video To preview this task, watch the video beginning at 03:25.

Now import business terms into the new category. You’ll use them to enrich your data assets in a later step. Business terms are standardized definitions of business concepts so that your data is described in a uniform and easily understood way across your enterprise. Follow these steps to import the business terms from a file:

  1. From the Cloud Pak for Data as a Service navigation menu, choose Governance > Business terms.
  2. Click Add business term > Import from file.
  3. Click Add file.
    1. Select the Banking.csv file that you downloaded earlier.
    2. Click Open.
  4. Click Next.
  5. Select Replace all values, and click Import.
  6. Click Go to task to see the draft business terms. If you miss the notification, then from the Cloud Pak for Data as a Service navigation menu, choose Governance > Task inbox.
  7. Select the Publish business terms checkbox, and then click Publish. Click Publish to confirm.

Checkpoint Check your progress

The following image shows the imported business terms. You are now ready to import the data to a project and then enrich it with the imported business terms.

Imported business terms

Task 4: Import data to a project

Watch Video To preview this task, watch the video beginning at 04:30.

The sample project includes a connection to a Db2 Warehouse instance, which contains the mortgage assets. You can import technical metadata that is associated with the data assets into a project or a catalog to inventory, evaluate, and catalog these assets. Technical metadata describes the structure of data objects. Follow these steps to import the data assets:

  1. From the Cloud Pak for Data navigation menu, choose Projects > View all projects.
  2. Click the Data governance and privacy project.
  3. Click the Assets tab.
  4. Click New asset.
  5. Scroll down to select Metadata Import.
  6. For the name, type Mortgage data - metadata import.
  7. Click Next to continue.
  8. For Select target, select This project, and click Next to continue.
  9. For Select scope, click Select connection.
    1. Select the Data Fabric Trial - Db2 Warehouse connection.
    2. Click the WKC_MORTGAGE schema.
    3. Select the following tables:
      • COMMERICIAL_CLIENT
      • CREDIT_SCORE
      • HOUSE_PRICE
      • MORTGAGE_APPLICANTS
      • MORTGAGE_APPLICATION
    4. Review the list of tables in the side panel, and then click Select.
  10. Click Next to continue to the schedule.
  11. Click Next to continue to the review.
  12. Review the summary of the import, and click Create. The metadata import job starts.
  13. Click the Refresh icon to watch the status change from Queued to In progress to Imported. When the job run is complete, you see the five assets listed.
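
To make the idea of technical metadata concrete, the following Python sketch reads the structure of a table without reading any of its rows. It is only an illustration, not how the metadata import job works internally: an in-memory SQLite database stands in for the Db2 Warehouse connection, the column definitions are hypothetical, and the helper name read_technical_metadata is an assumption rather than part of Cloud Pak for Data.

```python
import sqlite3

# Stand-in for the remote schema: an in-memory SQLite database.
# The column definitions below are hypothetical, chosen only for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE MORTGAGE_APPLICANTS ("
    "ID INTEGER PRIMARY KEY, NAME TEXT NOT NULL, EMAIL_ADDRESS TEXT)"
)


def read_technical_metadata(connection, table):
    """Return column-level structure for one table without reading any rows."""
    # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk).
    # The table name is interpolated directly, so pass only trusted names.
    rows = connection.execute(f"PRAGMA table_info({table})").fetchall()
    return [
        {
            "column": name,
            "type": col_type,
            "nullable": not notnull,
            "primary_key": bool(pk),
        }
        for _cid, name, col_type, notnull, _default, pk in rows
    ]


metadata = read_technical_metadata(conn, "MORTGAGE_APPLICANTS")
for column in metadata:
    print(column)
```

The result describes names, types, and constraints only, which is why metadata can be inventoried and cataloged before anyone looks at the data values themselves.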

Checkpoint Check your progress

The following image shows the completed metadata import. Your next task is to enrich the imported data assets with the imported business terms.

Metadata import asset

Task 5: Enrich the imported data

Watch Video To preview this task, watch the video beginning at 05:51.

You can enrich data assets with information that helps users find data faster, decide whether the data is appropriate for the task at hand, judge whether they can trust the data, and understand how to work with it. Such information includes, for example, terms that define the meaning of the data, rules that document ownership or determine quality standards, or reviews. Follow these steps to enrich the imported data:

  1. Click the Data governance and privacy project name in the navigation trail.
    Navigation trail
  2. On the Assets tab, click New asset.
  3. Select Metadata Enrichment.
  4. For the name, type Mortgage data - metadata enrichment.
  5. Click Next to continue.
  6. Click Select data from project.
    1. Select Metadata import.
    2. Select Mortgage data - metadata import which includes the following assets:
      • COMMERICIAL_CLIENT
      • CREDIT_SCORE
      • HOUSE_PRICE
      • MORTGAGE_APPLICANTS
      • MORTGAGE_APPLICATION
    3. Click Select.
  7. Click Next to continue to the enrichment objective.
  8. Select all enrichment objectives:
    • Profile data
    • Analyze quality
    • Assign terms
  9. Click Select categories.
    1. Select [uncategorized] and Banking.
    2. Click Select.
  10. For the Sampling, select Basic.
  11. Click Next to continue to the schedule.
  12. Click Next to continue to the review.
  13. Click Create.
  14. The metadata enrichment asset displays, but the job might take several minutes to complete. Click the Refresh icon to watch the status change from Queued to In progress to Finished. When the job run is complete, you see the five assets listed.
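
If you are curious how a term suggestion could come about, the following Python sketch matches column names against business term names by shared words. This is a deliberately simple assumption for illustration and is not the method that Watson Knowledge Catalog uses for the Assign terms objective. The term names come from this tutorial; the functions tokens and suggest_terms are hypothetical.

```python
import re

# Term names taken from this tutorial; the matching rule is an assumption.
BUSINESS_TERMS = ["Address", "Social Security Number"]


def tokens(name):
    """Split a column or term name into a set of lowercase word tokens."""
    return set(re.findall(r"[a-z0-9]+", name.lower()))


def suggest_terms(column_name, terms=BUSINESS_TERMS):
    """Return terms that share words with the column name, best overlap first."""
    column_tokens = tokens(column_name)
    scored = []
    for term in terms:
        term_tokens = tokens(term)
        # Share of the term's words that also appear in the column name.
        overlap = len(column_tokens & term_tokens) / len(term_tokens)
        if overlap > 0:
            scored.append((overlap, term))
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [term for _score, term in scored]


print(suggest_terms("EMAIL_ADDRESS"))
print(suggest_terms("CITY"))
```

A name-based rule like this one produces a candidate for EMAIL_ADDRESS but nothing for CITY, which shows why suggestions still benefit from a Data Steward's review.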

Checkpoint Check your progress

The following image shows the completed metadata enrichment. Now you can explore the enriched data assets.

Metadata enrichment asset

Task 6: View the results of the metadata enrichment

Watch Video To preview this task, watch the video beginning at 07:30.

After the metadata enrichment run is completed, follow these steps to view the enriched data:

  1. From the Mortgage data - metadata enrichment screen, click the Columns tab.
  2. In the list of Columns, locate the EMAIL_ADDRESS column for the MORTGAGE_APPLICANTS asset.
    1. Click the Overflow menu at the end of the row for the EMAIL_ADDRESS column of the MORTGAGE_APPLICANTS asset, and choose View column details.
    2. In the side panel on the Details tab, you see profiling information such as: Format, Frequency distribution, Statistics.
    3. In the side panel, click the Governance tab. This tab includes the data classes and business terms that were auto-assigned during the metadata enrichment. You might also see suggested business terms and data classes, which you can assign manually.
    4. To review the suggested terms and manually assign them:
      1. Click Suggested business terms.
      2. For Address, click Assign.
  3. At the end of the row for the EMAIL_ADDRESS column of the MORTGAGE_APPLICANTS asset, click the Overflow menu, and choose View data quality. Watson Knowledge Catalog automatically generates a data quality score for each column and data asset by analyzing every value in every record according to pre-built dimensions.
  4. Click the X to close the Data quality window.
  5. For the CITY column for the CREDIT_SCORE asset, click the Overflow menu, and choose Mark as reviewed.
  6. Click the Assets tab.
  7. In the list of Assets, for the MORTGAGE_APPLICANTS asset, click the Overflow menu, and choose View asset details.
    1. In the side panel, click the Governance tab to see business term auto assignment.
    2. To manually assign business terms, click the Edit icon.
    3. Search for social. If you don't see any results, then make sure that the drop-down list is set to All terms instead of Suggested terms.
    4. Select Social Security Number.
    5. Click Assign.
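
The profiling information and data quality score that you viewed in these steps can be pictured with a small Python sketch. It is a simplified assumption, not the product's algorithm: the sample values are invented, the only quality checks are "not empty" and "looks like an email address", and the names profile, quality_score, and EMAIL_PATTERN are hypothetical.

```python
import re
from collections import Counter

# Invented sample values for an email column; not taken from the tutorial data.
values = ["ann@example.com", "bob@example.com", "", "not-an-email", "ann@example.com"]

# Loose shape check for an email address, for illustration only.
EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def profile(column_values):
    """Summarize a column: frequency distribution and simple statistics."""
    non_empty = [v for v in column_values if v]
    return {
        "frequency": Counter(non_empty).most_common(),
        "count": len(column_values),
        "missing": len(column_values) - len(non_empty),
        "distinct": len(set(non_empty)),
        "max_length": max((len(v) for v in non_empty), default=0),
    }


def quality_score(column_values):
    """Fraction of values that are non-empty and email-shaped."""
    if not column_values:
        return 1.0
    passing = sum(1 for v in column_values if v and EMAIL_PATTERN.match(v))
    return passing / len(column_values)


summary = profile(values)
score = quality_score(values)
print(summary)
print(f"quality score: {score:.0%}")
```

Because the score is computed over every value, a single empty or malformed entry lowers it, which mirrors the idea of analyzing every value in every record.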

Checkpoint Check your progress

The following image shows the reviewed and enriched data assets. The next step is to publish the enriched data to a catalog to share with your organization.

Reviewed enriched data assets

Task 7: Publish data to a catalog

Watch Video To preview this task, watch the video beginning at 08:50.

Now that you have enriched data, you want to publish those data assets to a catalog so data scientists and data analysts can use the enriched data assets. Follow these steps to store the enriched data assets in a catalog for others to have access to the trusted data:

  1. Click the Data governance and privacy project name in the navigation trail.
  2. Click the Assets tab.
  3. Select Data > Data asset.
  4. Select the COMMERICIAL_CLIENT, HOUSE_PRICE, MORTGAGE_APPLICANTS, and MORTGAGE_APPLICATION data assets from the list, and click Publish to catalog.
    1. For the Target catalog, select Mortgage Approval Catalog.
    2. For the MORTGAGE_APPLICANTS asset, click the Edit icon, and change the name to MORTGAGE_APPLICANTS_TRUST.
    3. For the Tag, type trusted, and click + (plus sign).
    4. Notice that the data asset and the connection asset will be added to the catalog. Click Publish.
  5. Deselect all checked assets, then select the checkbox next to the CREDIT_SCORE asset from the list, and click Publish to catalog.
    1. For the Target catalog, select Mortgage Approval Catalog.
    2. For the Tag, type confidential, and click + (plus sign).
    3. For the Tag, type trusted, and click + (plus sign).
    4. Click Publish.
  6. From the Cloud Pak for Data navigation menu, choose Catalogs > View all catalogs.
  7. Click Mortgage Approval Catalog.
  8. In the Filter by > Any tag drop-down list, select trusted. Verify that the five data assets were added to the catalog.
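
The tag filter in the last step can be thought of as a simple membership test. The following Python sketch models the published assets and their tags exactly as assigned in the steps above; the dictionary layout and the filter_by_tag helper are assumptions for illustration and do not represent a catalog API.

```python
# Asset names and tags as published in the steps above.
published_assets = {
    "COMMERICIAL_CLIENT": {"trusted"},
    "HOUSE_PRICE": {"trusted"},
    "MORTGAGE_APPLICANTS_TRUST": {"trusted"},
    "MORTGAGE_APPLICATION": {"trusted"},
    "CREDIT_SCORE": {"confidential", "trusted"},
}


def filter_by_tag(assets, tag):
    """Return the names of assets that carry the given tag, sorted by name."""
    return sorted(name for name, tags in assets.items() if tag in tags)


trusted = filter_by_tag(published_assets, "trusted")
confidential = filter_by_tag(published_assets, "confidential")
print(len(trusted), trusted)
print(confidential)
```

Filtering on trusted returns all five assets, while filtering on confidential returns only CREDIT_SCORE, because that asset received both tags.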

Checkpoint Check your progress

The following image shows the enriched data assets published to a catalog. Now you have trusted data available through your company's catalog.

Published assets to the catalog

As a Data Steward on the governance team, you learned how to sort and organize the company's data to provide high-quality and protected data assets that data consumers can easily find in a self-service catalog.

Next steps

You are now ready to protect your data by creating data protection rules and masking flows to control access to your data. See the Protect your data tutorial.
