Data governance tutorial: Govern virtualized data
Last updated: Nov 27, 2024

Take this tutorial to govern data that was virtualized after you complete the Curate high quality data tutorial, Protect your data tutorial, and Virtualize external data tutorial with the Data integration use case of the data fabric trial. Your goal is to protect the virtual data, which contains mortgage applicants, their applications, and their credit scores, from unauthorized access. Certain personal information, such as social security numbers, must be masked so that not all Golden Bank employees have access to it.

Quick start: If you did not already create the sample project for this tutorial, access the Data governance sample project in the Resource hub.

The story for the tutorial is that Golden Bank has several departments that need access to high-quality customer mortgage data that is stored across three external data sources. As a Data Steward on the governance team, you must enrich the virtualized data and ensure that the virtualized data is protected.

The following animated image provides a quick preview of what you’ll accomplish by the end of this tutorial. You will add virtual data to your project, enrich that data with business terms, and see how IBM Knowledge Catalog data protection rules mask data through Cloud Pak for Data as a Service. Click the image to view a larger image.

Animated image

Preview the tutorial

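In this tutorial, you will complete these tasks:

  • Set up the prerequisites
  • Task 1: Enable governance of virtualized data
  • Task 2: Run an SQL query on governed virtual tables
  • Task 3: Copy the virtual data to your project
  • Task 4: Enrich the virtual data tables
  • Task 5: View the results of the metadata enrichment
  • Task 6: Publish virtual tables to a catalog
  • Cleanup (Optional)

Watch this video to preview the steps in this tutorial. There might be slight differences in the user interface shown in the video. The video is intended to be a companion to the written tutorial.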

Tips for completing this tutorial
Here are some tips for successfully completing this tutorial.

Use the video picture-in-picture

Tip: Start the video, then scroll through the tutorial. The video moves to picture-in-picture mode so that you can follow it as you complete the tasks. Close the video table of contents for the best experience with picture-in-picture. Click the timestamps for each task to follow along.

The following animated image shows how to use the video picture-in-picture and table of contents features:

How to use picture-in-picture and chapters

Get help in the community

If you need help with this tutorial, you can ask a question or find an answer in the Cloud Pak for Data Community discussion forum.

Set up your browser windows

For the optimal experience completing this tutorial, open Cloud Pak for Data in one browser window, and keep this tutorial page open in another browser window to switch easily between the two applications. Consider arranging the two browser windows side-by-side to make it easier to follow along.

Side-by-side tutorial and UI

Tip: If you encounter a guided tour while completing this tutorial in the user interface, click Maybe later.



Set up the prerequisites

Complete the prerequisite tutorials

To preview this task, watch the video beginning at 00:27.

Complete the following tutorials:

  • Curate high quality data
  • Protect your data
  • Virtualize external data

Editions: Base, Premium, Standard. Unless otherwise noted, this information applies to all editions of IBM Knowledge Catalog.




Task 1: Enable governance of virtualized data

Enabling governance of virtualized data requires two steps:

  • Enforce data protection rules in Data Virtualization.
  • Set up authorization between IBM Knowledge Catalog and Data Virtualization.

Enforce data protection rules

To preview this task, watch the video beginning at 01:02.

Follow these steps to enforce data protection rules in Data Virtualization:

  1. From the Navigation menu, choose Data > Data virtualization.

  2. If you see a notification to Set up a primary catalog to enforce governance, click Go to Governance. If you don't see this message, then from the service menu, click Administration > Service settings, and then click the Governance tab.
    Data Virtualization Service menu

  3. Enable the Enforce data protection rules for virtual objects option, and click Save.

  4. From the service menu, return to Virtualization > Data sources.

Check your progress

The following image shows the Governance tab with policy enforcement enabled. Next, you need to set up authorization between IBM Knowledge Catalog and Data Virtualization.

Enforce policies

Set up authorization between IBM Knowledge Catalog and Data Virtualization

To preview this task, watch the video beginning at 01:40.

Follow these steps to set up authorization between IBM Knowledge Catalog and Data Virtualization:

  1. Visit the Authorizations page in the IBM Cloud console.

  2. Click Create.

  3. For the In which account is the service? field, select This account.

  4. For the Which service or services need access? field, select IBM Knowledge Catalog.

  5. For How do you want to scope the access? to IBM Knowledge Catalog, select All resources.

  6. For the What do you want to give the source access to? field, select Data Virtualization.

  7. For How do you want to scope the access? to Data Virtualization, select All resources.

  8. For Service access, select DataAccess (For Service to Service Authorization Only).

  9. Click Authorize.

Check your progress

The following image shows the Authorizations page in IBM Cloud with the authorization between IBM Knowledge Catalog and Data Virtualization. Now you are ready to query governed virtual tables in Data Virtualization.

Authorizations page




Task 2: Run an SQL query on governed virtual tables

To preview this task, watch the video beginning at 02:20.

With data protection rules in place, virtual tables are governed by those rules. Follow these steps to run an SQL query on a governed virtual table:

  1. From the Data Virtualization service menu, click Run SQL.
    Data Virtualization Service menu

  2. Copy and paste the following SELECT statement for the new query. Replace <your-schema> with the schema name that you noted earlier.

    SELECT * FROM <your-schema>.MORTGAGE_APPLICANT WHERE STATE_CODE LIKE 'CA'
    

    Your query looks similar to SELECT * FROM DV_IBMID_663002GN1Q.MORTGAGE_APPLICANT WHERE STATE_CODE LIKE 'CA'
    Select statement

  3. Click Run all.

  4. After the query completes, select the query on the History tab. On the Results tab, you can see that the table is filtered to only applicants from the state of California. The data protection rules apply in Data Virtualization, catalog preview, catalog download, Data Refinery, and Project Asset preview. The rule doesn’t apply to the asset owner. Watch the video at 02:47 to see what other users see when they run the SQL query.
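
The placeholder substitution and the filter in step 2 can be tried locally with the following minimal sketch. It is an illustration only: SQLite stands in for Data Virtualization, the schema name DV_DEMO and the ID and NAME columns are invented for the example, and no data protection rules are involved. Only the SELECT statement itself comes from this tutorial.

```python
import sqlite3

# Hypothetical schema name; in the tutorial you use your own schema name.
SCHEMA = "DV_DEMO"

# The statement from step 2, with the placeholder still in place.
QUERY_TEMPLATE = (
    "SELECT * FROM <your-schema>.MORTGAGE_APPLICANT WHERE STATE_CODE LIKE 'CA'"
)


def build_query(schema: str) -> str:
    """Replace the <your-schema> placeholder with a concrete schema name."""
    # Accept only letters, digits, and underscores so the name is safe to splice in.
    if not schema.replace("_", "").isalnum():
        raise ValueError(f"unexpected characters in schema name: {schema!r}")
    return QUERY_TEMPLATE.replace("<your-schema>", schema)


def run_demo() -> list:
    """Run the query against a small in-memory table and return the rows."""
    conn = sqlite3.connect(":memory:")
    # An attached in-memory database plays the role of the schema.
    conn.execute(f"ATTACH DATABASE ':memory:' AS {SCHEMA}")
    conn.execute(
        f"CREATE TABLE {SCHEMA}.MORTGAGE_APPLICANT "
        "(ID INTEGER, NAME TEXT, STATE_CODE TEXT)"
    )
    # Invented sample rows; the real table has different columns and data.
    conn.executemany(
        f"INSERT INTO {SCHEMA}.MORTGAGE_APPLICANT VALUES (?, ?, ?)",
        [(1, "Applicant A", "CA"), (2, "Applicant B", "NY"), (3, "Applicant C", "CA")],
    )
    rows = conn.execute(build_query(SCHEMA)).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    print(build_query(SCHEMA))
    for row in run_demo():
        print(row)
```

Because the pattern 'CA' contains no wildcard characters, LIKE here selects only rows whose STATE_CODE is CA, which is why the results contain only California applicants.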

Check your progress

The following image shows the SQL query results from the perspective of another user. Now you are ready to copy the virtual tables to your project.

SQL query results




Task 3: Copy the virtual data to your project

To preview this task, watch the video beginning at 03:02.

In the Virtualize external data tutorial, you created virtual tables and virtual join views, and copied them to your Data integration project. If you would like to use that project to complete this tutorial, then skip to Task 4. If you would like to use your Data governance project to complete this tutorial, then follow these steps:

  1. From the service menu, click Virtualization > Virtualized data.
    Data Virtualization Service menu

  2. Select the following tables:

    • MORTGAGE_APPLICATION
    • MORTGAGE_APPLICANT
    • CREDIT_SCORE
    • APPLICANTS_APPLICATIONS_JOINED
    • APPLICANTS_APPLICATIONS_CREDIT_SCORE_JOINED
  3. Click Assign.

  4. For the Project, select Data governance.

  5. Click Assign.

  6. When the virtual objects are successfully assigned, navigate to your project.

  7. In the Data governance project, click the Assets tab. The virtual data tables begin with <your schema>.

  8. Open any of the virtual data tables. For example, click the APPLICANTS_APPLICATIONS_CREDIT_SCORE_JOINED virtual table to view it.

  9. Provide your credentials to access the data asset.

    1. For the Authentication method, select API Key.

    2. Paste the same API key that you created in the Virtualize external data tutorial.
       Paste API key

    3. Click Connect. The data protection rules apply in the catalog preview, catalog download, Data Refinery, and Project Asset preview. The rule doesn’t apply to the asset owner. Watch the video at 04:09 to see what other users see when they try to access the virtual data table.
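
The behavior described in the last step, where a masked column is hidden from other users but visible to the asset owner, can be pictured with the following sketch. Everything in it is hypothetical: the Asset class, the preview and redact functions, the column name, and the user names are invented for illustration and are not the IBM Knowledge Catalog API or its actual masking logic.

```python
from dataclasses import dataclass, field

# Hypothetical set of columns that a data protection rule masks.
MASKED_COLUMNS = {"SOCIAL_SECURITY_NUMBER"}


@dataclass
class Asset:
    """A toy data asset: an owner plus rows stored as dictionaries."""
    owner: str
    rows: list = field(default_factory=list)


def redact(value: str) -> str:
    """Replace every character with X so the value is hidden but its length is kept."""
    return "X" * len(value)


def preview(asset: Asset, user: str) -> list:
    """Return the rows as the given user would see them in a preview."""
    if user == asset.owner:
        # The rule does not apply to the asset owner, so nothing is masked.
        return [dict(row) for row in asset.rows]
    return [
        {
            column: redact(str(value)) if column in MASKED_COLUMNS else value
            for column, value in row.items()
        }
        for row in asset.rows
    ]


# Invented sample data; the value is deliberately not a real number.
asset = Asset(
    owner="steward",
    rows=[{"NAME": "Applicant A", "SOCIAL_SECURITY_NUMBER": "000-00-0001"}],
)

if __name__ == "__main__":
    print(preview(asset, "steward"))
    print(preview(asset, "analyst"))
```

In this sketch the owner sees the original value, while any other user sees a string of X characters in the masked column and the remaining columns unchanged.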

Check your progress

The following image shows the virtual table with a masked column in the project from the perspective of a different user. Now you are ready to enrich the data.

Virtual table in project




Task 4: Enrich the virtual data tables

To preview this task, watch the video beginning at 04:21.

You can enrich data assets with information that helps users to find data faster. Users can use the enrichments to decide whether the data is appropriate for the task at hand, whether they can trust the data, and how to work with the data. Such information includes, for example, terms that define the meaning of the data, rules that document ownership or determine quality standards, or reviews. Follow these steps to enrich the virtual data tables:

  1. Click Data governance in the navigation trail to return to the project.
    Navigation trail

  2. On the Assets tab, click New asset > Enrich data assets with metadata.

  3. For the name, copy and paste the following text:

    Virtual mortgage data - metadata enrichment
    
  4. Click Next to continue.

  5. Click Select data from project.

    1. Select Data asset.

    2. Click the checkbox next to the following assets:

      • <your schema>.MORTGAGE_APPLICATION
      • <your schema>.MORTGAGE_APPLICANT
      • <your schema>.CREDIT_SCORE
      • <your schema>.APPLICANTS_APPLICATIONS_JOINED
      • <your schema>.APPLICANTS_APPLICATIONS_CREDIT_SCORE_JOINED
    3. Click Select.

  6. Click Next to continue to the enrichment objective.

  7. Select all enrichment objectives:

    • Profile data
    • Assign terms
    • Run basic quality analysis
  8. For Categories, click Select categories.

    1. Select only [uncategorized] and Banking.

    2. Click Select.

  9. For the Sampling, select Basic.

  10. Click Next to continue to the schedule.

  11. Click Next to continue to the review.

  12. Click Create.

  13. The metadata enrichment asset displays, but the job might take several minutes to complete. Click the Refresh icon to watch the status change from Queued to In progress to Finished. When the job run is complete, you see the five assets listed.

Check your progress

The following image shows the completed metadata enrichment. Now you can explore the enriched data assets.

Enriched data




Task 5: View the results of the metadata enrichment

To preview this task, watch the video beginning at 05:48.

After the metadata enrichment run completes, follow these steps to view the enriched data:

  1. From the Virtual mortgage data - metadata enrichment screen, click the Columns tab.

  2. Search for mortgage_applicant.

  3. In the list of Columns, locate the EMAIL_ADDRESS column for the <your-schema>.MORTGAGE_APPLICANT asset.

    1. Click the Overflow menu at the end of the EMAIL_ADDRESS row for the <your-schema>.MORTGAGE_APPLICANT asset, and choose View column details.

    2. In the side panel on the Details tab, you see profiling information such as: Format, Frequency distribution, Statistics.

    3. In the side panel, click the Governance tab. This tab includes the data classes and business terms that were auto-assigned during the metadata enrichment. You might also see suggested business terms and data classes.

    4. Review any suggested business terms or data classes and manually assign them. For example, you might see Address as a suggested business term.

      1. Click Suggested business terms.

      2. For Address, click Assign.

      3. Click Suggested data classes.

      4. For Text, click Assign.

  4. At the end of the EMAIL_ADDRESS row for the <your-schema>.MORTGAGE_APPLICANT asset, click the Overflow menu, and choose View data quality details.

    1. View the data quality score. IBM Knowledge Catalog automatically generates a data quality score for each column and data asset by analyzing every value in every record according to pre-built dimensions.

    2. Click the X to close the Data quality window.

  5. Search for credit_score.

  6. For the CITY column of the <your-schema>.CREDIT_SCORE asset, click the Overflow menu, and choose Mark as reviewed.

  7. Click the Assets tab.

  8. In the list of Assets, for the <your-schema>.MORTGAGE_APPLICANT asset, click the Overflow menu, and choose View asset details.

    1. In the side panel, click the Governance tab to see any business terms that were auto-assigned.

    2. Click the Add icon (or you might see the Edit icon) to manually assign business terms.

    3. Search for social. If you don't see any results, then make sure that the drop-down list is set to All terms instead of Suggested terms.

    4. Select Social Security Number.

    5. Click Assign.
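
The profiling information and data quality score that you viewed in steps 3 and 4 can be pictured with the following sketch. It is an illustration under stated assumptions: the two dimensions (completeness and validity), the equal weighting, and the sample values are invented here, and the pre-built dimensions that IBM Knowledge Catalog actually applies are not described in this tutorial.

```python
from collections import Counter


def frequency_distribution(values: list) -> Counter:
    """Count how often each non-missing value occurs in a column."""
    return Counter(v for v in values if v is not None)


def completeness(values: list) -> float:
    """Share of values that are present (not None)."""
    if not values:
        return 0.0
    return sum(v is not None for v in values) / len(values)


def validity(values: list, is_valid) -> float:
    """Share of present values that pass the given check."""
    present = [v for v in values if v is not None]
    if not present:
        return 0.0
    return sum(bool(is_valid(v)) for v in present) / len(present)


def quality_score(values: list, is_valid) -> float:
    """Average the two illustrative dimensions into one score between 0 and 1."""
    return (completeness(values) + validity(values, is_valid)) / 2


def looks_like_email(value: str) -> bool:
    """Deliberately simple check, for illustration only."""
    return "@" in value


# Invented sample values for an EMAIL_ADDRESS-like column.
emails = ["a@example.com", "b@example.com", None, "not-an-email", "a@example.com"]

if __name__ == "__main__":
    print(frequency_distribution(emails).most_common(1))
    print(completeness(emails), validity(emails, looks_like_email))
    print(quality_score(emails, looks_like_email))
```

Each value in the column contributes to the score, which matches the idea of analyzing every value in every record. A real profile would use more dimensions and stricter checks.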

Check your progress

The following image shows the reviewed and enriched data assets. The next step is to publish the enriched data to a catalog to share with your organization.

Reviewed enriched data assets




Task 6: Publish virtual tables to a catalog

To preview this task, watch the video beginning at 07:18.

Now that the virtualized data is enriched with business terms, follow these steps to publish the virtual tables to a catalog:

  1. Click Data governance in the navigation trail to return to the project.
    Navigation trail

  2. Click the Assets tab.

  3. Navigate to Data > Data assets.

  4. Click the checkbox next to the following assets:

    • <your schema>.MORTGAGE_APPLICATION
    • <your schema>.MORTGAGE_APPLICANT
    • <your schema>.CREDIT_SCORE
    • <your schema>.APPLICANTS_APPLICATIONS_JOINED
    • <your schema>.APPLICANTS_APPLICATIONS_CREDIT_SCORE_JOINED
  5. Click Publish to catalog.

    1. Select the Mortgage Approval Catalog (or your catalog name) from the list, and click Next.

    2. Select the option to Go to the catalog after publishing it, and click Next.

    3. Review the assets, and click Publish.

  6. In the Mortgage Approval Catalog, search for <your-schema>.

  7. Open one of the virtual tables. If prompted, provide your credentials:

    1. For the Authentication method, select API Key.

    2. Paste the same API key that you created in the Virtualize external data tutorial.

  8. Click the Asset tab to view the data. The data protection rules apply in the catalog preview, catalog download, Data Refinery, and Project Asset preview. The rule doesn’t apply to the asset owner. Watch the video at 08:17 to see what other users see when they try to access the virtual data table in the catalog.

Check your progress

The following image shows the data preview of the virtual table in the catalog from the perspective of the user.

Catalog preview



As data engineers and data stewards at Golden Bank, you enriched the virtualized data and ensured that it is protected.

Cleanup (Optional)

If you would like to retake the tutorials in the Data governance use case, refer to the Cleanup section in each of the prerequisite tutorials:

  • Curate high quality data
  • Protect your data
  • Virtualize external data

