Data governance and privacy use case

Many enterprises struggle to balance the benefits of providing access to data with the need to protect sensitive data. Cloud Pak for Data as a Service provides the methods that your enterprise needs to automate data governance and privacy so you can ensure that data is both accessible and protected.

Watch this video to see the data fabric use case for implementing a data governance and privacy solution in Cloud Pak for Data.

This video provides a visual method as an alternative to following the written steps in this documentation.

Challenges

Many enterprises face the following data governance and privacy challenges:

Providing data privacy at scale
Organizations must comply with data privacy regulations for data in data sources across multiple cloud platforms and on-premises.

Accessing high-quality data
Organizations must provide access to high-quality enterprise data across multiple teams.

Providing self-service data consumption
Data consumers, such as data scientists, struggle to find and use the data that they need.

You can solve these challenges by implementing a data fabric with Cloud Pak for Data as a Service.

Example: Golden Bank's challenges

Follow the story of Golden Bank as the governance team implements data governance. Golden Bank has a large amount of customer and mortgage data that includes sensitive data. The bank wants to ensure the quality of the data, mask the sensitive data, and make it available for use across several departments.

Process

To implement data governance and privacy, your organization can follow this process:

  1. Set up a governance framework
  2. Create rules to protect your data
  3. Curate data to share in catalogs
  4. Find and use data

The Watson Knowledge Catalog service in Cloud Pak for Data as a Service provides all of the tools and processes that your organization needs to implement a data governance and privacy solution.

Image showing the flow of assets in the data governance and privacy use case

1. Set up a governance framework

To meet all three of the challenges, your team needs to set up a framework of governance artifacts that act as metadata to classify and describe the data:

  • Before you can automate data privacy, your team needs to ensure that the data to control is accurately identified.
  • Before you can analyze data quality, you need to identify the format of the data.
  • To make data easy to find, your team needs to ensure that the content of the data is accurately described.

In this first step of the process, your governance team can build on the foundation of the predefined governance artifacts and create custom governance artifacts that are specific to your organization. You can create artifacts to describe the format, business meaning, sensitivity, range of values, and governance policies of the data.

Categories
  What you can do:
  • Use the predefined category to store your governance artifacts.
  • Create categories to organize governance artifacts in a hierarchical structure similar to folders.
  • Add collaborators with roles that define their permissions on the artifacts in the category.
  Best to use when:
  • You need more than the predefined category.
  • You want fine-grained control of who can own, author, and view governance artifacts.

Workflows
  What you can do:
  • Use the default workflow configuration, which does not restrict who creates governance artifacts or require reviews.
  • Configure workflows for governance artifacts and designate who can create which types of governance artifacts in which categories.
  Best to use when:
  • You want to control who creates governance artifacts.
  • You want draft governance artifacts to be reviewed before they are published.

Governance artifacts
  What you can do:
  • Use the predefined data classes and classifications.
  • Create governance artifacts that act as metadata to enrich, define, and control data assets.
  Best to use when:
  • You want to add knowledge and meaning to assets to help people understand the data.
  • You want to improve data quality analysis.

Knowledge Accelerators
  What you can do:
  • Import a set of predefined governance artifacts to improve data classification, regulatory compliance, self-service analytics, and other governance operations.
  Best to use when:
  • You need a standard vocabulary to describe business issues, business performance, industry standards, and regulations.
  • You want to save time by importing pre-created governance artifacts.


Example: Golden Bank's governance framework

The governance team leader at Golden Bank starts by creating a category, Banking, to hold the governance artifacts that the team plans to create. The team leader adds the rest of the governance team members as collaborators to the Banking category with the Editor role so that they have permission to create governance artifacts. Then, the team leader configures workflows so that a different team member is responsible for creating each type of artifact. All workflows require an approval step by the team leader.
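The approval workflow described above can be pictured as a small state machine: an artifact moves from draft, through review, to published. The following is a minimal sketch under that assumption; the class and state names are illustrative only, not a Cloud Pak for Data API.

```python
# Illustrative draft -> pending review -> published lifecycle for a
# governance artifact. All names here are hypothetical.
class ArtifactWorkflow:
    TRANSITIONS = {
        "draft": {"submit": "pending_review"},
        "pending_review": {"approve": "published", "reject": "draft"},
    }

    def __init__(self):
        self.state = "draft"

    def apply(self, action):
        allowed = self.TRANSITIONS.get(self.state, {})
        if action not in allowed:
            raise ValueError(f"cannot {action!r} while in state {self.state!r}")
        self.state = allowed[action]
        return self.state

wf = ArtifactWorkflow()
wf.apply("submit")    # author submits a draft business term for review
wf.apply("approve")   # team leader approves; the artifact is published
```

Because every transition out of review passes through the team leader's approve or reject step, no artifact can reach the published state without a review, which matches the configuration Golden Bank wants.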

One governance team member imports a set of business terms from a spreadsheet. Some of the business terms differentiate between personal and commercial clients. Another team member creates a reference data set, "Diamond-level client names", that contains a list of the top commercial clients. A third team member creates a custom data class, "Diamond-level clients" to identify the top commercial clients, based on the reference data set.
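A custom data class that is based on a reference data set works by comparing a column's values against that set. The sketch below assumes a simple match-rate rule; the reference values, function name, and 80% threshold are illustrative, not how Watson Knowledge Catalog implements matching.

```python
# Hypothetical reference data set standing in for "Diamond-level client names".
DIAMOND_CLIENTS = {"Acme Corp", "Globex", "Initech"}

def matches_data_class(column_values, reference_set, threshold=0.8):
    """Return True when the share of non-empty values found in the
    reference set meets the match threshold (an assumed rule)."""
    values = [v for v in column_values if v]   # ignore empty cells
    if not values:
        return False
    hits = sum(1 for v in values if v in reference_set)
    return hits / len(values) >= threshold

column = ["Acme Corp", "Globex", "Initech", "Acme Corp"]
matches_data_class(column, DIAMOND_CLIENTS)   # True: all values match
```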

2. Create rules to protect your data

In the next step of the process, your team sets up rules to ensure compliance with data privacy regulations by controlling who can see what data. Your team creates data protection rules that protect data across the platform. Your team can use these data protection rules to mask sensitive data based on the content, format, or meaning of the data, or the identity of the users who access the data.

Data protection rules
  What you can do:
  • Protect sensitive information from unauthorized access by denying access or masking data values in data assets.
  • Dynamically and consistently mask data at a user-defined granular level.
  Best to use when:
  • You need to automatically enforce data privacy across the platform.
  • You want to retain the availability and utility of data while you comply with privacy regulations.

Masking flows
  What you can do:
  • Use advanced format-preserving data masking capabilities when you extract copies or subsets of production data.
  Best to use when:
  • You need anonymized training data and test sets that retain data integrity.

Policies and governance rules
  What you can do:
  • Describe and document your organization's guidelines, regulations, standards, or procedures for data security.
  • Describe the required behavior or actions to implement the governance policy.
  Best to use when:
  • You want the people who use the data to understand the data governance policies.

Example: Golden Bank's data protection rules

To create a predictive model for mortgage approvals, Golden Bank's data scientists need access to data sets that include sensitive data. For example, the data scientists want to access the table with data about mortgage applicants, which includes a column with social security numbers.

A governance team member creates a data protection rule that masks social security numbers. If the assigned data class of a column in a data asset is "US Social Security Number", the values in that column are replaced with 10 Xs.

A governance team member creates a policy that includes the data protection rule. The policy describes the business reasons for implementing the rule. Later, when users, such as data scientists, see the masked icon on a data column, they can view the data protection rule, and then view the associated policy to understand why the data is masked.
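The effect of the rule in this example can be sketched in a few lines: given column metadata that records each column's assigned data class, every value in a column classified as "US Social Security Number" is replaced with 10 Xs. This is a minimal illustration of the behavior, not the product's enforcement engine; the function and parameter names are assumptions.

```python
# Sketch of data protection rule enforcement, assuming rows are dicts and
# column_classes maps each column name to its assigned data class.
def apply_masking_rule(rows, column_classes,
                       masked_class="US Social Security Number",
                       mask="X" * 10):
    """Return a copy of rows with values in matching columns masked."""
    masked_cols = {name for name, cls in column_classes.items()
                   if cls == masked_class}
    return [
        {name: (mask if name in masked_cols else value)
         for name, value in row.items()}
        for row in rows
    ]

rows = [{"name": "J. Doe", "ssn": "123-45-6789"}]
classes = {"name": "Person Name", "ssn": "US Social Security Number"}
apply_masking_rule(rows, classes)
# [{'name': 'J. Doe', 'ssn': 'XXXXXXXXXX'}]
```

Note that masking keys off the assigned data class, not the column name, which is why accurate classification in step 1 matters before rules can be enforced.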

3. Curate data to share in catalogs

Data stewards curate high-quality data assets in projects and publish them to catalogs where the people who need the data can find them. Data stewards enrich the data assets by assigning governance artifacts as metadata that describes the data and informs the semantic search for data.

Metadata import
  What you can do:
  • Automatically import technical metadata for the data that is associated with a connection to create data assets.
  Best to use when:
  • You need to create many data assets from a data source.
  • You need to refresh the data assets that you previously imported.

Metadata enrichment
  What you can do:
  • Profile multiple data assets in a single run to automatically assign data classes and identify the data types and formats of columns.
  • Automatically assign business terms to assets and generate term suggestions based on data classification.
  • Rerun the import and enrichment jobs at intervals to discover and evaluate changes to data assets.
  Best to use when:
  • You need to curate and publish many data assets that you imported.

Data quality analysis
  What you can do:
  • Run quality analysis on multiple data sets in a single run to scan for common dimensions of data quality, such as missing values or data class violations.
  • Continuously track changes to the content and structure of data, and reanalyze changed data on a recurring basis.
  Best to use when:
  • You need to know whether the quality of your data might affect the accuracy of your data analysis or models.
  • Your users need to identify which data sets to remediate.

Catalogs
  What you can do:
  • Publish curated assets to share among the collaborators in your organization.
  Best to use when:
  • You need a central repository for data assets that displays the associated metadata, relationships, and history of the assets.


Example: Golden Bank's data curation

The data stewards on the governance team start importing metadata to create data assets in a project. After metadata import, Golden Bank has two data assets that represent tables with a column that is named "ID". After metadata enrichment, those columns are clearly differentiated by their assigned metadata:

  • One column is assigned the business terms "Commercial client" and "Company identifier", and the data class "Diamond-level clients".
  • The other column is assigned the business terms "Personal identifier" and "Private individual" and the data class "US Social Security Number".
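Part of what enrichment automates is inferring a data class from the values themselves. The sketch below shows the idea with simple regular expressions; the patterns, class names other than "US Social Security Number", and the all-values-must-match rule are illustrative assumptions, not the Metadata enrichment tool's actual logic.

```python
import re

# Hypothetical data class patterns for profiling column values.
DATA_CLASS_PATTERNS = {
    "US Social Security Number": re.compile(r"^\d{3}-\d{2}-\d{4}$"),
    "Email Address": re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$"),
}

def infer_data_class(values):
    """Assign a data class when every sampled value matches its pattern."""
    for name, pattern in DATA_CLASS_PATTERNS.items():
        if values and all(pattern.match(v) for v in values):
            return name
    return None

infer_data_class(["123-45-6789", "987-65-4321"])   # 'US Social Security Number'
infer_data_class(["a@example.com"])                # 'Email Address'
```

This is how two columns that share the ambiguous name "ID" can end up with different, meaningful metadata: the classification comes from the data, not the column name.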

The data stewards run data quality analysis on the data assets to make sure that the overall data quality score exceeds the Golden Bank threshold of 95%.
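One dimension of such a quality score is completeness: the share of cells that are not missing. The sketch below computes only that single dimension and checks it against the 95% threshold; real data quality analysis combines several dimensions, and this scoring formula is an assumption for illustration.

```python
# Completeness as one assumed quality dimension: share of non-missing cells.
def completeness_score(rows):
    cells = [v for row in rows for v in row.values()]
    if not cells:
        return 1.0                      # no cells, nothing missing
    non_missing = sum(1 for v in cells if v not in (None, ""))
    return non_missing / len(cells)

rows = [
    {"id": "1001", "rate": "3.5"},
    {"id": "1002", "rate": ""},         # one missing cell out of four
]
completeness_score(rows) >= 0.95        # False: the score is 0.75
```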

The governance team leader creates a catalog, "Mortgage Approval Catalog" and adds the data stewards and data scientists as catalog collaborators. The data stewards publish the data assets that they created in the project into the catalog.

4. Find and use data

The catalog helps your teams understand your data and makes the right data available for the right use. Data scientists and other types of users can help themselves to the data that they need while they remain compliant with corporate access and data protection policies. They can add data assets from a catalog into a project, where they collaborate to prepare, analyze, and model the data.

Catalogs
  What you can do:
  • Organize your assets to share among the collaborators in your organization.
  • Take advantage of AI-powered semantic search and recommendations to help users find what they need.
  Best to use when:
  • Your users need to easily understand, enrich, access, and collaborate on high-quality data.
  • You want to increase the visibility of data and collaboration between business users.
  • You need users to view, access, manipulate, and analyze data without understanding its physical format or location, and without having to move or copy it.
  • You want users to enhance assets by rating and reviewing them.

Global search
  What you can do:
  • Search for assets across all the projects, catalogs, and deployment spaces to which you have access.
  • Search for governance artifacts across the categories to which you have access.
  Best to use when:
  • You need to find data or another type of asset, or a governance artifact.

Data Refinery
  What you can do:
  • Cleanse data to fix or remove data that is incorrect, incomplete, improperly formatted, or duplicated.
  • Shape data to customize it by filtering, sorting, combining, or removing columns.
  Best to use when:
  • You need to improve the quality or usefulness of data.


Example: Golden Bank's catalog

The data scientists find the data assets that they need in the catalog and copy those assets to a project. In their project, the data scientists can refine the data to prepare it for training a model.
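The cleanse-and-shape steps described above (filter out incomplete rows, remove duplicates, drop unneeded columns) can be sketched in plain Python. Data Refinery performs these operations interactively; this stand-alone function, with its assumed names and parameters, only illustrates the kind of transformation involved.

```python
# Minimal cleanse-and-shape sketch: filter rows, drop columns, deduplicate.
def refine(rows, keep_if, drop_columns=()):
    seen, result = set(), []
    for row in rows:
        if not keep_if(row):
            continue                    # cleanse: drop rows that fail the filter
        shaped = {k: v for k, v in row.items() if k not in drop_columns}
        key = tuple(sorted(shaped.items()))
        if key in seen:
            continue                    # cleanse: remove duplicate rows
        seen.add(key)
        result.append(shaped)
    return result

applicants = [
    {"id": "1", "income": 50000, "ssn": "XXXXXXXXXX"},
    {"id": "1", "income": 50000, "ssn": "XXXXXXXXXX"},   # duplicate row
    {"id": "2", "income": None,  "ssn": "XXXXXXXXXX"},   # incomplete row
]
refine(applicants, keep_if=lambda r: r["income"] is not None,
       drop_columns={"ssn"})
# [{'id': '1', 'income': 50000}]
```

Note that the masked "ssn" column arrives already protected by the data protection rule, and dropping it here simply removes a column the model does not need.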

Tutorials for data privacy and governance

Trust your data
  Description: Create trusted data assets by enriching your data and running data quality analysis.
  Expertise for tutorial: Run the Metadata import and Metadata enrichment tools.

Protect your data
  Description: Control access to data across Cloud Pak for Data as a Service.
  Expertise for tutorial: Create data protection rules.

Know your data
  Description: Evaluate, share, shape, and analyze data.
  Expertise for tutorial: Explore a catalog and run the Data Refinery tool.


Learn more about data privacy and governance

Parent topic: Data fabric solution overview