You build a governance framework for your data by implementing governance artifacts in collaborative workspaces called categories. Some types of governance artifacts act as metadata that enriches data assets. Other types control access to data assets or to other artifacts.
Required service: Watson Knowledge Catalog
You use governance artifacts for these purposes:
- Enrichment: Artifacts can add knowledge and meaning to assets.
- Access control: Artifacts can control who sees what data or which artifacts.
- Identification: Artifacts can act as criteria to identify assets or data for other artifacts.
- Quality: Artifacts can be used to analyze data quality dimensions.
You can use categories and governance artifacts from any or all of these sources:
- Predefined governance artifacts that are provided with Watson Knowledge Catalog
- Industry-specific Knowledge Accelerators
- Custom governance artifacts that your governance team creates
The following table briefly describes categories and each type of governance artifact and indicates whether any of the items are predefined or available in Knowledge Accelerators.
|Governance item||Description||Predefined items?||Provided by Knowledge Accelerators?|
|Categories||Categories organize governance artifacts in a hierarchical structure similar to folders.
You can use category roles to define ownership of artifacts, control their authoring, and restrict their visibility.
Examples: Business Performance Indicators, Business Scopes
|The [uncategorized] category, which contains the predefined data classes and classifications. The Locations category, which contains the predefined reference data sets. Limited: The Knowledge Accelerator Sample Personal Data category, which contains predefined business terms.||Each Knowledge Accelerator provides many categories.|
|Business terms||Business terms implement a common enterprise vocabulary to describe the meaning of data.
You create business terms to ensure clarity and compatibility among departments, projects, or products. Business terms are the core of your governance framework and typically form the bulk of your governance artifacts. You can manually assign business terms to data columns, tables, or files or automatically assign them during metadata enrichment. You can use business terms in governance rules and enforceable rules to identify the affected data.
Examples: Customer lifetime value, Work phone number
|Limited: Predefined business terms, and the Knowledge Accelerator Sample Personal Data category that includes them, are available only if you create a Watson Knowledge Catalog service instance with a Lite or Standard plan after 7 October 2022. For more information, see Predefined business terms.||Each Knowledge Accelerator provides many business terms.|
|Data classes||Data classes classify data based on the structure, format, and range of values of the data.
Data classes are automatically assigned to matching data columns during profiling and metadata enrichment. You can create data classes by defining matching criteria with an expression or a reference data set. You can create relationships between data classes and business terms to link data format with business meaning. Related business terms are automatically assigned to data along with their related data classes. How well columns conform to their data class criteria contributes to data quality analysis. Before you have a robust set of business terms, you can use data classes in enforceable rules to identify the affected data.
Examples: Phone number, Email address
|Over 150 predefined data classes in the [uncategorized] category.||Each Knowledge Accelerator provides data classes.|
|Reference data sets||Reference data sets define standard values for specific types of data to classify data and measure consistency.
Reference data sets act as lookup tables that map codes and values. You can include a reference data set in the definition of a data class as part of the data matching criteria. Some reference data sets are standardized by organizations, such as the International Organization for Standardization (ISO). Reference data can be hierarchical or mapped across related sets.
Example: Country codes
|The Physical locations and Sovereign locations predefined reference data sets in the Locations category.||Each Knowledge Accelerator provides many reference data sets.|
|Classifications||Classifications describe specific characteristics of the meaning of data.
Predefined classifications describe the sensitivity of the data. You can create classifications to describe other characteristics of data or other governance items. For example, Knowledge Accelerators use classifications to classify categories and business terms. You can use classifications to construct governance policies and rules. Typically, you relate multiple business terms to each classification and then data is indirectly classified through its assigned business terms. You can also manually assign a classification to a data asset.
Example: Sensitive Personal Information
|Several predefined classifications in the [uncategorized] category.||Each Knowledge Accelerator provides classifications.|
|Policies||Policies describe how to manage and protect data assets.
You create policies by combining rules and subpolicies. You can include data protection rules and data location rules in policies to control and manage data. However, policies do not affect the enforcement of data protection rules and data location rules. You can include governance rules in policies to document standards and procedures.
Example: Data sharing agreement
|Governance rules||Governance rules describe how to apply a policy.
Governance rules provide a natural-language description of the criteria that are used to determine whether data assets are compliant with business objectives. Governance rules are not enforced by Watson Knowledge Catalog. However, you can relate governance rules to enforceable rules, such as data protection rules and data quality rules.
Example: Customer name must not be null.
|Data protection rules||Data protection rules define who can see what data. They control access to data based on users, asset properties, and assigned governance artifacts.
Within data protection rules, you can include classifications, data classes, business terms, or tags to identify the data to control. You specify whether to deny access to data or to mask sensitive data values. Data protection rules are automatically enforced in governed catalogs only. Data protection rules are not organized or controlled by categories.
Example: Mask columns that are assigned the Passport Identifier business term.
|Data location rules (experimental)||Data location rules control access to data based on its physical and sovereign locations, on users and asset properties, and on assigned governance artifacts.
Data location rules control who can see what data. Within data location rules, you can specify the direction of data movement, that is, whether the data is leaving or entering a physical or sovereign location. You can also include classifications, data classes, business terms, or tags to identify the data to control. You specify whether to allow access to data or to mask sensitive data values. Data location rules are automatically enforced in all governed catalogs. Data location rules are not organized or controlled by categories.
Example: Mask columns that are assigned the Personal Identifiable Information business term in a data asset leaving Germany and accessed in other countries.
|Predefined Physical location and Sovereign location||None|
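As a rough illustration of the Data classes and Reference data sets rows in the table above, the following Python sketch shows how matching criteria defined by an expression or by a reference data set could be evaluated against a column. It is not how Watson Knowledge Catalog implements profiling. The `DataClass` type, the `conformance` and `classify` functions, the regular expression, the sample country codes, and the 0.75 threshold are all assumptions made for this example.

```python
import re
from dataclasses import dataclass
from typing import Callable, Iterable, Optional

# Hypothetical reference data set: a lookup table that maps codes to values.
COUNTRY_CODES = {"DE": "Germany", "FR": "France", "US": "United States"}


@dataclass(frozen=True)
class DataClass:
    """Illustrative data class: a name plus a per-value matching criterion."""
    name: str
    matches: Callable[[str], bool]


# Criterion defined with an expression (an illustrative regex, not a product pattern).
EMAIL = DataClass(
    "Email address",
    lambda v: re.fullmatch(r"[^@\s]+@[^@\s]+\.[^@\s]+", v) is not None,
)
# Criterion defined with a reference data set.
COUNTRY = DataClass("Country code", lambda v: v in COUNTRY_CODES)


def conformance(values: Iterable[str], data_class: DataClass) -> float:
    """Fraction of values that satisfy the data class criterion."""
    values = list(values)
    if not values:
        return 0.0
    return sum(data_class.matches(v) for v in values) / len(values)


def classify(values: Iterable[str], candidates: list, threshold: float = 0.75) -> Optional[str]:
    """Return the best-matching data class name, or None if below the assumed threshold."""
    values = list(values)
    best_score, best_name = max((conformance(values, dc), dc.name) for dc in candidates)
    return best_name if best_score >= threshold else None


column = ["DE", "FR", "US", "XX"]
print(classify(column, [EMAIL, COUNTRY]))
print(conformance(column, COUNTRY))
```

In this sketch, the same conformance score serves two purposes that the table describes: it decides which data class is assigned to the column, and it could feed a data quality measure, because the nonconforming value `XX` lowers the score.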
Governance artifacts are scoped to Watson Knowledge Catalog catalogs in the same IBM Cloud account.
You must have the required Cloud Pak for Data service permissions to work with governance artifacts. See Required permissions.
Some Watson Knowledge Catalog plans have limits on the number of governance artifacts of a specific type that you can create.
Watch this short video to learn about the policies features.
|Time||Transcript|
|00:00||This video provides an overview of the data policies features available in IBM Watson Knowledge Catalog.|
|00:07||Users who have permission to implement and monitor data policies will likely be different than the catalog admins.|
|00:14||For example, a large company might have multiple catalogs, each with different admins, but managing data policies is the same across all catalogs that have policies enabled.|
|00:27||Policy management involves a set of policies and rules that restrict users from accessing data.|
|00:34||To use data policies in a catalog, you first need to set the "Enforce data policies" option for the catalog.|
|00:42||You can do that when you create the catalog.|
|00:45||And in the catalog settings, you can see if policies are enforced.|
|00:51||Say you want to create a policy that will restrict access to competitors' data.|
|00:56||You do that from the "Policies" section.|
|00:59||This deny access policy looks interesting.|
|01:05||And it has one rule associated with it.|
|01:10||But that's not an exact fit.|
|01:13||Here's another policy to protect PII, which has one rule associated with it: to deny access based on a specific tag.|
|01:24||While those existing policies and rules are close, it seems appropriate to create a new policy and rule.|
|01:31||Let's start with creating the rule.|
|01:34||There are two types of rules: data protection rules and governance rules.|
|01:39||In this case, you want to create a data protection rule.|
|01:44||Rules need a name, type, and business definition.|
|01:49||The business definition is important, because it describes the rule in plain language and helps users find the rule through search.|
|01:57||Then, identify the criteria, made up of the business terms, which can be mapped to users or data.|
|02:04||The first condition is to match the asset tag, called "competitive".|
|02:10||And the second condition is to match any user that contains the specified users.|
|02:22||Lastly, set the action to deny access to the assets and create the rule.|
|02:31||So, now you have the data protection rule defined and you're ready to create the policy.|
|02:37||Note that data protection rules do not need to be included in a policy to take effect.|
|02:42||But in this case, you'll create a policy and assign the rule to that policy.|
|02:48||Policies require a name and must be associated with a category.|
|02:55||The description is optional, but it's a best practice to include it.|
|03:00||Then save the policy as a draft.|
|03:05||The "Primary category" is the category that owns the policy.|
|03:09||You can add any number of "Secondary categories" that would provide additional access to the policy.|
|03:16||This policy won't contain any subpolicies or governance rules, but will contain a data protection rule.|
|03:24||You will need to publish the policy first, before you can add a data protection rule.|
|03:33||When the policy is published, you'll have the option to add the data protection rule, called "Restrict to Sales Executives".|
|03:46||Now, go back to the catalog and open an asset and tag it with "competitive".|
|04:03||With that rule and policy in place, any user other than those specified in the rule will be blocked from seeing a data asset tagged with "competitive".|
|04:15||Find more videos in the Cloud Pak for Data as a Service documentation.|
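The rule that the video builds can be sketched in a few lines of Python to make its logic concrete. This is a minimal model, not the Watson Knowledge Catalog API. The `Asset` and `DenyRule` types, the `can_view` function, the asset name, and the user names are hypothetical. The sketch assumes that the rule denies access when the asset carries the tag and the user is not among the specified users, which matches the outcome described at 04:03, and that rules apply only when the catalog enforces data policies.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Asset:
    """Hypothetical catalog asset with a set of tags."""
    name: str
    tags: frozenset = frozenset()


@dataclass(frozen=True)
class DenyRule:
    """Hypothetical deny-access rule: one tag condition and one user condition."""
    name: str
    asset_tag: str
    allowed_users: frozenset

    def denies(self, user: str, asset: Asset) -> bool:
        # Both conditions must hold: the asset has the tag
        # and the user is not one of the specified users.
        return self.asset_tag in asset.tags and user not in self.allowed_users


def can_view(user: str, asset: Asset, rules: list, enforce_policies: bool = True) -> bool:
    """Return True unless an enforced rule denies this user access to the asset."""
    if not enforce_policies:
        # Assumption: rules have no effect in a catalog that does not enforce policies.
        return True
    return not any(rule.denies(user, asset) for rule in rules)


rule = DenyRule("Restrict to Sales Executives", "competitive", frozenset({"alice"}))
report = Asset("competitor-pricing", frozenset({"competitive"}))

print(can_view("alice", report, [rule]))
print(can_view("bob", report, [rule]))
print(can_view("bob", report, [rule], enforce_policies=False))
```

The rule is evaluated on its own, without reference to any policy object, which mirrors the statement at 02:37 that a data protection rule does not need to be included in a policy to apply.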
- Planning to govern data
- Find and view governance artifacts
- Governance artifact properties and relationships
- Managing artifacts
- Workflow process for governance artifacts
- Knowledge Accelerators
- Watson Knowledge Catalog plans
- Watson Knowledge Catalog APIs