Designing data location rules (Watson Knowledge Catalog)
Designing a data location rule involves specifying the direction in which the data moves between locations, the criteria for enforcing the rule, and the corresponding enforcement action.
Experimental: This is an experimental release and is not yet supported for use in production environments.
The direction of the data for which the rule is enforced can be incoming, outgoing, or both. When you specify the data direction of the rule as incoming, the data that is entering the location is restricted. When you specify the data direction of the rule as outgoing, the data that is leaving its location is restricted. The criteria can include which users are affected, the classification of the data asset, or other metadata assigned to the data asset. The enforcement action can be to deny or allow access to all data within the asset, or to mask some of the data and allow access to the rest of the data within the asset.
You must have these user permissions:
- To create data location rules, you must have the Manage data protection rules permission.
- To include governance artifacts in your rules, you must have the Access governance artifacts permission and you must be a collaborator in the categories of the governance artifacts that you want to use in the rule.
If you are missing permissions, ask your platform administrator to give them to you.
Settings for data location rules
The settings for data location rules affect all the data location rules in the platform. To configure settings for data location rules, call the https://api.dataplatform.cloud.ibm.com/v3/enforcement/settings API. See Prerequisites to creating a data protection rule.
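As a minimal sketch of preparing a call to the settings endpoint named above, the following Python builds the HTTP request without sending it. Only the URL comes from this topic. The GET method, the bearer-token header, and the helper name build_settings_request are assumptions made for illustration, so check the API reference for the supported methods and the payload fields.

```python
import urllib.request

# URL taken from this topic; everything else below is an assumption.
SETTINGS_URL = "https://api.dataplatform.cloud.ibm.com/v3/enforcement/settings"


def build_settings_request(bearer_token: str) -> urllib.request.Request:
    """Build (but do not send) a request that reads the enforcement settings.

    Assumes the endpoint accepts GET with a bearer token (hypothetical here).
    """
    return urllib.request.Request(
        SETTINGS_URL,
        method="GET",
        headers={
            "Authorization": f"Bearer {bearer_token}",
            "Accept": "application/json",
        },
    )


request = build_settings_request("<your-access-token>")
print(request.get_method(), request.full_url)
# To send the request, pass it to urllib.request.urlopen() and parse the JSON body.
```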
- Enabling data location rules
- By default, data location rules are disabled. Change the
- Data access convention
- You can set the default data access convention to one of these options:
AEAD: Default. Follows the “Allow Everything Author Deny” convention. Allows access to data unless a rule denies it. You write rules that deny access to data.
DEAA: Follows the “Deny Everything Author Allow” convention. Denies access to data unless a rule allows it. You write rules that allow access to data.
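The difference between the two conventions can be sketched as a default decision that applies when no rule matches. This is a simplified illustration, not the platform's evaluation logic. The names Convention and access_allowed are hypothetical.

```python
from enum import Enum
from typing import Optional


class Convention(Enum):
    AEAD = "AEAD"  # Allow Everything Author Deny (the default)
    DEAA = "DEAA"  # Deny Everything Author Allow


def access_allowed(convention: Convention, rule_decision: Optional[str]) -> bool:
    """Return True if access is allowed.

    rule_decision is "allow" or "deny" when a rule matched, or None when
    no rule matched (hypothetical representation for this sketch).
    """
    if rule_decision is None:
        # No rule matched: AEAD allows by default, DEAA denies by default.
        return convention is Convention.AEAD
    return rule_decision == "allow"


print(access_allowed(Convention.AEAD, None))     # True
print(access_allowed(Convention.DEAA, None))     # False
print(access_allowed(Convention.DEAA, "allow"))  # True
```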
Properties of data location rules
The properties and behavior of data location rules differ significantly from those of other governance artifacts.
|Property or behavior||Supports?||Explanation|
|Must have unique names?||Yes||Each data location rule must have a unique name.|
|Description?||Yes||Describe what the rule does in natural language so that it is easy to understand. Include standard words and terms to make it easy to search for this rule.|
|Add relationships to other rules?||No||Data location rules don't have relationships with each other.|
|Add relationships to other governance artifacts?||Yes||You can add governance artifacts in the definitions of data location rules. The data location rule then appears on the Related content tab of the governance artifacts that are included in its definition. You can also add data location rules to policies. However, data location rules are enforced regardless of whether they are included in any published policies.|
|Add relationship to asset?||Yes||See Asset relationships in catalogs.|
|Add custom attributes?||No||Data location rules don't support custom attributes.|
|Add custom relationships?||No||Data location rules don't support custom relationships.|
|Organize in categories?||No||Data location rules are not controlled by categories. They are enforced across all governed catalogs on the platform and visible to all users.|
|Import from a file?||No||You must create each data location rule individually.|
|Export to a file?||No||You can't export a data location rule.|
|Managed by workflows?||No||Data location rules are published and active after creation.|
|Specify start and end dates?||No||Data location rules are active after creation and until they are deleted.|
|Assign a Steward?||No||Data location rules don't have stewards.|
|Add tags?||Yes||Although you can't add tags as properties to data location rules, you can include tags in the definitions of data location rules.|
|Assign to an asset?||Yes||Although you can't manually assign data location rules to assets, rules are enforced for assets when the assets match the criteria of the rule.|
|Assign to a column in a data asset?||Yes||Although you can't manually assign a data location rule to a column in an asset, data location rules can mask the values of a column when the column matches the criteria and action block directives of the rule.|
|Automated assignment during profiling or enrichment?||No||Data location rules are enforced when a user attempts to access a data asset.|
|Predefined artifacts in the [uncategorized] category?||No||You must create all data location rules.|
Data location rules are composed of three components: the data direction, the criteria, and the action.
You can specify a direction for which to enforce the rule or accept the default of both directions. The data direction determines whether the data is restricted when it enters or when it leaves its physical or sovereign location. When you specify the data direction as incoming, access to the data can be restricted or allowed based on the location that the data is going to. When you specify the data direction as outgoing, access to the data can be restricted or allowed based on the location that the data is coming from.
For example, suppose users in the United States need to access data that is physically located in Germany. In this example, if you select the incoming direction, then you define a rule to control data that is going to the United States. If you select the outgoing direction, then you define a rule to control data that is coming from Germany.
The criteria identify the conditions for enforcing the data location rule. The criteria consist of one or more conditions. A condition consists of one or more predicates that describe properties of data assets or identify users and that are combined by operators.
For each predicate, you select the type of predicate, either the contains any or the does not contain any operator, and the specific value of the predicate. You can then join predicates and conditions with the AND or OR Boolean operators to create nested logical structures with precise criteria.
|Predicate||Description||Specific values|
|Target sovereignty||The sovereign location that the data is going to. For example, data originating in Japan is going to Germany. Germany is the target sovereignty.||Click Add sovereignties to select one or more target locations.|
|Source sovereignty||The sovereign location that the data is coming from. For example, data originating in Japan is going to Germany. Japan is the source sovereignty.||Click Add sovereignties to select one or more source locations.|
|Asset owner||The email address of the user who owns the asset in the catalog, for example, [email protected].||Search for and then select one or more email addresses.|
|Business term||A business term that is assigned to the asset or to a column.||Search for and then select a published business term.|
|Data class||The data class that is assigned to a column that classifies the content of the data, for example, customer number, date of birth, or city.||Search for and then select a published data class.|
|Tag||A tag that is assigned to the asset or to a column.||Enter one or more tags, separated by commas.|
|User name||The name or email address of an existing catalog collaborator, for example, [email protected].||Search for and then select one or more email addresses.|
|User group||The name of a user group that is a catalog collaborator.||Search for and then select one or more user groups.|
|Classification||The classification artifact that is assigned to the asset.||Search for and then select a published classification.|
For example, when the data direction is incoming, a rule that obfuscates the PII and Address columns of United Kingdom data assets before the data arrives in Japan might look like this:
If Source sovereignty contains any United Kingdom
And if Target sovereignty contains any Japan
Then obfuscate data in columns containing Column name: PII, Address
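The way predicates combine into nested conditions can be sketched in Python as follows. This is an illustrative model under stated assumptions, not the product's rule engine. The class names Predicate and Condition, the attribute keys, and the context dictionary are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Set, Tuple, Union


@dataclass(frozen=True)
class Predicate:
    attribute: str          # for example "source_sovereignty" or "tag"
    operator: str           # "contains any" or "does not contain any"
    values: FrozenSet[str]  # the specific values selected for the predicate

    def matches(self, context: Dict[str, Set[str]]) -> bool:
        # True when the context shares at least one value with the predicate.
        overlap = bool(context.get(self.attribute, set()) & self.values)
        if self.operator == "contains any":
            return overlap
        if self.operator == "does not contain any":
            return not overlap
        raise ValueError(f"unknown operator: {self.operator}")


@dataclass(frozen=True)
class Condition:
    boolean_op: str  # "AND" or "OR"
    parts: Tuple[Union["Condition", Predicate], ...]  # nesting is allowed

    def matches(self, context: Dict[str, Set[str]]) -> bool:
        results = [part.matches(context) for part in self.parts]
        if self.boolean_op == "AND":
            return all(results)
        if self.boolean_op == "OR":
            return any(results)
        raise ValueError(f"unknown Boolean operator: {self.boolean_op}")


# The example criteria above: source is United Kingdom AND target is Japan.
uk_to_japan = Condition("AND", (
    Predicate("source_sovereignty", "contains any", frozenset({"United Kingdom"})),
    Predicate("target_sovereignty", "contains any", frozenset({"Japan"})),
))

context = {"source_sovereignty": {"United Kingdom"}, "target_sovereignty": {"Japan"}}
print(uk_to_japan.matches(context))  # True
```

Because a Condition can contain other Condition objects, the same structure covers nested AND and OR groups.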
The action of the data location rule defines the effect of enforcing the rule. The action prevents affected catalog members from accessing or viewing the original data, as specified by the conditions. If the source and target sovereign locations are the same when the rule is evaluated, the rule is not enforced and data access is allowed.
You choose from three types of actions.
|Deny access to data||All data values in all columns of the data asset||Affected users cannot preview any data values, view the asset profile, or use the asset data.|
|Allow access to data||All data values in all columns of the data asset||Affected users can preview any data values, view the asset profile, use the data, or perform actions on the asset. Users can also download the assets or add them to a project.|
|Mask||The values in columns that match the masking criteria||Affected users can view all values in unmasked columns, view generated values in masked columns, use the data, and perform actions on the asset, according to their catalog roles. Masking can extend to projects. See Masking in projects. Choose from three types of masking methods based on how much you want to disguise the original data.|
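The effect of the three actions, together with the exception for identical source and target locations, can be sketched as follows. This is a simplified illustration under assumptions: the function name enforce_action and the row representation are hypothetical, and the default mask replaces each value with ten X characters.

```python
from typing import Callable, Dict, FrozenSet, List, Optional

Row = Dict[str, str]


def ten_x(value: str) -> str:
    """Replace any value with ten X characters."""
    return "X" * 10


def enforce_action(
    action: str,
    rows: List[Row],
    source: str,
    target: str,
    masked_columns: FrozenSet[str] = frozenset(),
    mask: Callable[[str], str] = ten_x,
) -> Optional[List[Row]]:
    """Return the rows that an affected user can see, or None if access is denied."""
    if source == target:
        # Same sovereign location: the rule is not enforced and access is allowed.
        return rows
    if action == "allow":
        return rows
    if action == "deny":
        return None
    if action == "mask":
        return [
            {col: mask(val) if col in masked_columns else val for col, val in row.items()}
            for row in rows
        ]
    raise ValueError(f"unknown action: {action}")


rows = [{"name": "Ana", "phone": "510-555-1234"}]
print(enforce_action("mask", rows, "Germany", "United States", frozenset({"phone"})))
# [{'name': 'Ana', 'phone': 'XXXXXXXXXX'}]
```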
To mask data, the data must conform to these requirements:
- The data is structured. The data must be in relational tables or CSV, Avro, partitioned data, or Parquet files.
- The column headers contain only alphanumeric characters (a-z, A-Z, 0-9). The column headers can't contain unsupported characters, such as multi-byte characters or special characters.
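Before you rely on masking, you might check column headers against the requirement above. The following sketch reads the requirement literally, so any character outside a-z, A-Z, and 0-9 (including spaces and underscores) is reported. The function name unsupported_headers is hypothetical.

```python
import re
from typing import Iterable, List

# Literal reading of the requirement: one or more of a-z, A-Z, 0-9 only.
_ALPHANUMERIC_HEADER = re.compile(r"[A-Za-z0-9]+")


def unsupported_headers(headers: Iterable[str]) -> List[str]:
    """Return the headers that contain characters other than a-z, A-Z, 0-9."""
    return [h for h in headers if not _ALPHANUMERIC_HEADER.fullmatch(h)]


print(unsupported_headers(["CustomerID", "first name", "名前", "Phone2"]))
# ['first name', '名前']
```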
When you choose the masking action, you must specify the masking criteria and the masking method.
The masking criteria identify the columns to mask. You select the type of column property and specify one or more values of the property, which are logically combined with the OR operator.
|Type of column property||Description||Specific values|
|Business term||A business term that is assigned to the column.||Search for and then select one or more published business terms.|
|Data class||The data class that is assigned to the column.||Search for and then select one or more published data classes.|
|Tag||A tag that is assigned to a column in the asset.||Enter one or more tags, separated by commas.|
|Column name||The name of a column.||Enter one or more column names, separated by commas.|
For example, suppose you choose the column property of Data class and the specific values of California State Driver's License and Nevada State Driver's License. Values are then masked in columns that are assigned either the California State Driver's License or the Nevada State Driver's License data class.
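The OR combination of values can be sketched as a set intersection: a column is selected when it is assigned at least one of the specified values. The Column class and the function names below are hypothetical and only model the example above.

```python
from dataclasses import dataclass, field
from typing import Iterable, List, Set


@dataclass
class Column:
    name: str
    business_terms: Set[str] = field(default_factory=set)
    data_classes: Set[str] = field(default_factory=set)
    tags: Set[str] = field(default_factory=set)


def assigned_values(column: Column, property_type: str) -> Set[str]:
    """Return the values that a column carries for one type of column property."""
    if property_type == "Business term":
        return column.business_terms
    if property_type == "Data class":
        return column.data_classes
    if property_type == "Tag":
        return column.tags
    if property_type == "Column name":
        return {column.name}
    raise ValueError(f"unknown column property: {property_type}")


def columns_to_mask(columns: Iterable[Column], property_type: str, values: Iterable[str]) -> List[str]:
    """Select columns that match ANY of the given values (logical OR)."""
    wanted = set(values)
    return [c.name for c in columns if assigned_values(c, property_type) & wanted]


columns = [
    Column("ca_license", data_classes={"California State Driver's License"}),
    Column("nv_license", data_classes={"Nevada State Driver's License"}),
    Column("city", data_classes={"City"}),
]
print(columns_to_mask(
    columns,
    "Data class",
    ["California State Driver's License", "Nevada State Driver's License"],
))
# ['ca_license', 'nv_license']
```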
The main difference between the masking methods is how much of the original characteristics of the data remains. The more original characteristics of the data that are retained, the more useful, but the less secure, the masked data becomes. When you choose a masking method, consider these factors:
Data integrity: Whether to repeat the same masked value for a repeated original value to maintain referential integrity between tables.
Data format: Whether to retain the format of the original data. Preserving the format means that letters are replaced by letters with the same case, digits are replaced by digits, and the number of characters is the same.
The following table describes how each masking method affects these characteristics.
|Method||Description||Preserves integrity?||Preserves data format?|
|Redact||Replace values with ten X characters. The most secure method.||No||No|
|Substitute||Replace values with randomly generated values that preserve referential integrity.||Yes||No|
|Obfuscate||Replace values with values that preserve referential integrity and the original data format. The least secure method.||Yes||Yes|
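The trade-offs in the table can be illustrated with a toy implementation of each method. This is a sketch under assumptions, not the algorithms that the product uses: substitute is modeled as a keyed hash, obfuscate as a deterministic per-character replacement that keeps character classes, and SECRET_KEY is a hypothetical demonstration key.

```python
import hashlib
import hmac
import string

SECRET_KEY = b"demo-key"  # hypothetical key, for illustration only


def _digest(value: str) -> bytes:
    """Deterministic keyed digest: the same input always gives the same bytes."""
    return hmac.new(SECRET_KEY, value.encode("utf-8"), hashlib.sha256).digest()


def redact(value: str) -> str:
    """Preserves neither integrity nor format: every value becomes ten X characters."""
    return "X" * 10


def substitute(value: str) -> str:
    """Preserves integrity (repeatable) but not the original format."""
    return _digest(value).hex()[:16]


def obfuscate(value: str) -> str:
    """Preserves integrity and format: digits stay digits, letters keep their case."""
    stream = _digest(value)
    out = []
    for i, ch in enumerate(value):
        b = stream[i % len(stream)]
        if ch in string.digits:
            out.append(string.digits[b % 10])
        elif ch in string.ascii_uppercase:
            out.append(string.ascii_uppercase[b % 26])
        elif ch in string.ascii_lowercase:
            out.append(string.ascii_lowercase[b % 26])
        else:
            out.append(ch)  # keep separators such as "-" in place
    return "".join(out)


phone = "510-555-1234"
print(redact(phone))      # XXXXXXXXXX
print(substitute(phone))  # a 16-character hexadecimal string, stable across calls
print(obfuscate(phone))   # a value shaped like ddd-ddd-dddd, stable across calls
```

Because the substitute and obfuscate sketches are deterministic for a given key, a repeated original value maps to the same masked value, which is what keeps joins between tables working.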
For virtual data, the masking behavior is slightly different, based on the data field definition. See Masking virtual data.
The redact method replaces each data value with a string of exactly ten X characters. With redacted data, neither the format of the data nor data integrity is preserved. Redact is the most secure masking method, but results in the least useful masked data.
For example, the phone number 510-555-1234 is replaced with XXXXXXXXXX. All other phone numbers are replaced with the same value.
You can specify advanced redaction options for criteria that are based on data classes with advanced data masking. However, advanced data masking is not enforced automatically. You must apply it to selected data assets in a project and then publish the masked assets to a catalog.
The substitute method replaces data with values that don't match the original format. However, it does preserve referential integrity for repeated values for all assets in the catalog. The substituted values are meaningless and the original format of the values can't be determined. Substitute provides security and data usefulness in between the Redact and Obfuscate methods.
For example, the phone number 510-555-1234 is always replaced with
The obfuscate method replaces data values with values that match the original format and preserves referential integrity for repeated values. Because the obfuscated values keep the original format, they can be valid values. Obfuscate is the least secure masking method, but results in the most useful masked data.
For example, the phone number 510-555-1234 is always replaced with 415-987-6543.
However, the obfuscate method is limited to data values in columns that have assigned data classes with the following types of information:
- Personal information, for example, basic attributes of an individual, such as honorific or name suffix.
- Contact details, for example, email addresses, phone numbers, state, postal addresses, latitude, or longitude.
- Financial accounts, for example, credit cards, banking, or other financial account numbers.
- Government identities, for example, personal identification numbers issued by governments, such as SSN (US social security numbers) and CCN (credit card numbers).
- Personal demographic information, for example, religion, ethnicity, marital status, hobbies, or employee status.
- Connectivity data, for example, IP address or MAC address.
If you create a rule to obfuscate data and the rule is enforced on data that is not assigned a data class that supports obfuscation, the substitute method is used instead.
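The fallback from obfuscate to substitute can be sketched as a simple lookup. The group names below mirror the list above, but how a data class maps to a group is product-specific, so the data_class_group argument and the function name effective_method are hypothetical.

```python
from typing import Optional

# Types of information that support obfuscation, as listed above.
OBFUSCATABLE_GROUPS = frozenset({
    "personal information",
    "contact details",
    "financial accounts",
    "government identities",
    "personal demographic information",
    "connectivity data",
})


def effective_method(requested: str, data_class_group: Optional[str]) -> str:
    """Return the masking method that is actually applied to a column.

    data_class_group is the type of information of the column's data class,
    or None when the column has no data class (hypothetical representation).
    """
    if requested == "obfuscate" and data_class_group not in OBFUSCATABLE_GROUPS:
        return "substitute"
    return requested


print(effective_method("obfuscate", "contact details"))  # obfuscate
print(effective_method("obfuscate", None))               # substitute
print(effective_method("redact", None))                  # redact
```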
You can specify advanced obfuscation options for masking criteria that are based on data classes with advanced data masking. However, advanced data masking is not enforced automatically. You must apply it to selected data assets in a project and then publish the masked assets to a catalog.