Designing data location rules (IBM Knowledge Catalog)

Designing a data location rule involves specifying the direction in which the data moves from one location to another, the criteria for enforcing the rule, and the corresponding enforcement action.

Experimental: This is an experimental release and is not yet supported for use in production environments.

The direction of the data for which the rule is enforced can be incoming, outgoing, or both. When you specify the data direction of the rule as incoming, the data that is entering the location is restricted. When you specify the data direction of the rule as outgoing, the data that is leaving its location is restricted. The criteria can include which users are affected, the classification of the data asset, or other metadata assigned to the data asset. The enforcement action can be to deny or allow access to all data within the asset, or to mask some of the data and allow access to the rest of the data within the asset.

Required permissions

You must have these user permissions:

To create data location rules, you must have the Manage data protection rules permission.
To include governance artifacts in your rules, you must have the Access governance artifacts permission and you must be a collaborator in the categories of the governance artifacts that you want to use in the rule.

If you are missing permissions, ask your platform administrator to give them to you.

Settings for data location rules

The settings for data location rules affect all the data location rules in the platform. To configure settings for data location rules, call the https://api.dataplatform.cloud.ibm.com/v3/enforcement/settings API. See Prerequisites to creating a data protection rule.

Enabling data location rules: By default, data location rules are disabled. Change the enable_data_location_rules setting to true.
Data access convention: You can set the default data access convention to one of these options:

AEAD: Default. Follows the “Allow Everything Author Deny” convention. Allows access to data unless a rule denies it. You write rules that deny access to data.
DEAA: Follows the “Deny Everything Author Allow” convention. Denies access to data unless a rule allows it. You write rules that allow access to data.
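The difference between the two conventions can be sketched as a small decision function. This is a hypothetical model, not the product's decision engine: each matching rule is reduced to its action ("allow" or "deny"), and the assumption that an explicit deny wins when both an allow and a deny rule match is ours, not something this page states.

```python
def decide_access(convention: str, matching_actions: list[str]) -> str:
    """Hypothetical sketch: return "allow" or "deny" for one access request.

    convention: "AEAD" (allow unless a rule denies) or
                "DEAA" (deny unless a rule allows).
    matching_actions: actions of the rules whose criteria matched the request.
    """
    if convention not in ("AEAD", "DEAA"):
        raise ValueError(f"unknown convention: {convention}")
    # Assumption: an explicit deny takes precedence over an explicit allow.
    if "deny" in matching_actions:
        return "deny"
    if convention == "AEAD":
        # Allow Everything, Author Deny: no deny matched, so access is allowed.
        return "allow"
    # Deny Everything, Author Allow: access requires an explicit allow.
    return "allow" if "allow" in matching_actions else "deny"
```

With no matching rules, `decide_access("AEAD", [])` allows access while `decide_access("DEAA", [])` denies it, which is why the two conventions call for rules written in opposite directions.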

Properties of data location rules

The properties and behavior of data location rules differ significantly from other governance artifacts.

Property or behavior	Supports?	Explanation
Must have unique names?	Yes	Each data location rule must have a unique name.
Description?	Yes	Describe what the rule does in natural language so that it is easy to understand. Include standard words and terms to make it easy to search for this rule.
Add relationships to other rules?	No	Data location rules don't have relationships with each other.
Add relationships to other governance artifacts?	Yes	You can add governance artifacts in the definitions of data location rules. The data location rule then appears on the Related content tab of the governance artifacts that are included in its definition. You can also add data location rules to policies. However, data location rules are enforced regardless of whether they are included in any published policies.
Add relationship to asset?	Yes	See Asset relationships in catalogs.
Add custom attributes?	No	Data location rules don't support custom attributes.
Add custom relationships?	No	Data location rules don't support custom relationships.
Organize in categories?	No	Data location rules are not controlled by categories. They are enforced across all governed catalogs on the platform and visible to all users.
Import from a file?	No	You must create each data location rule individually.
Export to a file?	No	You can't export a data location rule.
Managed by workflows?	No	Data location rules are published and active after creation.
Specify start and end dates?	No	Data location rules are active after creation and until they are deleted.
Assign a Steward?	No	Data location rules don't have stewards.
Add tags?	Yes	Although you can't add tags as properties to data location rules, you can include tags in the definitions of data location rules.
Assign to an asset?	Yes	Although you can't manually assign data location rules to assets, rules are enforced for assets when the assets match the criteria of the rule.
Assign to a column in a data asset?	Yes	Although you can't manually assign a data location rule to a column in an asset, data location rules can mask the values of a column when the column matches the criteria and action block directives of the rule.
Automated assignment during profiling or enrichment?	No	Data location rules are enforced when a user attempts to access a data asset.
Predefined artifacts in the [uncategorized] category?	No	You must create all data location rules.

Data location rules are composed of three components:

Data direction

You can specify a direction for which to enforce the rule or accept the default of both directions. The data direction determines whether the data must be restricted when it enters or leaves its physical or sovereign location. When you specify the data direction as incoming, access to the data can be restricted or allowed based on the location that the data is going to. When you specify the data direction as outgoing, access to the data can be restricted or allowed based on the location that the data is coming from.

For example, suppose users in the United States need to access data that is physically located in Germany. If you select the incoming direction, you define a rule to control data that is going to the United States. If you select the outgoing direction, you define a rule to control data that is coming from Germany.
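To make the Germany and United States example concrete, the following sketch models a rule as a direction plus a single protected location. This is a deliberate simplification: in the product, locations come from the source and target sovereignty predicates in the criteria, and the `LocationRule` class and `rule_applies` function are hypothetical names. The check that skips transfers whose source and target are the same reflects the statement, under Actions, that such rules are not enforced.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LocationRule:
    # Hypothetical model: "incoming", "outgoing", or "both" (the default).
    direction: str
    # The sovereign location the rule controls, for example "United States".
    location: str


def rule_applies(rule: LocationRule, source: str, target: str) -> bool:
    """Sketch: does the rule apply to data moving from source to target?"""
    if source == target:
        # Data that stays within one sovereign location is not restricted.
        return False
    # Incoming rules look at where the data is going to.
    incoming = rule.direction in ("incoming", "both") and target == rule.location
    # Outgoing rules look at where the data is coming from.
    outgoing = rule.direction in ("outgoing", "both") and source == rule.location
    return incoming or outgoing
```

For data located in Germany that is read in the United States, both `LocationRule("incoming", "United States")` and `LocationRule("outgoing", "Germany")` apply.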

Criteria

The criteria identify the conditions for enforcing the data location rule. The criteria consist of one or more conditions. A condition consists of one or more predicates, combined by operators, that describe properties of data assets or identify users.

For each predicate, you select the predicate type, an operator (either contains any or does not contain any), and one or more specific values. You can then join predicates and conditions with the AND or OR Boolean operators to create nested logical structures with precise criteria.
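The nesting of predicates and conditions can be sketched as a small expression tree. This is an illustrative model only: the field names such as `source_sovereignty`, `tag`, and `user_group` are hypothetical labels for the predicate types in the following table, and the request context is assumed to map each field to the set of values that apply to the current asset and user.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Predicate:
    field: str        # hypothetical label, e.g. "source_sovereignty" or "tag"
    operator: str     # "contains_any" or "does_not_contain_any"
    values: set[str]  # the specific values selected for the predicate

    def evaluate(self, context: dict[str, set[str]]) -> bool:
        present = context.get(self.field, set())
        hit = bool(present & self.values)  # at least one selected value is present
        if self.operator == "contains_any":
            return hit
        if self.operator == "does_not_contain_any":
            return not hit
        raise ValueError(f"unknown operator: {self.operator}")


@dataclass
class Group:
    op: str                # "AND" or "OR"
    children: list["Node"]

    def evaluate(self, context: dict[str, set[str]]) -> bool:
        results = [child.evaluate(context) for child in self.children]
        if self.op == "AND":
            return all(results)
        if self.op == "OR":
            return any(results)
        raise ValueError(f"unknown operator: {self.op}")


Node = Union[Predicate, Group]
```

A condition such as "source sovereignty contains any United Kingdom AND (tag contains any pii OR user group does not contain any auditors)" becomes a `Group("AND", [...])` whose second child is a nested `Group("OR", [...])`.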

Predicate	Description	Specific values
Target sovereignty	The sovereign location that the data is going to. For example, data originating in Japan is going to Germany. Germany is the target sovereignty.	Click Add sovereignties to select one or more target locations.
Source sovereignty	The sovereign location that the data is coming from. For example, data originating in Japan is going to Germany. Japan is the source sovereignty.	Click Add sovereignties to select one or more source locations.
Asset owner	The email address of the user who owns the asset in the catalog.	Search for and then select one or more email addresses.
Business term	A business term that is assigned to the asset or to a column.	Search for and then select a published business term.
Data class	The data class that is assigned to a column that classifies the content of the data, for example, customer number, date of birth, or city.	Search for and then select a published data class.
Tag	A tag that is assigned to the asset or to a column.	Enter one or more tags, separated by commas.
User name	The name or email address of an existing catalog collaborator.	Search for and then select one or more email addresses.
User group	The name of a user group that is a catalog collaborator.	Search for and then select one or more user groups.
Classification	The classification artifact that is assigned to the asset.	Search for and then select a published classification.

For example, a rule that obfuscates the PII and Address columns of United Kingdom data assets before the data arrives in Japan might look like this when the data direction is incoming:

If source sovereignty contains any United Kingdom
And
If target sovereignty contains any Japan
Then
Obfuscate data in columns containing Column name: PII, Address
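Read end to end, the example rule selects which columns to obfuscate for a given transfer. The following sketch hard-codes that one rule for illustration; the function name is hypothetical, and exact, case-sensitive column-name matching is an assumption rather than documented behavior.

```python
def columns_to_obfuscate(source: str, target: str, columns: list[str]) -> list[str]:
    """Hypothetical sketch of the example rule: obfuscate the PII and Address
    columns of United Kingdom data that is going to Japan."""
    # Criteria: source sovereignty contains any United Kingdom
    # AND target sovereignty contains any Japan.
    if not (source == "United Kingdom" and target == "Japan"):
        return []  # criteria not met, so the rule does not mask anything
    # Masking criteria: column name is PII or Address (values combined with OR).
    masked_names = {"PII", "Address"}
    return [col for col in columns if col in masked_names]
```

For example, `columns_to_obfuscate("United Kingdom", "Japan", ["ID", "PII", "Address"])` selects the PII and Address columns and leaves ID untouched.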

Actions

The action of the data location rule defines the effect of enforcing the rule. The action determines whether affected catalog members can access or view the original data, as specified by the conditions. If the source and target sovereign locations are the same when the rule is evaluated, the rule is not enforced and data access is allowed.

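You choose from three types of actions: deny access, allow access, or mask columns. Masking uses one of three methods: redact, obfuscate, or substitute.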

Action	Scope	Result
Deny access to data	All data values in all columns of the data asset	Affected users cannot preview any data values, view the asset profile, or use the asset data.
Allow access to data	All data values in all columns of the data asset	Affected users can preview any data values, view the asset profile, use the data, or perform actions on the asset. Users can also download the assets or add them to a project.
Redact columns	The values in columns that match the masking criteria	Affected users see values replaced with a string of one repeated character. Masking can extend to projects. See Masking in projects.
Obfuscate columns	The values in columns that match the masking criteria	Affected users see data replaced with similar values and in the same format. Masking can extend to projects. See Masking in projects.
Substitute columns	The values in columns that match the masking criteria	Affected users see data replaced with a hashed value. Masking can extend to projects. See Masking in projects.
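The scope column of the table can be summarized as a dispatcher over one row of data. This is a hypothetical sketch: a row is modeled as a dictionary of column values, `masked_columns` stands for the columns that matched the masking criteria, and `mask` stands for whichever masking method the rule uses.

```python
from typing import Callable, Optional

Row = dict[str, str]


def apply_action(action: str, row: Row, masked_columns: set[str],
                 mask: Callable[[str], str]) -> Optional[Row]:
    """Hypothetical sketch of what an affected user sees for one row."""
    if action == "deny":
        return None  # no data values in any column are visible
    if action == "allow":
        return dict(row)  # all values in all columns are visible
    if action in ("redact", "obfuscate", "substitute"):
        # Only columns that match the masking criteria are changed.
        return {col: mask(val) if col in masked_columns else val
                for col, val in row.items()}
    raise ValueError(f"unknown action: {action}")
```

Passing a different `mask` function keeps the dispatch logic the same for all three masking methods, which mirrors how the table groups them by a shared scope.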

Masking

To mask data, the data must conform to these requirements:

The data is structured. The data must be in relational tables or CSV, Avro, partitioned data, or Parquet files.
The column headers contain only alphanumeric characters (a-z, A-Z, 0-9). The column headers can't contain unsupported characters, such as multibyte characters or special characters.
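Before you rely on masking, you might check column headers against the alphanumeric requirement. The following helper is a hypothetical pre-check, not a product feature. It uses an explicit ASCII character class because Python's `str.isalnum()` also accepts non-ASCII letters such as "ß", which this requirement excludes.

```python
import re

# Only a-z, A-Z, and 0-9 are allowed; one or more characters.
_ALNUM_ASCII = re.compile(r"[A-Za-z0-9]+")


def is_maskable_header(name: str) -> bool:
    """Return True if the header contains only ASCII letters and digits."""
    return _ALNUM_ASCII.fullmatch(name) is not None


def unmaskable_headers(headers: list[str]) -> list[str]:
    """List the headers that would block masking under this requirement."""
    return [h for h in headers if not is_maskable_header(h)]
```

Note that underscores and hyphens also fail this check, because the requirement lists only letters and digits.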

When you choose the masking action, you must specify the masking criteria and the masking method.

Masking criteria

The masking criteria identify the columns to mask. You select the type of column property and specify one or more values of that property. Multiple values are logically combined with the OR operator.

Type of column property	Description	Specific values
Business term	A business term that is assigned to the column.	Search for and then select one or more published business terms.
Data class	The data class that is assigned to the column.	Search for and then select one or more published data classes.
Tag	A tag that is assigned to a column in the asset.	Enter one or more tags, separated by commas.
Column name	The name of a column.	Enter one or more column names, separated by commas.

For example, suppose you choose the column property of Data class and the specific values of California State Driver's License and Nevada State Driver's License. Values are then masked in columns that are assigned either the California State Driver's License or the Nevada State Driver's License data class.
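The OR combination of values in the driver's license example can be sketched as a set intersection. In this hypothetical sketch, each column maps to the set of values it carries for the chosen property (for example, its assigned data classes or tags), and a column is masked if it carries any selected value.

```python
def columns_to_mask(column_properties: dict[str, set[str]],
                    selected_values: set[str]) -> list[str]:
    """Hypothetical sketch: return the columns that carry ANY selected value."""
    # A non-empty intersection means at least one selected value matches (OR).
    return [col for col, values in column_properties.items()
            if values & selected_values]
```

With `{"lic_ca": {"California State Driver's License"}, "lic_nv": {"Nevada State Driver's License"}, "name": {"Person Name"}}` and both license data classes selected, the function returns the two license columns and skips the name column.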

Masking methods

The main difference between the masking methods is how many of the original characteristics of the data remain. The more of the original characteristics that are retained, the more useful but the less secure the masked data becomes. When you choose a masking method, consider these factors:

Data integrity: Whether to repeat the same masked value for a repeated original value to maintain referential integrity between tables.
Data format: Whether to retain the format of the original data. Preserving the format means that letters are replaced by letters with the same case, digits are replaced by digits, and the number of characters is the same.

The following table describes how each masking method affects these characteristics.

Method	Description	Preserves integrity?	Preserves data format?
Redact	Replace values with ten X characters. The most secure method.	No	No
Substitute	Replace values with randomly generated values that preserve referential integrity.	Yes	No
Obfuscate	Replace values with values that preserve referential integrity and the original data format. The least secure method.	Yes	Yes

For virtual data, the masking behavior is slightly different, based on the data field definition. See Masking virtual data.

Redact

The redact method replaces each data value with a string of exactly 10 X characters. With redacted data, neither the format of the data nor data integrity is preserved. Redact is the most secure masking method, but it results in the least useful masked data.

For example, the phone number 510-555-1234 is replaced with XXXXXXXXXX. All other phone numbers are replaced with the same value.
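As a minimal illustration of the behavior described above, redaction ignores its input entirely. The function name is hypothetical; the sketch only shows why two different phone numbers become indistinguishable.

```python
def redact(value: object) -> str:
    """Replace any value with exactly ten X characters."""
    # The output does not depend on the input, so neither the format nor
    # referential integrity between repeated values survives.
    return "X" * 10


phones = ["510-555-1234", "408-555-0000"]
redacted = [redact(p) for p in phones]
```

Both entries in `redacted` are `"XXXXXXXXXX"`, so joins on a redacted column can no longer tell the original values apart.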

You can specify advanced redaction options for criteria that are based on data classes with advanced data masking. However, advanced data masking is not enforced automatically. You must apply it to selected data assets in a project and then publish the masked assets to a catalog.

Substitute

The substitute method replaces data with values that don't match the original format. However, it does preserve referential integrity for repeated values across all assets in the catalog. The substituted values are meaningless, and the original format of the values can't be determined. Substitute provides a level of security and data usefulness between those of the Redact and Obfuscate methods.

For example, the phone number 510-555-1234 is always replaced with 500ddcc98133703531re3456.
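One common way to get deterministic, format-discarding replacements is a keyed hash. The following sketch uses HMAC-SHA256 from the Python standard library; it is not the algorithm the product uses, so its output will not match the example value above. The secret `key` (standing in for a hypothetical per-catalog secret) and the truncation to 24 hexadecimal characters are illustrative assumptions.

```python
import hashlib
import hmac


def substitute(value: str, key: bytes) -> str:
    """Sketch: deterministic, non-format-preserving replacement.

    The same value and key always produce the same output, which preserves
    referential integrity; the output reveals nothing about the input format.
    """
    digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()
    # Arbitrary truncation for readability; a real system might keep more.
    return digest[:24]
```

Using a keyed hash rather than a plain hash means that someone without the key can't rebuild the mapping by hashing guessed phone numbers.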

Obfuscate

The obfuscate method replaces data values with values in the same format as the original values, and it preserves referential integrity for repeated values. Because the obfuscated values are similarly formatted, they can be valid values. Obfuscate is the least secure masking method, but it results in the most useful masked data.

For example, the phone number 510-555-1234 is always replaced with 415-987-6543.
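The two properties that obfuscation keeps, format and referential integrity, can be illustrated with a keyed, character-class-preserving replacement. This is a simplified sketch, not the product's algorithm and not real format-preserving encryption (a standardized mode such as NIST FF1 would be the rigorous choice). Unlike the product, which uses data class knowledge, it doesn't guarantee that outputs are valid phone numbers or email addresses. The `key` parameter is a hypothetical secret.

```python
import hashlib
import hmac
import string


def _keystream(key: bytes, value: str, length: int) -> bytes:
    """Derive at least `length` pseudo-random bytes from the key and value."""
    out = b""
    counter = 0
    while len(out) < length:
        msg = counter.to_bytes(4, "big") + value.encode("utf-8")
        out += hmac.new(key, msg, hashlib.sha256).digest()
        counter += 1
    return out


def obfuscate(value: str, key: bytes) -> str:
    """Sketch: digits stay digits, letters keep their case, other characters
    are kept, and the same input always maps to the same output."""
    stream = _keystream(key, value, len(value))
    result = []
    for ch, b in zip(value, stream):
        if ch in string.digits:
            result.append(string.digits[b % 10])
        elif ch in string.ascii_lowercase:
            result.append(string.ascii_lowercase[b % 26])
        elif ch in string.ascii_uppercase:
            result.append(string.ascii_uppercase[b % 26])
        else:
            result.append(ch)  # keep separators such as "-" or "@"
    return "".join(result)
```

`obfuscate("510-555-1234", key)` returns another `ddd-ddd-dddd` string, the same one every time for a given key, which is what keeps joins across tables working.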

However, the obfuscate method is limited to data values in columns that have assigned data classes with the following types of information:

Personal information, for example, basic attributes of an individual, such as honorific or name suffix.
Contact details, for example, email addresses, phone numbers, state, postal addresses, latitude, or longitude.
Financial accounts, for example, credit cards, banking, or other financial account numbers.
Government identities, for example, personal identification numbers issued by governments, such as SSN (US social security numbers) and CCN (credit card numbers).
Personal demographic information, for example, religion, ethnicity, marital status, hobbies, or employee status.
Connectivity data, for example, IP addresses or MAC addresses.

If you create a rule to obfuscate data and the rule is enforced on data that is not assigned a data class that supports obfuscation, the substitute method is used instead.
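The fallback from obfuscate to substitute can be sketched as a small selection step. The category labels below are hypothetical names for the six kinds of information listed above; the product decides support per assigned data class, so treat this as a simplified model.

```python
from typing import Optional

# Hypothetical labels for the kinds of data classes that support obfuscation.
OBFUSCATION_CATEGORIES = frozenset({
    "personal_information",
    "contact_details",
    "financial_accounts",
    "government_identities",
    "personal_demographics",
    "connectivity_data",
})


def effective_method(requested: str, data_class_category: Optional[str]) -> str:
    """Return the masking method actually applied to a column (sketch)."""
    if requested == "obfuscate" and data_class_category not in OBFUSCATION_CATEGORIES:
        # No supporting data class (or none assigned): fall back to substitute.
        return "substitute"
    return requested
```

Other methods pass through unchanged, so `effective_method("redact", None)` still returns `"redact"`.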

You can specify advanced obfuscation options for masking criteria that are based on data classes with advanced data masking. However, advanced data masking is not enforced automatically. You must apply it to selected data assets in a project and then publish the masked assets to a catalog.
