You create a governance framework to govern and enrich your data by implementing governance artifacts in collaborative workspaces called categories. Some types of governance artifacts act as metadata to enrich data assets. Other types of governance
artifacts control access to data assets or to other artifacts.
Required service
IBM Knowledge Catalog
You use governance artifacts for these purposes:
Enrichment: Artifacts can add knowledge and meaning to assets.
Access control: Artifacts can control who sees what data or which artifacts.
Identification: Artifacts can act as criteria to identify assets or data for other artifacts.
Quality: Artifacts can be used to monitor data quality.
You can use categories and governance artifacts from any or all of these sources:
Predefined governance artifacts that are provided with IBM Knowledge Catalog
Custom governance artifacts that your governance team creates
Governance artifacts and categories that are provided by Knowledge Accelerators
The following descriptions briefly cover categories and each type of governance artifact and indicate whether any of the items are predefined or available in Knowledge Accelerators.
Categories organize governance artifacts in a hierarchical structure similar to folders. You can use category roles to define ownership of artifacts, control their authoring, and restrict their visibility.
Examples: Business Performance Indicators, Business Scopes
Business terms implement a common enterprise vocabulary to describe the meaning of data. You create business terms to ensure clarity and compatibility among departments, projects, or products. Business terms are the core of your governance
framework and typically form the bulk of your governance artifacts. You can assign business terms to data columns, tables, or files manually, or assign them automatically during metadata enrichment. You can use business terms in governance
rules and enforceable rules to identify the affected data.
Examples: Customer lifetime value, Work phone number
Limited: Predefined business terms and the Knowledge Accelerator Sample Personal Data category that includes them are available only if you create an IBM Knowledge Catalog service instance with a Lite or Standard plan after 7 October 2022.
For more information, see Predefined business terms.
Each Knowledge Accelerator provides many business terms.
Data classes classify data based on the structure, format, and range of values of the data. Data classes are automatically assigned to matching data columns during profiling and metadata enrichment. You can create data classes by defining
matching criteria with an expression or a reference data set. You can create relationships between data classes and business terms to link data format with business meaning. Related business terms are automatically assigned to data along
with their related data classes. How well columns conform to their data class criteria contributes to data quality analysis. Before you have a robust set of business terms, you can use data classes in enforceable rules to identify the
affected data.
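The matching expression syntax itself is defined by IBM Knowledge Catalog, so the following Python sketch is only a conceptual illustration: a hypothetical Passport Identifier data class with a regular-expression criterion and an assumed 70% match-rate threshold that a column must meet before the class is assigned.

```python
import re

# Hypothetical data class definition: a matching expression plus the share of
# values in a column that must match before the class is assigned. The regex
# and the 70% threshold are illustrative assumptions, not product defaults.
PASSPORT_CLASS = {
    "name": "Passport Identifier",
    "pattern": re.compile(r"^[A-Z]{1,2}[0-9]{6,8}$"),
    "threshold": 0.7,
}

def matches_data_class(values, data_class):
    """Return True when enough non-empty values satisfy the matching criteria."""
    non_empty = [v for v in values if v]
    if not non_empty:
        return False
    hits = sum(1 for v in non_empty if data_class["pattern"].match(v))
    return hits / len(non_empty) >= data_class["threshold"]

column = ["X1234567", "AB987654", "not a passport", "C7654321"]
print(matches_data_class(column, PASSPORT_CLASS))  # True: 3 of 4 values (75%) match
```

The same conformance ratio is also the kind of signal that data quality analysis can use to report how well a column fits its assigned data class.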
Reference data sets define standard values for specific types of data to classify data and measure consistency. Reference data sets act as lookup tables that map codes and values. You can include a reference data set in the definition
of a data class as part of the data matching criteria. Some reference data sets are standardized by organizations, such as the International Organization for Standardization (ISO). Reference data can be hierarchical or mapped across related
sets.
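As a rough illustration, the sketch below models a reference data set as a lookup table of ISO 3166-1 alpha-2 country codes, with a hypothetical mapping to a related set of alpha-3 codes; the helper names and sample data are illustrative assumptions, not product APIs.

```python
# A reference data set behaves like a lookup table of standard codes and values.
# ISO 3166-1 alpha-2 country codes are a common standardized example.
COUNTRY_CODES = {
    "DE": "Germany",
    "FR": "France",
    "US": "United States",
}

# A hypothetical mapping across related sets: alpha-2 codes to alpha-3 codes.
ALPHA2_TO_ALPHA3 = {"DE": "DEU", "FR": "FRA", "US": "USA"}

def standardize(code):
    """Resolve a raw code against the reference data; None means non-conforming."""
    return COUNTRY_CODES.get(code.strip().upper())

print(standardize("de"))       # Germany
print(standardize("XX"))       # None -> flag as inconsistent data
print(ALPHA2_TO_ALPHA3["DE"])  # DEU
```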
Classifications describe specific characteristics of the meaning of data. Predefined classifications describe the sensitivity of the data. You can create classifications to describe other characteristics of data or other governance items.
For example, Knowledge Accelerators use classifications to classify business terms. You can use classifications to construct governance policies and rules. Typically, you relate multiple business terms to each classification and then data
is indirectly classified through its assigned business terms. You can also manually assign a classification to a data asset.
Policies describe how to manage and protect data assets. You create policies by combining rules and subpolicies. You can include data protection rules and data location rules in policies to control and manage data. However, policies do
not affect the enforcement of data protection rules and data location rules. You can include governance rules in policies to document standards and procedures.
Governance rules describe how to apply a policy. Governance rules provide a natural-language description of the criteria that are used to determine whether data assets are compliant with business objectives. Governance rules are not enforced
by IBM Knowledge Catalog. However, you can relate governance rules to enforceable rules, such as data protection rules and data quality rules.
Data protection rules define how to control access to data based on user identity, asset properties, and assigned governance artifacts. Data protection rules define who can see what data. Within data protection rules, you can include classifications,
data classes, business terms, or tags to identify the data to control. You specify whether to deny access to the data or to mask sensitive data values. Data protection rules are automatically enforced in governed catalogs only. Data protection rules
are not organized or controlled by categories.
Example: Mask columns that are assigned the Passport Identifier business term.
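Data protection rules are defined and enforced by the catalog itself; the following sketch only illustrates the idea behind the example rule, assuming a hypothetical column-to-term assignment and a simple character-masking style.

```python
# Conceptual sketch of the example rule above: mask any column whose assigned
# business term is "Passport Identifier". The column-to-term assignments and
# the masking style are illustrative assumptions.
ASSIGNED_TERMS = {
    "ACCOUNT_ID": "Account Identifier",
    "PASSPORT_NO": "Passport Identifier",
}

def mask(value):
    return "X" * len(value)

def apply_protection_rule(row):
    """Return a copy of the row with protected columns masked."""
    return {
        col: mask(val) if ASSIGNED_TERMS.get(col) == "Passport Identifier" else val
        for col, val in row.items()
    }

print(apply_protection_rule({"ACCOUNT_ID": "42-0017", "PASSPORT_NO": "X1234567"}))
# {'ACCOUNT_ID': '42-0017', 'PASSPORT_NO': 'XXXXXXXX'}
```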
Data location rules control access to data based on its physical and sovereign locations, on user and asset properties, and on assigned governance artifacts. Data location rules control who can see what data. Within data location rules,
you can specify whether the data is leaving or entering a physical or sovereign location. You can also include classifications, data classes, business terms, or tags to identify the data to control. You specify whether to allow access
to the data or to mask sensitive data values. Data location rules are automatically enforced in all governed catalogs. Data location rules are not organized or controlled by categories.
Example: Mask columns that are assigned the Personal Identifiable Information business term in a data asset leaving Germany and accessed in other countries.
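Along the same lines, this sketch illustrates the example location rule conceptually: a hypothetical check that masks a column when the protected term applies and the data is accessed outside its source country. The rule structure, locations, and function names are illustrative assumptions.

```python
# Conceptual sketch of the example rule above: mask columns assigned the
# "Personal Identifiable Information" term when data sourced in Germany is
# accessed from another country.
RULE = {
    "term": "Personal Identifiable Information",
    "source_location": "DE",   # where the data physically resides
}

def should_mask(column_term, source_location, access_location):
    """Mask when the protected term applies and the data leaves its source country."""
    return (
        column_term == RULE["term"]
        and source_location == RULE["source_location"]
        and access_location != RULE["source_location"]
    )

print(should_mask("Personal Identifiable Information", "DE", "FR"))  # True
print(should_mask("Personal Identifiable Information", "DE", "DE"))  # False
```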
Data quality SLA rules monitor the data quality of critical data elements for compliance with certain quality criteria and can trigger remediation workflows in case of violations. You select the data assets and columns that you want to monitor
by name or by assigned business terms. SLA compliance and violations are reported on a data asset's Data quality page. Data quality SLA rules are not organized or controlled by categories.
Example: Report a violation if the completeness dimension score for column ACCOUNT_ID in data asset BANK_ACCOUNT falls below 99% and trigger the default remediation workflow.
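To make the example concrete, the sketch below computes a completeness score for a column and reports a violation when it falls below the 99% threshold from the example; the scoring logic and remediation hook are illustrative assumptions rather than the product's implementation.

```python
# Conceptual sketch of the SLA example above: compute the completeness of
# ACCOUNT_ID and report a violation when it falls below the 99% threshold.
THRESHOLD = 0.99

def completeness(values):
    """Share of rows that are neither None nor empty."""
    if not values:
        return 0.0
    return sum(1 for v in values if v not in (None, "")) / len(values)

def check_sla(values):
    score = completeness(values)
    if score < THRESHOLD:
        print(f"Violation: completeness {score:.2%} < 99% -> trigger remediation workflow")
    else:
        print(f"Compliant: completeness {score:.2%}")

check_sla(["A-100", "A-101", None, "A-103"])  # Violation: completeness 75.00% < 99%
```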
Governance artifacts are scoped to IBM Knowledge Catalog catalogs in the same IBM Cloud account.
You must have the required Cloud Pak for Data service permissions to work with governance artifacts. See Required permissions.
Some IBM Knowledge Catalog plans have limits on the number of governance artifacts of a specific type that you can create.
Watch this short video to learn about the policy features.
This video provides a visual method to learn the concepts and tasks in this documentation.
Use this interactive map to learn about the relationships between your tasks, the tools you need, the services that provide the tools, and where you use the tools. Select any task, tool, service, or workspace to learn what you need, how to get it, and where to use it.
Some tools perform the same tasks but have different features and levels of automation.
Jupyter notebook editor
Prepare data
Visualize data
Build models
Deploy assets
Create a notebook in which you run Python, R, or Scala code to prepare, visualize, and analyze data, or build a model.
AutoAI
Build models
Automatically analyze your tabular data and generate candidate model pipelines customized for your predictive modeling problem.
SPSS Modeler
Prepare data
Visualize data
Build models
Create a visual flow that uses modeling algorithms to prepare data and build and train a model, using a guided approach to machine learning that doesn’t require coding.
Decision Optimization
Build models
Visualize data
Deploy assets
Create and manage scenarios to find the best solution to your optimization problem by comparing different combinations of your model, data, and solutions.
Data Refinery
Prepare data
Visualize data
Create a flow of ordered operations to cleanse and shape data. Visualize data to identify problems and discover insights.
Orchestration Pipelines
Prepare data
Build models
Deploy assets
Automate the model lifecycle, including preparing data, training models, and creating deployments.
RStudio
Prepare data
Build models
Deploy assets
Work with R notebooks and scripts in an integrated development environment.
Federated learning
Build models
Create a federated learning experiment to train a common model on a set of remote data sources. Share training results without sharing data.
Deployments
Deploy assets
Monitor models
Deploy and run your data science and AI solutions in a test or production environment.
Catalogs
Catalog data
Governance
Find and share your data and other assets.
Metadata import
Prepare data
Catalog data
Governance
Import asset metadata from a connection into a project or a catalog.
Metadata enrichment
Prepare data
Catalog data
Governance
Enrich imported asset metadata with business context, data profiling, and quality assessment.
Data quality rules
Prepare data
Governance
Measure and monitor the quality of your data.
Masking flow
Prepare data
Create and run masking flows to prepare copies of data assets that are masked by advanced data protection rules.
Governance
Governance
Create your business vocabulary to enrich assets and rules to protect data.
Data lineage
Governance
Track data movement and usage for transparency and determining data accuracy.
AI factsheet
Governance
Monitor models
Track AI models from request to production.
DataStage flow
Prepare data
Create a flow with a set of connectors and stages to transform and integrate data. Provide enriched and tailored information for your enterprise.
Data virtualization
Prepare data
Create a virtual table to segment or combine data from one or more tables.
OpenScale
Monitor models
Measure outcomes from your AI models and help ensure the fairness, explainability, and compliance of all your models.
Data replication
Prepare data
Replicate data to target systems with low latency, transactional integrity and optimized data capture.
Master data
Prepare data
Consolidate data from the disparate sources that fuel your business and establish a single, trusted, 360-degree view of your customers.
Services you can use
Services add features and tools to the platform.
watsonx.ai Studio
Develop powerful AI solutions with an integrated collaborative studio and industry-standard APIs and SDKs. Formerly known as Watson Studio.
watsonx.ai Runtime
Quickly build, run and manage generative AI and machine learning applications with built-in performance and scalability. Formerly known as Watson Machine Learning.
IBM Knowledge Catalog
Discover, profile, catalog, and share trusted data in your organization.
DataStage
Create ETL and data pipeline services for real-time, micro-batch, and batch data orchestration.
Data Virtualization
View, access, manipulate, and analyze your data without moving it.
Watson OpenScale
Monitor your AI models for bias, fairness, and trust with added transparency on how your AI models make decisions.
Data Replication
Provide efficient change data capture and near real-time data delivery with transactional integrity.
Match360 with Watson
Improve trust in AI pipelines by identifying duplicate records and providing reliable data about your customers, suppliers, or partners.
Manta Data Lineage
Increase data pipeline transparency so you can determine data accuracy throughout your models and systems.
Where you'll work
Collaborative workspaces contain tools for specific tasks.
Project
Where you work with data.
> Projects > View all projects
Catalog
Where you find and share assets.
> Catalogs > View all catalogs
Space
Where you deploy and run assets that are ready for testing or production.
> Deployments
Categories
Where you manage governance artifacts.
> Governance > Categories
Data virtualization
Where you virtualize data.
> Data > Data virtualization
Master data
Where you consolidate data into a 360-degree view.