Watson Knowledge Catalog overview
Watson Knowledge Catalog provides a secure enterprise catalog management platform that is supported by a policy framework. A catalog connects data and knowledge with the people who need to use it. The policy framework ensures that data access is compliant with your business rules.
The following illustration shows how Watson Knowledge Catalog consists of policy management to control access to assets, catalogs to index and find assets, and projects to work with assets.
A catalog is where you share assets across your enterprise. A catalog includes people and the data and analytic assets that they need to find:
- Collaborators in a catalog have access to data and other assets without needing separate credentials or being able to see the credentials. Collaborators have roles that control what activities they can perform in the catalog.
- An asset in a catalog consists of metadata about a data or an analytic asset. Some of the metadata is automatically generated by Watson Knowledge Catalog.
Ready to go? Get started.
Index and enrich assets
You can add data assets and analytic assets to a catalog. A data asset contains information about the data, including how to access the data, the data format, the classification of the asset, which collaborators can access the data, the asset lineage, and other types of metadata. Data assets can include both relational data and unstructured data, such as PDF or Microsoft Office documents. Analytic assets include Jupyter notebooks, trained models, and dashboards.
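To make the idea of an asset as metadata more concrete, here is a minimal Python sketch of the kinds of fields described above. It is an illustration only: the class name `DataAsset`, its field names, and the sample values are all hypothetical and do not reflect the actual Watson Knowledge Catalog schema or API.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class DataAsset:
    """Hypothetical model of catalog metadata; not the product's real schema."""
    name: str
    connection: str        # how to access the data where it lives
    data_format: str       # for example "relational table" or "pdf"
    classification: str    # for example "confidential"
    collaborators: Set[str] = field(default_factory=set)   # who can access the data
    lineage: List[str] = field(default_factory=list)       # where the asset came from

    def can_access(self, user: str) -> bool:
        """Return True if the user is listed as a collaborator on this asset."""
        return user in self.collaborators


# Sample values are invented for illustration.
asset = DataAsset(
    name="customers",
    connection="db2://example-host/sales",
    data_format="relational table",
    classification="confidential",
    collaborators={"ana", "raj"},
    lineage=["published from a project"],
)
```

Note that the record holds only a description of the data and how to reach it, which is why the data itself can stay in its original location.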
You can easily index your data and analytic assets in a catalog. Here’s how you can add assets:
- Leave your data where it is in the cloud or on-premises and just add the connection information to access it.
- Automatically discover and add all tables from a connection to a relational data source as assets in the catalog.
- Upload files to the dedicated encrypted cloud object storage bucket that’s associated with the catalog.
- Publish assets from a Watson Studio project.
- Add data sets from the Gallery as data assets.
- Import or synchronize data assets from IBM InfoSphere Information Governance Catalog.
After you add relational or unstructured data assets to a catalog, they can be profiled to generate additional metadata about the contents of the data.
You can enrich assets by adding other information to them:
- Ratings and reviews by catalog collaborators
- Tags that catalog collaborators can create to describe assets
- Data classes that describe the type of data in assets
- Business terms that describe data in a standard way for your enterprise
It’s easy to find the assets you need in a catalog. Here’s what you can do:
- Search with keywords and filters that are based on subject tags and other asset properties.
- Look at the previews of asset contents to make sure you pick the correct assets.
- Read reviews about assets that are provided by catalog collaborators.
- Choose from recommended assets that are automatically compiled based on your usage history, similar assets, and other factors.
- Choose from the most highly rated assets.
Work with assets in projects
To discover insights by working with data or analytic assets, you need to move the assets to a project. You can also use a project as a staging area to curate data assets or create analytic assets before publishing them to the catalog. Projects contain a select subset of catalog collaborators.
You have these default capabilities for working with assets in projects with Watson Knowledge Catalog:
- Add assets from a catalog to a project to work with them.
- Publish assets from a project to a catalog to make them available for others to use.
- Discover assets from a connection to automatically create them in a project before publishing them to the catalog.
- Cleanse and shape relational data assets with the Data Refinery tool.
You can add other tools to analyze data or create artificial intelligence models by adding the Watson Studio app to your account. Watson Studio and Watson Knowledge Catalog are fully integrated. See Watson Studio overview.
Data Refinery is a self-service data preparation tool that you can use to quickly transform large amounts of raw data into consumable, quality information that’s ready for analysis. You can choose operations from menus or enter dplyr R library code in the command line text box.
These features of Data Refinery make it easy to explore, prepare, and deliver data that people across your organization can trust:
- Powerful operations to clean, organize, fix, and validate your data
- Scripting support for the efficient and flexible manipulation of data
- Scheduling and monitoring of data preparation flows
- Profiles for validating your data
- Visualizations for gaining insight into your data
- Enforcement of policies that mask data
- Support for unstructured data
See Refining data.
Control access to data with policies
Policies apply to all catalogs that belong to the same IBM Cloud account and that have policy enforcement enabled. Policy tools are available only to users who have special permissions.
With policy tools, you can:
- Create business terms that describe your data to use in policies.
- Write policies to deny access to sensitive data assets.
- Write policies to mask data values in columns that contain sensitive data.
- Monitor trends in policy enforcement over time.
See Data governance.
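The two kinds of policy described above, denying access to an asset and masking values in sensitive columns, can be pictured with a small Python sketch. This is not how Watson Knowledge Catalog implements or exposes policies. The function names, the `restricted` tag, the `steward` role, and the list of sensitive columns are all assumptions made for illustration.

```python
from typing import Dict, List, Optional, Set

# Hypothetical set of columns that a masking policy treats as sensitive.
SENSITIVE_COLUMNS = {"email", "ssn"}


def mask_value(value: str) -> str:
    """Replace every character with 'X' (one simple masking choice)."""
    return "X" * len(value)


def apply_policies(
    rows: List[Dict[str, str]],
    asset_tags: Set[str],
    user_roles: Set[str],
) -> Optional[List[Dict[str, str]]]:
    """Return the rows the user may see, or None if access is denied."""
    # Deny policy (assumed rule): only stewards may open restricted assets.
    if "restricted" in asset_tags and "steward" not in user_roles:
        return None
    # Masking policy: hide the values in sensitive columns, keep the rest.
    return [
        {col: mask_value(val) if col in SENSITIVE_COLUMNS else val
         for col, val in row.items()}
        for row in rows
    ]


rows = [{"name": "Ana", "email": "a@x.io"}]
visible = apply_policies(rows, {"public"}, {"viewer"})
denied = apply_policies(rows, {"restricted"}, {"viewer"})
```

In this sketch a denied request returns nothing at all, while a masked request still returns every row with only the sensitive values obscured, which mirrors the difference between the two policy types.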
- Get started
- Watson Studio overview
- About assets
- Catalog data
- Data governance
- Security of IBM Watson apps
- Watson Knowledge Catalog video learning center