Overview of Cloud Pak for Data as a Service

Last updated: Nov 27, 2024

Cloud Pak for Data as a Service is a cloud service platform for all your data governance, data engineering, data analysis, and AI lifecycle tasks. It implements a data fabric solution so that you can give your organization instant and secure access to trusted data, automate processes and compliance, and deliver trustworthy AI in your applications.

Cloud Pak for Data as a Service is a fully managed cloud service platform with the following benefits:

- No installation, management, or updating of software or hardware
- Simple to scale up or down
- Secure and compliant
- Composable services architecture
- Subscription or consumption-based monthly billing

The Cloud Pak for Data as a Service data fabric solution

A data fabric architecture enables your enterprise to unlock the value of your data in a hybrid multicloud data landscape. Moving to a data fabric architecture transforms the way that your enterprise integrates, governs, and uses data for analytics, data science, customer master data, and compliance.

With a data fabric, you get a secure and consistent way to access data from disparate sources. You can eliminate inefficient, repetitive, and manual data access and integration processes. A data fabric architecture bridges the gap between the sources and provides business-ready data to support your company's needs. You can work with data from various types of sources across a hybrid multicloud landscape, while you keep that data secure and trusted with the full breadth of integrated data management capabilities.

Image showing a data fabric with various data sources

Your data engineers need tools to prepare, transform, and virtualize data. Your data quality analysts need tools to measure the quality of the data. Your governance team needs tools to control, protect, and enrich your data. Your data consumers, such as business analysts and data scientists, need tools to collaboratively develop insights and models. With the Cloud Pak for Data platform of integrated tools, your organization can efficiently work together to use your data to improve your business.

A data fabric architecture implements active metadata management, which uses machine learning to automate metadata processing. The outcomes of the metadata analysis facilitate automated data discovery, improve confidence in data, and enable data protection and data governance at scale.

For more information on the data fabric solution, see Use cases. To experience implementing the data fabric, take the data fabric tutorials.

Services and platform architecture

You add features and tools to the Cloud Pak for Data as a Service platform by provisioning services. A set of core services is integrated into the common platform. Other associated services work with the platform but run outside of it. Depending on how you sign up for Cloud Pak for Data as a Service, you might start with a subset of the core services that represent a single data fabric solution use case.

Services architecture

You can provision these types of services from the Cloud Pak for Data as a Service services catalog:

Core services
Core services are seamlessly integrated and add tools, workspaces, or compute power to the platform UI:
- watsonx.ai Studio for analyzing data
- watsonx.ai Runtime for building and deploying models
- Watson OpenScale for evaluating models
- IBM Knowledge Catalog for governing and cataloging data and other assets
- DataStage for integrating data
- Data Virtualization for virtualizing and querying data
- Match 360 for creating master data
- Data Replication for replicating data
- Cognos Dashboard Embedded for visualizing data

Associated services
Associated services work with the platform but run outside of it:
- IBM Cloud database services, which you can use to access data from within the platform while you store and manage the data outside the platform
- Watson services, which have their own UIs or provide APIs for analyzing data

Workspaces and assets

Cloud Pak for Data as a Service is organized as a set of collaborative workspaces where you can work with your team or organization. Each workspace has a set of members with roles that provide permissions to perform actions. Most users work with assets, which are the items that users add to the platform. Data assets contain metadata that represents data, while assets that you create in tools, such as data pipelines and models, run code to work with data. The following diagram shows the main workspaces, their purposes, and how assets and other items move around the platform.

The main workspaces are projects, catalogs, deployment spaces, and categories. Assets move between projects, deployment spaces, and catalogs. Governance artifacts are created in categories and are added as metadata to assets in catalogs.
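
The workspace model described above can be pictured with a small Python sketch. This is an illustration only, not the platform's API: the `Asset` and `Workspace` classes, the `copy_asset` helper, the role names `admin`, `editor`, and `viewer`, and the rule that only admins and editors can add assets are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Asset:
    name: str
    kind: str  # for example "data", "model", or "notebook"


@dataclass
class Workspace:
    name: str
    kind: str  # "project", "catalog", or "deployment space"
    members: dict[str, str] = field(default_factory=dict)  # user -> role
    assets: list[Asset] = field(default_factory=list)

    def can_edit(self, user: str) -> bool:
        # Hypothetical rule: only these illustrative roles may add assets.
        return self.members.get(user) in {"admin", "editor"}

    def add_asset(self, user: str, asset: Asset) -> None:
        if not self.can_edit(user):
            raise PermissionError(f"{user} cannot add assets to {self.name}")
        self.assets.append(asset)


def copy_asset(user: str, asset_name: str,
               source: Workspace, target: Workspace) -> Asset:
    """Copy a named asset into the target workspace.

    Only the user's role in the target workspace is checked here.
    """
    asset = next((a for a in source.assets if a.name == asset_name), None)
    if asset is None:
        raise KeyError(asset_name)
    target.add_asset(user, asset)
    return asset


# Hypothetical workspaces and a single collaborator with different roles.
project = Workspace("churn-analysis", "project", members={"ana": "editor"})
catalog = Workspace("enterprise-catalog", "catalog", members={"ana": "editor"})
space = Workspace("production", "deployment space", members={"ana": "viewer"})

project.add_asset("ana", Asset("customers", "data"))
project.add_asset("ana", Asset("churn-model", "model"))

# Publishing a curated data asset from the project to the catalog succeeds.
copy_asset("ana", "customers", project, catalog)
```

In this sketch, promoting `churn-model` to the `production` space as `ana` would raise `PermissionError`, because her role there is `viewer`. The point is only that permissions come from a member's role in each workspace, so the same person can have different abilities in different workspaces.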

Projects

Projects are where your data science, data engineering, or data curation teams work with data to create assets, such as notebooks, dashboards, models, data pipelines, or enriched data assets. Project tools are provided by most of the core services:

- watsonx.ai Studio provides the Data Refinery, Jupyter notebooks editor, SPSS Modeler, Decision Optimization, Pipelines, and RStudio tools
- watsonx.ai Runtime provides AutoAI and Federated Learning tools
- IBM Knowledge Catalog provides the Data Refinery, Metadata import, Metadata enrichment, and masking flows tools
- DataStage provides the DataStage data pipelines editor
- Cognos Dashboard Embedded provides the dashboard editor
- Data Replication provides the Data Replication tool
- Match 360 provides the Master data configuration tool
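
Because some tools are provided by more than one service, it can help to view the list above as a lookup table. The following sketch transcribes that list into a Python dictionary. The `PROJECT_TOOLS` name and the `services_providing` helper are hypothetical and exist only for this illustration; they are not part of any product API.

```python
# Project tools by core service, transcribed from the list above.
PROJECT_TOOLS = {
    "watsonx.ai Studio": [
        "Data Refinery", "Jupyter notebooks editor", "SPSS Modeler",
        "Decision Optimization", "Pipelines", "RStudio",
    ],
    "watsonx.ai Runtime": ["AutoAI", "Federated Learning"],
    "IBM Knowledge Catalog": [
        "Data Refinery", "Metadata import", "Metadata enrichment",
        "masking flows",
    ],
    "DataStage": ["DataStage data pipelines editor"],
    "Cognos Dashboard Embedded": ["dashboard editor"],
    "Data Replication": ["Data Replication"],
    "Match 360": ["Master data configuration"],
}


def services_providing(tool: str) -> list[str]:
    """Return the core services whose tool list includes the given tool."""
    return [service for service, tools in PROJECT_TOOLS.items() if tool in tools]


print(services_providing("Data Refinery"))
```

For example, `services_providing("Data Refinery")` returns both watsonx.ai Studio and IBM Knowledge Catalog, which reflects that Data Refinery appears under both services in the list.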

The following image shows what the Overview page of a project might look like.

A project contains assets and collaborators.

Catalogs

Catalogs are where your organization finds and stores high-quality, trusted data and other assets, such as model factsheets. You can find data assets in a catalog and move them into a project to work with the data. Alternatively, you can curate data in projects and publish the high-quality data assets to a catalog for others to use. Catalogs require the IBM Knowledge Catalog service.

The following image shows what the Assets page of a catalog might look like.

A catalog contains a view of assets.

Deployment spaces

Deployment spaces are where your ModelOps team deploys models and other deployable assets to production and then tests and manages deployments in production. After you build models and deployable assets in projects, you promote them to deployment spaces.

The following image shows what the Overview page of a deployment space might look like.

A deployment space contains assets and collaborators.

Other workspaces

You can create specialized data assets in other workspaces and move them to projects and catalogs:

- The Data Virtualization service provides a workspace to virtualize data assets over many data sources.
- The Match 360 service provides a workspace to configure and explore a 360-degree view of customer data.

The Resource hub

The platform includes an integrated Resource hub that provides sample data assets, notebooks, and projects. Sample notebooks provide examples of data science and machine learning code. Sample projects, including industry accelerators, contain sets of data, models, other assets, and detailed instructions on how to solve a particular business problem. The Resource hub also provides Knowledge Accelerators, which contain sets of governance artifacts that you can import to provide business vocabularies for specific industries.

The following image shows what the Resource hub looks like.

The Resource hub contains samples.