Cloud Pak for Data as a Service is a cloud service platform for all your data governance, data engineering, data analysis, and AI lifecycle tasks. Cloud Pak for Data as a Service implements a data fabric solution so that you can provide instant
and secure access to trusted data to your organization, automate processes and compliance, and deliver trustworthy AI in your applications.
Cloud Pak for Data as a Service is a fully managed cloud service platform with the following benefits:
No installation, management, or updating of software or hardware
Simple to scale up or down
Secure and compliant
Composable services architecture
Subscription or consumption-based monthly billing
Watch this video to see an overview of Cloud Pak for Data as a Service
This video provides a visual method to learn the concepts and tasks in this documentation.
The Cloud Pak for Data as a Service data fabric solution
Copy link to section
A data fabric architecture enables your enterprise to unlock the value of your data in a hybrid multicloud data landscape. Moving to a data fabric architecture transforms the way that your enterprise integrates, governs, and uses data for analytics,
data science, customer master data, and compliance.
With a data fabric, you can have a secure and consistent way to access data from disparate sources. You can eliminate inefficient, repetitive, and manual data access and integration processes. A data fabric architecture bridges the gap between
the sources and provides business-ready data to support your company's needs. You can work with data from various types of sources across a hybrid and multi-cloud landscape, while you keep that data secure and trusted with the full breadth
of integrated data management capabilities.
Your data engineers need tools to prepare, transform, and virtualize data. Your data quality analysts need tools to measure the quality of the data. Your governance team needs tools to control, protect, and enrich your data. Your data consumers,
such as business analysts and data scientists, need tools to collaboratively develop insights and models. With the Cloud Pak for Data platform of integrated tools, your organization can efficiently work together to use your data to improve
your business.
A data fabric architecture implements active metadata management, which uses machine learning to automate metadata processing. The outcomes of the metadata analysis facilitate automated data discovery, improve confidence in data, and enable
data protection and data governance at scale.
For more information on the data fabric solution, see Use cases. To experience implementing the data fabric, take the data fabric tutorials.
Services and platform architecture
Copy link to section
You add features and tools to the Cloud Pak for Data as a Service platform by provisioning services. A set of core services is integrated into the common platform. Other associated services work with the platform but run outside of it. Depending
on how you sign up for Cloud Pak for Data as a Service, you might start with a subset of the core services that represent a single data fabric solution use case.
You can provision these types of services from the Cloud Pak for Data as a Service services catalog:
Core services Core services are seamlessly integrated and add tools, workspaces, or compute power to the platform UI:
watsonx.ai Studio for analyzing data
watsonx.ai Runtime for building and deploying models
Watson OpenScale for evaluating models
IBM Knowledge Catalog for governing and cataloging data and other assets
DataStage for integrating data
Data Virtualization for virtualizing and querying data
Match 360 for creating master data
Data Replication for replicating data
Cognos Dashboard Embedded for visualizing data
Associated services IBM Cloud database services that you can use to access data from within the platform but store and manage the data outside the platform.
Watson services that have their own UIs or provide APIs for analyzing data.
Workspaces and assets
Copy link to section
Cloud Pak for Data as a Service is organized as a set of collaborative workspaces where you can work with your team or organization. Each workspace has a set of members with roles that provide permissions to perform actions. Most users work
with assets, which are the items that users add to the platform. Data assets contain metadata that represents data, while assets that you create in tools, such as data pipelines and models, run code to work with data. The following diagram
shows the main workspaces, their purposes, and how assets and other items move around the platform.
Projects
Copy link to section
Projects are where your data science, data engineering, or data curation teams work with data to create assets, such as, notebooks, dashboards, models, data pipelines, or enriched data assets. Project tools are provided by most of the core
services:
watsonx.ai Studio provides the Data Refinery, Jupyter notebooks editor, SPSS Modeler, Decision Optimization, Pipelines, and RStudio tools
watsonx.ai Runtime provides AutoAI and Federated Learning tools
IBM Knowledge Catalog provides the Data Refinery, Metadata import, Metadata enrichment, and masking flows tools
DataStage provides the DataStage data pipelines editor
Cognos Dashboard Embedded provides the dashboard editor
Data Replication provides the Data Replication tool
Match 360 provides the Master data configuration tool
The following image shows what the Overview page of a project might look like.
Catalogs
Copy link to section
Catalogs are where your organization finds and stores high-quality, trusted data, and other assets, such as model factsheets. You can find data assets in a catalog and move them into a project to work with the data. Or you can curate data
in projects and publish the high-quality data assets to a catalog for others to use. Catalogs require the IBM Knowledge Catalog service.
The following image shows what the Assets page of a catalog might look like.
Deployment spaces
Copy link to section
Deployment spaces are where your ModelOps team deploys models and other deployable assets to production and then tests and manages deployments in production. After you build models and deployable assets in projects, you promote them to deployment
spaces.
The following image shows what the Overview page of a deployment space might look like.
Categories
Copy link to section
Categories are where your governance team creates and manages governance artifacts that enrich data assets in catalogs. Categories require the IBM Knowledge Catalog service.
The following image shows what a category might look like.
Other workspaces
Copy link to section
You can create specialized data assets in other workspaces and move them to projects and catalogs:
The Data Virtualization service provides a workspace to virtualize data assets over many data sources.
The Match360 service provides a workspace to configure and explore a 360-degree view of customer data.
The Resource hub
Copy link to section
The platform includes an integrated Resource hub that provides sample data assets, notebooks, and projects. Sample notebooks provide examples of data science and machine learning code. Sample projects, including industry accelerators, contain
sets of data, models, other assets, and detailed instructions on how to solve a particular business problem. The Resource hub also provides Knowledge Accelerators, which contain sets of governance artifacts that you can import to provide business
vocabularies for specific industries.
The following image shows what the Resource hub looks like.
Watch this video to see a tour of the Resource hub.
This video provides a visual method to learn the concepts and tasks in this documentation.
About cookies on this siteOur websites require some cookies to function properly (required). In addition, other cookies may be used with your consent to analyze site usage, improve the user experience and for advertising.For more information, please review your cookie preferences options. By visiting our website, you agree to our processing of information as described in IBM’sprivacy statement. To provide a smooth navigation, your cookie preferences will be shared across the IBM web domains listed here.