Cloud Pak for Data as a Service is a cloud service platform for all your data governance, data engineering, data analysis, and AI lifecycle tasks. Cloud Pak for Data as a Service implements a data fabric solution so that you can provide instant
and secure access to trusted data to your organization, automate processes and compliance, and deliver trustworthy AI in your applications.
Cloud Pak for Data as a Service is a fully managed cloud service platform with the following benefits:
No installation, management, or updating of software or hardware
Simple to scale up or down
Secure and compliant
Composable services architecture
Subscription or consumption-based monthly billing
Watch this video to see an overview of Cloud Pak for Data as a Service
This video provides a visual method to learn the concepts and tasks in this documentation.
The Cloud Pak for Data as a Service data fabric solution
Copy link to section
A data fabric architecture enables your enterprise to unlock the value of your data in a hybrid multicloud data landscape. Moving to a data fabric architecture transforms the way that your enterprise integrates, governs, and uses data for analytics,
data science, customer master data, and compliance.
With a data fabric, you can have a secure and consistent way to access data from disparate sources. You can eliminate inefficient, repetitive, and manual data access and integration processes. A data fabric architecture bridges the gap between
the sources and provides business-ready data to support your company's needs. You can work with data from various types of sources across a hybrid and multi-cloud landscape, while you keep that data secure and trusted with the full breadth
of integrated data management capabilities.
Your data engineers need tools to prepare, transform, and virtualize data. Your data quality analysts need tools to measure the quality of the data. Your governance team needs tools to control, protect, and enrich your data. Your data consumers,
such as business analysts and data scientists, need tools to collaboratively develop insights and models. With the Cloud Pak for Data platform of integrated tools, your organization can efficiently work together to use your data to improve
your business.
A data fabric architecture implements active metadata management, which uses machine learning to automate metadata processing. The outcomes of the metadata analysis facilitate automated data discovery, improve confidence in data, and enable
data protection and data governance at scale.
For more information on the data fabric solution, see Use cases. To experience implementing the data fabric, take the data fabric tutorials.
Services and platform architecture
Copy link to section
You add features and tools to the Cloud Pak for Data as a Service platform by provisioning services. A set of core services is integrated into the common platform. Other associated services work with the platform but run outside of it. Depending
on how you sign up for Cloud Pak for Data as a Service, you might start with a subset of the core services that represent a single data fabric solution use case.
You can provision these types of services from the Cloud Pak for Data as a Service services catalog:
Core services Core services are seamlessly integrated and add tools, workspaces, or compute power to the platform UI:
watsonx.ai Studio for analyzing data
watsonx.ai Runtime for building and deploying models
Watson OpenScale for evaluating models
IBM Knowledge Catalog for governing and cataloging data and other assets
DataStage for integrating data
Data Virtualization for virtualizing and querying data
Match 360 for creating master data
Data Replication for replicating data
Cognos Dashboard Embedded for visualizing data
Associated services IBM Cloud database services that you can use to access data from within the platform but store and manage the data outside the platform.
Watson services that have their own UIs or provide APIs for analyzing data.
Workspaces and assets
Copy link to section
Cloud Pak for Data as a Service is organized as a set of collaborative workspaces where you can work with your team or organization. Each workspace has a set of members with roles that provide permissions to perform actions. Most users work
with assets, which are the items that users add to the platform. Data assets contain metadata that represents data, while assets that you create in tools, such as data pipelines and models, run code to work with data. The following diagram
shows the main workspaces, their purposes, and how assets and other items move around the platform.
Projects
Copy link to section
Projects are where your data science, data engineering, or data curation teams work with data to create assets, such as, notebooks, dashboards, models, data pipelines, or enriched data assets. Project tools are provided by most of the core
services:
watsonx.ai Studio provides the Data Refinery, Jupyter notebooks editor, SPSS Modeler, Decision Optimization, Pipelines, and RStudio tools
watsonx.ai Runtime provides AutoAI and Federated Learning tools
IBM Knowledge Catalog provides the Data Refinery, Metadata import, Metadata enrichment, and masking flows tools
DataStage provides the DataStage data pipelines editor
Cognos Dashboard Embedded provides the dashboard editor
Data Replication provides the Data Replication tool
Match 360 provides the Master data configuration tool
The following image shows what the Overview page of a project might look like.
Catalogs
Copy link to section
Catalogs are where your organization finds and stores high-quality, trusted data, and other assets, such as model factsheets. You can find data assets in a catalog and move them into a project to work with the data. Or you can curate data
in projects and publish the high-quality data assets to a catalog for others to use. Catalogs require the IBM Knowledge Catalog service.
The following image shows what the Assets page of a catalog might look like.
Deployment spaces
Copy link to section
Deployment spaces are where your ModelOps team deploys models and other deployable assets to production and then tests and manages deployments in production. After you build models and deployable assets in projects, you promote them to deployment
spaces.
The following image shows what the Overview page of a deployment space might look like.
Categories
Copy link to section
Categories are where your governance team creates and manages governance artifacts that enrich data assets in catalogs. Categories require the IBM Knowledge Catalog service.
The following image shows what a category might look like.
Other workspaces
Copy link to section
You can create specialized data assets in other workspaces and move them to projects and catalogs:
The Data Virtualization service provides a workspace to virtualize data assets over many data sources.
The Match360 service provides a workspace to configure and explore a 360-degree view of customer data.
The Resource hub
Copy link to section
The platform includes an integrated Resource hub that provides sample data assets, notebooks, and projects. Sample notebooks provide examples of data science and machine learning code. Sample projects, including industry accelerators, contain
sets of data, models, other assets, and detailed instructions on how to solve a particular business problem. The Resource hub also provides Knowledge Accelerators, which contain sets of governance artifacts that you can import to provide business
vocabularies for specific industries.
The following image shows what the Resource hub looks like.
Watch this video to see a tour of the Resource hub.
This video provides a visual method to learn the concepts and tasks in this documentation.
Use this interactive map to learn about the relationships between your tasks, the tools you need, the services that provide the tools, and where you use the tools.
Select any task, tool, service, or workspace
You'll learn what you need, how to get it, and where to use it.
Some tools perform the same tasks but have different features and levels of automation.
Jupyter notebook editor
Prepare data
Visualize data
Build models
Deploy assets
Create a notebook in which you run Python, R, or Scala code to prepare, visualize, and analyze data, or build a model.
AutoAI
Build models
Automatically analyze your tabular data and generate candidate model pipelines customized for your predictive modeling problem.
SPSS Modeler
Prepare data
Visualize data
Build models
Create a visual flow that uses modeling algorithms to prepare data and build and train a model, using a guided approach to machine learning that doesn’t require coding.
Decision Optimization
Build models
Visualize data
Deploy assets
Create and manage scenarios to find the best solution to your optimization problem by comparing different combinations of your model, data, and solutions.
Data Refinery
Prepare data
Visualize data
Create a flow of ordered operations to cleanse and shape data. Visualize data to identify problems and discover insights.
Orchestration Pipelines
Prepare data
Build models
Deploy assets
Automate the model lifecycle, including preparing data, training models, and creating deployments.
RStudio
Prepare data
Build models
Deploy assets
Work with R notebooks and scripts in an integrated development environment.
Federated learning
Build models
Create a federated learning experiment to train a common model on a set of remote data sources. Share training results without sharing data.
Deployments
Deploy assets
Monitor models
Deploy and run your data science and AI solutions in a test or production environment.
Catalogs
Catalog data
Governance
Find and share your data and other assets.
Metadata import
Prepare data
Catalog data
Governance
Import asset metadata from a connection into a project or a catalog.
Metadata enrichment
Prepare data
Catalog data
Governance
Enrich imported asset metadata with business context, data profiling, and quality assessment.
Data quality rules
Prepare data
Governance
Measure and monitor the quality of your data.
Masking flow
Prepare data
Create and run masking flows to prepare copies of data assets that are masked by advanced data protection rules.
Governance
Governance
Create your business vocabulary to enrich assets and rules to protect data.
Data lineage
Governance
Track data movement and usage for transparency and determining data accuracy.
AI factsheet
Governance
Monitor models
Track AI models from request to production.
DataStage flow
Prepare data
Create a flow with a set of connectors and stages to transform and integrate data. Provide enriched and tailored information for your enterprise.
Data virtualization
Prepare data
Create a virtual table to segment or combine data from one or more tables.
OpenScale
Monitor models
Measure outcomes from your AI models and help ensure the fairness, explainability, and compliance of all your models.
Data replication
Prepare data
Replicate data to target systems with low latency, transactional integrity and optimized data capture.
Master data
Prepare data
Consolidate data from the disparate sources that fuel your business and establish a single, trusted, 360-degree view of your customers.
Services you can use
Services add features and tools to the platform.
watsonx.ai Studio
Develop powerful AI solutions with an integrated collaborative studio and industry-standard APIs and SDKs. Formerly known as Watson Studio.
watsonx.ai Runtime
Quickly build, run and manage generative AI and machine learning applications with built-in performance and scalability. Formerly known as Watson Machine Learning.
IBM Knowledge Catalog
Discover, profile, catalog, and share trusted data in your organization.
DataStage
Create ETL and data pipeline services for real-time, micro-batch, and batch data orchestration.
Data Virtualization
View, access, manipulate, and analyze your data without moving it.
Watson OpenScale
Monitor your AI models for bias, fairness, and trust with added transparency on how your AI models make decisions.
Data Replication
Provide efficient change data capture and near real-time data delivery with transactional integrity.
Match360 with Watson
Improve trust in AI pipelines by identifying duplicate records and providing reliable data about your customers, suppliers, or partners.
Manta Data Lineage
Increase data pipeline transparency so you can determine data accuracy throughout your models and systems.
Where you'll work
Collaborative workspaces contain tools for specific tasks.
Project
Where you work with data.
> Projects > View all projects
Catalog
Where you find and share assets.
> Catalogs > View all catalogs
Space
Where you deploy and run assets that are ready for testing or production.
> Deployments
Categories
Where you manage governance artifacts.
> Governance > Categories
Data virtualization
Where you virtualize data.
> Data > Data virtualization
Master data
Where you consolidate data into a 360 degree view.