0 / 0
Overview of Cloud Pak for Data as a Service
Overview of Cloud Pak for Data as a Service

Overview of Cloud Pak for Data as a Service

Cloud Pak for Data as a Service is a cloud service platform for all your data governance, data engineering, data analysis, and AI lifecycle tasks. Cloud Pak for Data as a Service implements a data fabric solution so that you can provide instant and secure access to trusted data to your organization, automate processes and compliance, and deliver trustworthy AI in your applications.

Cloud Pak for Data as a Service is a fully managed cloud service platform with the following benefits:

  • No installation, management, or updating of software or hardware
  • Simple to scale up or down
  • Secure and compliant
  • Composable services architecture
  • Subscription or consumption-based monthly billing

The Cloud Pak for Data as a Service data fabric solution

A data fabric architecture enables your enterprise to unlock the value of your data in a hybrid multicloud data landscape. Moving to a data fabric architecture transforms the way that your enterprise integrates, governs, and uses data for analytics, data science, customer master data, and compliance.

With a data fabric, you can have a secure and consistent way to access data from disparate sources. You can eliminate inefficient, repetitive, and manual data access and integration processes. A data fabric architecture bridges the gap between the sources and provides business-ready data to support your company's needs. You can work with data from various types of sources across a hybrid and multi-cloud landscape, while you keep that data secure and trusted with the full breadth of integrated data management capabilities.

Image showing a data fabric with various data sources

Your data engineers need tools to prepare, transform, and virtualize data. Your data quality analysts need tools to measure the quality of the data. Your governance team needs tools to control, protect, and enrich your data. Your data consumers, such as business analysts and data scientists, need tools to collaboratively develop insights and models. With the Cloud Pak for Data platform of integrated tools, your organization can efficiently work together to use your data to improve your business.

For more information on the data fabric solution, see Data fabric solution overview. To experience implementing the data fabric, take the data fabric tutorials.

Services and platform architecture

You add features and tools to the Cloud Pak for Data as a Service platform by provisioning services. A set of core services is integrated into the common platform. Other associated services work with the platform but run outside of it. Depending on how you sign up for Cloud Pak for Data as a Service, you might start with a subset of the core services that represent a single data fabric solution use case.

Services architecture

You can provision these types of services from the Cloud Pak for Data as a Service services catalog:

Core services Core services are seamlessly integrated and add tools, workspaces, or compute power to the platform UI:

  • Watson Studio for analyzing data
  • Watson Machine Learning for building and deploying models
  • Watson OpenScale for evaluating models
  • Watson Knowledge Catalog for governing and cataloging data and other assets
  • DataStage for integrating data
  • Watson Query for virtualizing and querying data
  • Match 360 for creating master data
  • Cognos Dashboard Embedded for visualizing data

Associated services IBM Cloud database services that you can use to access data from within the platform but store and manage the data outside the platform.

Watson services that have their own UIs or provide APIs for analyzing data.

Workspaces and assets

Cloud Pak for Data as a Service is organized as a set of collaborative workspaces where you can work with your team or organization. Each workspace has a set of members with roles that provide permissions to perform actions. Most users work with assets. Data assets contain metadata that represents data, while operational assets, such as data pipelines and models, run code to work with data. The following diagram shows the main workspaces, their purposes, and how assets and other items move around the platform.

The main workspaces are projects, catalogs, deployment spaces, and categories. Assets move between projects and deployment spaces and catalogs. Governance artifacts are created in categories and are added as metadata to assets in catalogs.

Projects

Projects are where your data science, data engineering, or data curation teams work with data to create assets, such as, notebooks, dashboards, models, data pipelines, or enriched data assets. Project tools are provided by most of the core services:

  • Watson Studio provides the Data Refinery, Jupyter notebooks editor, SPSS Modeler, Decision Optimization, Pipelines, and RStudio tools
  • Watson Machine Learning provides AutoAI and Federated Learning tools
  • Watson Knowledge Catalog provides the Data Refinery, Metadata import, Metadata enrichment, and masking flows tools
  • DataStage provides the DataStage data pipelines editor
  • Cognos Dashboard Embedded provides the dashboard editor
  • Match 360 provides the Master data configuration tool

The following image shows what the Overview page of a project might look like.

A project contains assets and collaborators.

Catalogs

Catalogs are where your organization finds and stores high-quality, trusted data, and other assets, such as model factsheets. You can find data assets in a catalog and move them into a project to work with the data. Or you can curate data in projects and publish the high-quality data assets to a catalog for others to use. Catalogs require the Watson Knowledge Catalog service.

The following image shows what the Assets page of a catalog might look like.

A catalog contains a view of assets.

Deployment spaces

Deployment spaces are where your ModelOps team deploys models to production and then tests and manages models in production. After you build models in projects, you promote them to deployment spaces. Deployments spaces require the Watson Machine Learning service.

The following image shows what the Overview page of a deployment space might look like.

A deployment space contains assets and collaborators.

Categories

Categories are where your governance team creates and manages governance artifacts that enrich data assets in catalogs. Categories require the Watson Knowledge Catalog service.

The following image shows what a category might look like.

A category contains governance artifacts.

Other workspaces

You can create specialized data assets in other workspaces and move them to projects and catalogs:

  • The Watson Query service provides a workspace to virtualize data assets over many data sources.
  • The Match360 service provides a workspace to configure and explore a 360-degree view of customer data.

The platform includes an integrated Gallery of samples that provides data assets, notebooks, and sample projects. Sample notebooks provide examples of data science and machine learning code. Sample projects, including industry accelerators, contain sets of data, models, other assets, and detailed instructions on how to solve a particular business problem. The Gallery also provides Knowledge Accelerators, which contain sets of governance artifacts that you can import to provide business vocabularies for specific industries.

The following image shows what the Gallery looks like.

The Gallery contains samples.

Learn more

Parent topic: Cloud Pak for Data as a Service