Feature differences between Cloud Pak for Data deployments

Cloud Pak for Data as a Service and Cloud Pak for Data software differ in some features and in their implementation. Cloud Pak for Data as a Service is a set of IBM Cloud services. Cloud Pak for Data 5.0 is offered as software that you must install and maintain. Services that are available in both deployments can also differ in features between Cloud Pak for Data as a Service and Cloud Pak for Data 5.0, 4.8, and 4.7.

Platform differences
Common features across services
Watson Studio
Watson Machine Learning
watsonx.governance
IBM Knowledge Catalog
DataStage
Watson OpenScale
Watson Query

Platform differences

Cloud Pak for Data as a Service and Cloud Pak for Data software share a common code base; however, they differ in the following key ways:

Platform differences
Features	As a service	Software
Software, hardware, and installation	Cloud Pak for Data as a Service is fully managed by IBM on IBM Cloud. Software updates are automatic. Scaling of compute resources and storage is automatic. You sign up at https://dataplatform.cloud.ibm.com.	You provide and maintain hardware. You install, maintain, and upgrade the software. See Software requirements.
Storage	You provision an IBM Cloud Object Storage service instance to provide storage. See IBM Cloud Object Storage.	You provide persistent storage on a Red Hat OpenShift cluster. See Storage requirements.
Compute resources for running workloads	Users choose the appropriate runtime for their jobs. Compute usage is billed based on the rate for the runtime environment and the duration of the job. See Monitor account resource usage.	You set up the number of Red Hat OpenShift nodes with the appropriate number of vCPUs. See Hardware requirements and Monitoring the platform.
Cost	You buy each service that you need at the appropriate plan level. Many services bill for compute resource consumption. See each service page in the IBM Cloud catalog or in the services catalog on Cloud Pak for Data as a Service, by selecting Services > Services catalog from the navigation menu.	You buy a software license based on the services that you need. For example, the Cloud Pak for Data Enterprise Edition license includes entitlement to services such as Watson Studio or IBM Knowledge Catalog. See Cloud Pak for Data.
Security, compliance, and isolation	The data security, network security, security standards compliance, and isolation of Cloud Pak for Data as a Service are managed by IBM Cloud. You can set up extra security and encryption options. See Security of Cloud Pak for Data as a Service.	Red Hat OpenShift Container Platform provides basic security features. Cloud Pak for Data is assessed for various privacy and compliance regulations and provides features that you can use in preparation for various privacy and compliance assessments. You are responsible for additional security features, encryption, and network isolation. See Security considerations.
Available services	Most data fabric services are available in both deployment environments. See Services for Cloud Pak for Data as a Service.	Includes many other services in addition to the data fabric services. See Services for Cloud Pak for Data 5.0.
User management	You add users and user groups and manage their account roles and permissions with IBM Cloud Identity and Access Management. See Add users to the account. You can also set up SAML federation on IBM Cloud. See IBM Cloud docs: How IBM Cloud IAM works.	You can add users and create user groups from the Administration menu. You can use the Identity and Access Management Service or use your existing SAML SSO or LDAP provider for identity and password management. You can create dynamic, attribute-based user groups. See User management.

Common core functionality across services

The following core functionality that is provided with the platform is effectively the same for services on Cloud Pak for Data as a Service and on Cloud Pak for Data software versions 5.0, 4.8, and 4.7:

Global search for assets and artifacts across the platform
The Platform assets catalog for sharing connections across the platform
Role-based user management within collaborative workspaces across the platform
Common infrastructure for assets and workspaces
A services catalog for adding services
View compute usage from the Administration menu

The following table describes differences in core functionality across services between Cloud Pak for Data as a Service and Cloud Pak for Data software versions 5.0, 4.8, and 4.7.

Differences in common features across services
Feature	As a service	Software
Manage all projects	Users with the Manage projects permission from the IAM service access Manager role for the IBM Cloud Pak for Data service can join any project with the Admin role and then manage or delete the project.	Users with the Manage projects permission can join any project with the Admin role and then manage or delete the project.
Connections to remote data sources	Most supported data sources are common to both deployment environments. See Supported connections.	See Supported data sources.
Connection credentials that are personal or shared	Connections in projects and catalogs can require personal credentials or allow shared credentials. Shared credentials can be disabled at the account level.	Platform connections can require personal credentials or allow shared credentials. Shared credentials can be disabled at the platform level.
Connection credentials from secrets in a vault	Not available	Available
Kerberos authentication	Not available	Available for some services and connections
Sample assets and projects from the Resource hub app	Available	Not available
Custom JDBC connector	Not available	Available starting in 4.8.0
Data source definitions	Not available	Available starting in 5.0. See Data protection with data source definitions.

Watson Studio

The following Watson Studio features are effectively the same on Cloud Pak for Data as a Service and Cloud Pak for Data software, versions 5.0, 4.8, and 4.7:

Collaboration in projects and deployment spaces
Accessing project assets programmatically
Project import and export by using a project ZIP file
Jupyter notebooks
Job scheduling
Data Refinery
Watson Natural Language Processing for Python

This table describes the feature differences between the Watson Studio service on the as-a-service and software deployment environments, differences between offering plans, and whether additional services are required. For more information about feature differences between offering plans on Cloud Pak for Data as a Service, see Watson Studio offering plans.

Differences in Watson Studio
Feature	As a service	Software
Create project	Create: • An empty project • A project from a sample in the Resource hub • A project from file	Create: • An empty project • A project from file • A project with Git integration
Git integration	• Publish notebooks on GitHub • Publish notebooks as a gist	• Integrate a project with Git • Sync assets to a repository in one project and use those assets in another project
Project terminal for advanced Git operations	Not available	Available in projects with default Git integration
Organize assets in projects with folders	Not available	Available starting with 4.8.0
JupyterLab	Not available	Available in projects with Git integration
Visual Studio Code integration	Not available	Available in projects with Git integration
RStudio	Cannot integrate with Git	Can integrate with Git. Requires an RStudio Server Runtimes service.
Python scripts	Not available	Work with Python scripts in JupyterLab. Requires a Watson Studio Runtimes service.
Generate code to load data to a notebook by using the Flight service	Not available	Available
Manage notebook lifecycle	Not available	Use CPDCTL for notebook lifecycle management
Code package assets (set of dependent files in a folder structure)	Not available	Use CPDCTL to create code package assets in a deployment space
Promote notebooks to spaces	Not available	Available manually from the project's Assets page or programmatically by using CPDCTL
Python with GPU	Support available for a single GPU type only	Support available for multiple NVIDIA GPU types. Requires a Watson Studio Runtimes service.
Create and use custom images	Not available	Create custom images for Python (with and without GPU), R, JupyterLab (with and without GPU), RStudio, and SPSS environments. Requires the Watson Studio Runtimes service and other applicable services.
Anaconda Repository	Not available	Use to create custom environments and custom images
Hadoop integration	Not available	Build and train models, and run Data Refinery flows on a Hadoop cluster. Requires the Execution Engine for Apache Hadoop service.
Decision Optimization	Available	Requires the Decision Optimization service.
SPSS Modeler	Available	Requires the SPSS Modeler service.
Orchestration Pipelines	Available	Requires the Orchestration Pipelines service.

Watson Machine Learning

The following Watson Machine Learning features are effectively the same on Cloud Pak for Data as a Service and Cloud Pak for Data software, versions 5.0, 4.8, and 4.7:

Collaboration in projects and deployment spaces
Deploy models
Deploy functions
Watson Machine Learning REST APIs
Watson Machine Learning Python client
Create online deployments
Scale and update deployments
Define and use custom components
Use Federated Learning to train a common model with separate and secure data sources
Monitor deployments across spaces
Updated forms for testing online deployments
Use nested pipelines
AutoAI data imputation
AutoAI fairness evaluation
AutoAI time series supporting features

This table describes the differences in features between the Watson Machine Learning service on the as-a-service and software deployment environments, differences between offering plans, and whether additional services are required. For details about functionality differences between offering plans on Cloud Pak for Data as a Service, see Watson Machine Learning offering plans.

Feature differences between Watson Machine Learning deployments
Feature	As a service	Software
AutoAI training input	Current supported data sources	Supported data sources change by release
AutoAI experiment compute configuration	Different sizes available	Different sizes available
AutoAI limits on data size and number of prediction targets	Set limits	Limits differ by compute configuration
AutoAI incremental learning	Not available	Available
Deploy using popular frameworks and software specifications	Check for latest supported versions	Supported versions differ by release
Connect to databases for batch deployments	Check for support by deployment type	Check for support by deployment type and by version
Deploy and score Python scripts	Available through the Python client	Create scripts in JupyterLab or with the Python client, then deploy them
Deploy and batch score R scripts	Not available	Available
Deploy Shiny apps	Not available	Create and deploy Shiny apps. Deploy from a code package.
Evaluate jobs for fairness or drift	Requires the watsonx.governance service	Requires the Watson OpenScale or watsonx.governance service
Evaluate online deployments in a space for fairness, drift, or explainability	Not available	Available starting in 4.7. Requires the Watson OpenScale or watsonx.governance service
Evaluate deployed prompt templates in a space		Available
Evaluate detached prompt templates in a space	Not available	Available starting in 5.0
Control space creation	No restrictions by role	Use permissions to control who can view and create spaces
Import from Git project to space	Not available	Available
Code package automatically created when importing from Git project to space	Not available	Available
Update RShiny app from code package	Not available	Available
Track model details in a model inventory	Register models to view factsheets with lifecycle details. Requires the IBM Knowledge Catalog service.	Available. Requires the AI Factsheets or watsonx.governance service.
Create and use custom images	Not available	Create custom images for Python or SPSS
Notify collaborators about Pipeline events	Not available	Use Send Mail to notify collaborators
Deep Learning Experiments	Not available	Requires the Watson Machine Learning Accelerator service
Provision and manage IBM Cloud service instances	Add instances for Watson Machine Learning or Watson OpenScale	Services are provisioned on the cluster by the administrator

watsonx.governance

Feature differences between watsonx.governance deployments
Feature	As a service	Software
Evaluate machine learning models	Yes	Yes
Evaluate prompt templates	Requires watsonx. Available in the Dallas region only.	Yes
Integrate with Governance console	Manual integration. Requires IBM OpenPages.	Yes
Integrate with AWS (SageMaker)	Manual integration. Requires IBM OpenPages.	Yes
Store AI use cases in IBM Knowledge Catalog	Cloud Pak for Data as a Service only	Not available
Store AI use cases in the Platform assets catalog	watsonx	Yes

IBM Knowledge Catalog

The following features are effectively the same for IBM Knowledge Catalog on Cloud Pak for Data as a Service and on Cloud Pak for Data software, versions 5.0, 4.8, and 4.7:

Collaboration in projects and catalogs
AI-powered search and recommendations in catalogs
Rating and reviewing assets in catalogs
Data Refinery tool in projects
Categories with collaborator roles
Predefined and custom classifications
Predefined and custom data classes
Governance rules
Policies
Data protection rules
Manual profiling of individual relational data assets in a project or a catalog
Automatic profiling of relational data assets added to a governed catalog
Custom asset types, custom properties for assets, and custom relationships between assets in catalogs
Monitor workflow tasks
Deliver masked data sets in projects with masking flows

This table describes the differences in features between the IBM Knowledge Catalog service on the as-a-service and software deployment environments, differences between offering plans, and whether additional services are required. For more information about feature differences between offering plans on Cloud Pak for Data as a Service, see IBM Knowledge Catalog offering plans.

Starting in Cloud Pak for Data version 5.0, you can install the IBM Knowledge Catalog Premium Cartridge or the IBM Knowledge Catalog Standard Cartridge instead of the IBM Knowledge Catalog service. IBM Knowledge Catalog Premium provides the same features as the IBM Knowledge Catalog service plus semantic and generative AI features. IBM Knowledge Catalog Standard provides a subset of IBM Knowledge Catalog features plus semantic and generative AI features.

Differences in IBM Knowledge Catalog
Feature	As a service	Software
Metadata import tool in projects - discovery	Import data assets into projects or catalogs. Support for a subset of project and catalog connections. See Supported data sources for curation and data quality.	Import different types of assets: • Import data assets into projects or catalogs. Most supported connections are the same in both deployment environments. • Import business intelligence reports, assets with their associated transformation scripts, ETL jobs, or data models into catalogs. Requires installation of MANTA Automated Data Lineage without a license key. Support for a subset of catalog connections. See Supported data sources for curation and data quality.
Metadata import tool in projects - lineage	Not available.	• Import lineage of data assets into catalogs. • Capture and access lineage of ETL jobs in MANTA Automated Data Lineage (starting in 4.7). Requires installation of MANTA Automated Data Lineage with a license key. Support for a subset of catalog connections. See Supported data sources for curation and data quality.
Legacy UI tools	Not available. Use tools in projects instead.	Not available starting in 4.7. Use tools in projects instead.
Metadata enrichment tool in projects	Run profiling, term assignment, quality analysis, and key or relationship analysis on large sets of data assets.	Available.
Enhanced enrichment using semantic capabilities and generative AI	Available.	Not available. Starting in 5.0, install IBM Knowledge Catalog Premium or IBM Knowledge Catalog Standard instead.
Data quality scores	Data quality scores are shown in: • Data quality information for assets in projects and catalogs • Metadata enrichment results	Data quality scores are shown in: • Data quality information for assets in projects and catalogs • Metadata enrichment results • Asset profiles in projects and catalogs. Not available in 4.7 and later. • Quick scan results with the legacy UI. Not available in 4.7 and later. • Data quality projects with the legacy UI. Not available in 4.7 and later.
Detailed data quality information	Data quality page in projects and catalogs, and as part of metadata enrichment results	Available starting in 4.7.
Data quality rules in projects	Available. Requires the DataStage service.	Available. Requires the DataStage service.
Data quality SLA rules	Not available.	Monitor data quality and report violations. SLA compliance reports are shown on a data asset's Data quality page in projects. Available starting in 4.7.3.
Remediation workflows for data quality issues	Not available.	Available starting in 4.7.3.
Add multiple assets to a catalog with a file	Not available.	Available starting in 4.7.3.
Asset activities	Requires a paid plan. Available in projects and catalogs.	Available in projects and catalogs.
Data lineage	Not available	Available.
Technical data lineage	Not available	Available. Requires that a licensed version of MANTA Automated Data Lineage for IBM Cloud Pak for Data is installed. Generated by running the metadata import tool. Can be accessed from catalogs.
Business terms	Limits for some plans.	Available.
Predefined business terms	Predefined business terms and the Knowledge Accelerator Sample Personal Data category that includes them are available only if you create an IBM Knowledge Catalog service instance with a Lite or Standard plan after 7 October 2022.	Not available.
Reference data sets	Limits per plan.	Available.
Custom relationships for artifacts	Requires a paid plan.	Available
Knowledge Accelerators	Requires an Enterprise plan. Download from Resource hub.	Provided with the platform.
Custom workflow configurations for governance artifacts and requests	Available for governance artifacts.	Available.
Custom category roles	Limits per plan.	Available.
Export and import data protection rules	To export data protection rules from any system and import the rules into the same system or a different system, you can use APIs. For details, see Migrating data protection rules.	To export data protection rules from any system and import the rules into the same system or a different system, you can use either APIs or cpd-cli commands. For details, see Migrating data protection rules.
Administrative reports	Requires a paid plan.	Available.
Migrate data from InfoSphere Information Server	Not available.	Available starting in 4.8.
Relationship explorer	Not available.	Available starting in 5.0. Requires installing the optional knowledge graph component with Cloud Pak for Data or IBM Knowledge Catalog Premium Cartridge.

DataStage

The following table describes differences in features between DataStage on Cloud Pak for Data as a Service and DataStage on Cloud Pak for Data software, versions 5.0, 4.8, and 4.7.

Differences in DataStage
Feature	As a service	Software
PX instance management	You can provision instances from a set of pre-defined sizes.	You can provision instances more flexibly by using Cloud Pak for Data Instance administration.
Job compilation	OSH is generated during compilation. Transformer stages are compiled at run time.	OSH is generated during compilation. Transformer stages are also compiled at compile time, and the results are made available on the `/ds-storage` mount. Compilation is synchronous.
Job runtime	You can submit as many jobs as you want, subject to queueing.	Concurrent job runs are supported. Concurrency is determined by instance capacity and the settings in the `/px-storage/config/wlm.config.xml` file.
Asset management	For files of type .xls, .xlsx, .xml, and .json, only simple structures are supported. Multi-level or nested schemas might not be parsed.	Full support of files of type .csv, .txt, .xls, .xlsx, .xml, and .json is available.
Storage	POSIX-type file-based real storage is not available. Storage is emulated by the use of a Cloud Object Storage project bucket.	Real storage is available in `/px-storage` and `/ds-storage`. You can mount more storage into the PX-runtime pod. See Setting up an NFS mount in DataStage.
Java Integration stage	Available with DataStage-aaS Anywhere	Available
Java library component	Available with DataStage-aaS Anywhere	Available
Generic JDBC connection	Available with DataStage-aaS Anywhere	Available
Excel	Available with DataStage-aaS Anywhere	Available
AVI	Available with DataStage-aaS Anywhere	Available
External Source stage	Available with DataStage-aaS Anywhere	Available
External Target stage	Available with DataStage-aaS Anywhere	Available
Hierarchical stage	Single file or File set option for XML Parser and JSON Parser is not available. Single file, File set, and Large Object option for XML Composer and JSON Composer are not available.	Available
SMP	Sizes S, M, and L use a single-node SMP configuration. Use a remote runtime engine to set up an alternative configuration.	Parallel workloads are managed through logical partitions, which are configured with the APT_CONFIG_FILE option.
SAP Bulk Extract connection	Not available	Available
SAP Delta Extract connection	Not available	Available
Wrapped stage	Available with DataStage-aaS Anywhere	Available
SAP HANA connection	Not available	Available
Text data source in ODBC connection	Not available	Available
Build stage	Available with DataStage-aaS Anywhere	Available
Send reports by using before/after-job subroutines	Not available	Available
Custom stage	Available with DataStage-aaS Anywhere	Available
Apache HBase connection	Available with DataStage-aaS Anywhere	Available
Kerberos authentication for Apache Hive connections	Not available	Available
User-defined functions	Available with DataStage-aaS Anywhere	Available
Before/after-job properties	Available with DataStage-aaS Anywhere	Available
Data service connector	Not available	Available
Db2 database sequence in Slowly Changing Dimension stage, Surrogate Key Generator stage, and Transformer stage	Not available	Available
Use the Apache Hive connection as a target. (Available when Use DataStage properties is selected in the connector.)	Available with DataStage-aaS Anywhere	Available
Parameterize properties with local connections	Not available	Available
Operational Decision Manager stage	Available with DataStage-aaS Anywhere	Available
Deployment spaces	Not available	Available starting in 4.7.0

Watson OpenScale

The following Watson OpenScale features are effectively the same on Cloud Pak for Data as a Service and Cloud Pak for Data software, versions 5.0, 4.8, and 4.7:

Evaluate deployments for fairness
Evaluate the quality of deployments
Monitor deployments for drift
View and compare model results in an Insights dashboard
Add deployments from the machine learning provider of your choice
Set alerts to trigger when evaluations fall below a specified threshold
Evaluate deployments in a user interface or notebook
Custom evaluations and metrics
View details about evaluations in model factsheets

This table describes the differences in features between the Watson OpenScale service on the as-a-service and software deployment environments, differences between offering plans, and whether additional services are required.

Differences in IBM Watson OpenScale
Feature	As a service	Software
Upload pre-scored test data	Not available	Available
IBM SPSS Collaboration and Deployment Services	Not available	Available
Batch processing	Not available	Available
Support access control by user groups	Not available	Available
Free database and Postgres plans	Available	Postgres available starting in 4.8
Set up multiple instances	Not available	Available
Integration with OpenPages	Available with manual integration	Available
Evaluation of foundation model assets	Not available	Available

Watson Query

On Cloud Pak for Data as a Service, data virtualization functionality is provided by the Watson Query service. The following data virtualization functionality is effectively the same on Cloud Pak for Data as a Service and Cloud Pak for Data 5.0, 4.8, and 4.7:

Connecting to supported data sources
Virtualizing data
Governing virtual data using policies and data protection rules
Monitoring and exploring the service
Using the SQL interface
Caching
Column masking
Explore view and reloading of tables
Data sampling in statistics collection
Metadata enrichment

The following Data virtualization functionality appears to be different in the user interface but provides the same basic functionality:

This table describes the differences in features between Watson Query on Cloud Pak for Data as a Service and Data Virtualization (formerly Watson Query) on Cloud Pak for Data software.

Differences in Watson Query
Feature	As a service	Software
Service name	Watson Query	In Cloud Pak for Data 5.0, the service is called Data Virtualization.
Use the Cloud Pak for Data Data Source Definitions (DSD) to enforce IBM Knowledge Catalog data protection rules	Not applicable for SaaS	Available starting in 5.0
Query data in REST API data sources	Not applicable for SaaS	Available starting in 5.0
Query tables from previous Presto and Databricks catalogs with multiple catalog support	Not applicable for SaaS	Available starting in 5.0
Automatically scale service instances	Not applicable for SaaS	Available starting in 5.0
Mask multibyte characters for enhanced privacy of sensitive data	Not applicable for SaaS	Available starting in 5.0
View the data protection rules that are applied to a user	Not applicable for SaaS	Available starting in 5.0
Enhanced security for profiling results in Data Virtualization views	Not applicable for SaaS	Available starting in 5.0
Data Virtualization connections in catalogs now reference the platform connection	Not applicable for SaaS	Available starting in 5.0
Enhanced security for the Admin role: The Admin role does not have default access to all data.	Not applicable for SaaS	Available starting in 4.8
IBM Knowledge Catalog data protection rules are always enabled for Watson Query data	Not applicable for SaaS	Available starting in 4.8
Secure your ungoverned objects: With IBM Knowledge Catalog data protection rules in Watson Query, virtualized objects that are not published in a governed catalog follow the Default data access convention setting from your rule settings.	Not applicable for SaaS	Available starting in 4.8
Query Presto data: You can create a connection to Presto to access and query data in Presto.	Not applicable for SaaS	Available starting in 4.8
Audit logging to monitor user activity and data access	Available	Available starting in 4.7
Integration with IBM Knowledge Catalog	Required	Optional
Group-based authorization and object-level access for groups	Not available	Available
Support for remote connectors	Not applicable for SaaS	Available
Support for file system based data sources, except in Cloud Object Storage	Not applicable for SaaS	Available
Connecting to data sources that require an uploaded JDBC driver, for example, SAP HANA, Generic JDBC	Not applicable for SaaS	Available
Collecting statistics in the user interface	Not available	Available
Automatic statistics collection during object virtualization	Not available	Available
Access management for multiple groups	Not available	Available
Support for CSV or TSV files in Cloud Object Storage	Not applicable for SaaS	Available
Credentials in vaults for connections in Cloud Object Storage	Not applicable for SaaS	Available
