Feature differences between Cloud Pak for Data deployments
Cloud Pak for Data as a Service and Cloud Pak for Data software have some differences in features and implementation. Cloud Pak for Data as a Service is a set of IBM Cloud services. Cloud Pak for Data 5.0 is offered as software that you must install and maintain. Services that are available on both deployments also have differences in features on Cloud Pak for Data as a Service compared to Cloud Pak for Data 5.0, 4.8, and 4.7.
- Platform differences
- Common core functionality across services
- Watson Studio
- Watson Machine Learning
- watsonx.governance
- IBM Knowledge Catalog
- DataStage
- Watson OpenScale
- Watson Query
Platform differences
Cloud Pak for Data as a Service and Cloud Pak for Data software share a common code base; however, they differ in the following key ways:
Features | As a service | Software |
---|---|---|
Software, hardware, and installation | Cloud Pak for Data as a Service is fully managed by IBM on IBM Cloud. Software updates are automatic. Scaling of compute resources and storage is automatic. You sign up at https://dataplatform.cloud.ibm.com. | You provide and maintain hardware. You install, maintain, and upgrade the software. See Software requirements. |
Storage | You provision an IBM Cloud Object Storage service instance to provide storage. See IBM Cloud Object Storage. | You provide persistent storage on a Red Hat OpenShift cluster. See Storage requirements. |
Compute resources for running workloads | Users choose the appropriate runtime for their jobs. Compute usage is billed based on the rate for the runtime environment and the duration of the job. See Monitor account resource usage. | You set up the number of Red Hat OpenShift nodes with the appropriate number of vCPUs. See Hardware requirements and Monitoring the platform. |
Cost | You buy each service that you need at the appropriate plan level. Many services bill for compute resource consumption. See each service page in the IBM Cloud catalog or in the services catalog on Cloud Pak for Data as a Service, by selecting Services > Services catalog from the navigation menu. | You buy a software license based on the services that you need. For example, the Cloud Pak for Data Enterprise Edition license includes entitlement to services such as Watson Studio or IBM Knowledge Catalog. See Cloud Pak for Data. |
Security, compliance, and isolation | The data security, network security, security standards compliance, and isolation of Cloud Pak for Data as a Service are managed by IBM Cloud. You can set up extra security and encryption options. See Security of Cloud Pak for Data as a Service. | Red Hat OpenShift Container Platform provides basic security features. Cloud Pak for Data is assessed against various privacy and compliance regulations and provides features that you can use in preparation for privacy and compliance assessments. You are responsible for additional security features, encryption, and network isolation. See Security considerations. |
Available services | Most data fabric services are available in both deployment environments. See Services for Cloud Pak for Data as a Service. | Includes many other services. See Services for Cloud Pak for Data 5.0. |
User management | You add users and user groups and manage their account roles and permissions with IBM Cloud Identity and Access Management. See Add users to the account. You can also set up SAML federation on IBM Cloud. See IBM Cloud docs: How IBM Cloud IAM works. | You can add users and create user groups from the Administration menu. You can use the Identity and Access Management Service or use your existing SAML SSO or LDAP provider for identity and password management. You can create dynamic, attribute-based user groups. See User management. |
Common core functionality across services
The following core functionality that is provided with the platform is effectively the same for services on Cloud Pak for Data as a Service and on Cloud Pak for Data software, versions 5.0, 4.8, and 4.7:
- Global search for assets and artifacts across the platform
- The Platform assets catalog for sharing connections across the platform
- Role-based user management within collaborative workspaces across the platform
- Common infrastructure for assets and workspaces
- A services catalog for adding services
- View compute usage from the Administration menu
The following table describes differences in core functionality across services between Cloud Pak for Data as a Service and Cloud Pak for Data software versions 5.0, 4.8, and 4.7.
Feature | As a service | Software |
---|---|---|
Manage all projects | Users with the Manage projects permission from the IAM service access Manager role for the IBM Cloud Pak for Data service can join any project with the Admin role and then manage or delete the project. | Users with the Manage projects permission can join any project with the Admin role and then manage or delete the project. |
Connections to remote data sources | Most supported data sources are common to both deployment environments. See Supported connections. | See Supported data sources. |
Connection credentials that are personal or shared | Connections in projects and catalogs can require personal credentials or allow shared credentials. Shared credentials can be disabled at the account level. | Platform connections can require personal credentials or allow shared credentials. Shared credentials can be disabled at the platform level. |
Connection credentials from secrets in a vault | Not available | Available |
Kerberos authentication | Not available | Available for some services and connections |
Sample assets and projects from the Resource hub app | Available | Not available |
Custom JDBC connector | Not available | Available starting in 4.8.0 |
Data source definitions | Not available | Available starting in 5.0. See Data protection with data source definitions. |
Watson Studio
The following Watson Studio features are effectively the same on Cloud Pak for Data as a Service and Cloud Pak for Data software, versions 5.0, 4.8, and 4.7:
- Collaboration in projects and deployment spaces
- Accessing project assets programmatically (see the sketch after this list)
- Project import and export by using a project ZIP file
- Jupyter notebooks
- Job scheduling
- Data Refinery
- Watson Natural Language Processing for Python
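For example, project assets can be read from a notebook with the project-lib library on either deployment. The following is a minimal sketch; the project ID, the access token, and the asset name `my_data.csv` are placeholders, and the constructor call is normally generated for you when you insert a project access token into the notebook.

```python
# Minimal sketch: read a project data asset from a notebook with project-lib.
# The project ID, access token, and asset name are placeholders.
import pandas as pd
from project_lib import Project

# Watson Studio generates this call when you insert a project token into a notebook.
project = Project(project_id="<project_id>", project_access_token="<access_token>")

# List the files that are stored as data assets in the project.
for file_info in project.get_files():
    print(file_info)

# Load one data asset into a pandas DataFrame.
df = pd.read_csv(project.get_file("my_data.csv"))
```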
This table describes the feature differences between the Watson Studio service on the as-a-service and software deployment environments, differences between offering plans, and whether additional services are required. For more information about feature differences between offering plans on Cloud Pak for Data as a Service, see Watson Studio offering plans.
Feature | As a service | Software |
---|---|---|
Create project | Create: • An empty project • A project from a sample in the Resource hub • A project from file | Create: • An empty project • A project from file • A project with Git integration |
Git integration | • Publish notebooks on GitHub • Publish notebooks as gists | • Integrate a project with Git • Sync assets to a repository in one project and use those assets in another project |
Project terminal for advanced Git operations | Not available | Available in projects with default Git integration |
Organize assets in projects with folders | Not available | Available starting with 4.8.0 |
JupyterLab | Not available | Available in projects with Git integration |
Visual Studio Code integration | Not available | Available in projects with Git integration |
RStudio | Cannot integrate with Git | Can integrate with Git. Requires an RStudio Server Runtimes service. |
Python scripts | Not available | Work with Python scripts in JupyterLab. Requires a Watson Studio Runtimes service. |
Generate code to load data to a notebook by using the Flight service | Not available | Available |
Manage notebook lifecycle | Not available | Use CPDCTL for notebook lifecycle management |
Code package assets (set of dependent files in a folder structure) | Not available | Use CPDCTL to create code package assets in a deployment space |
Promote notebooks to spaces | Not available | Available manually from the project's Assets page or programmatically by using CPDCTL |
Python with GPU | Support available for a single GPU type only | Support available for multiple Nvidia GPU types. Requires a Watson Studio Runtimes service. |
Create and use custom images | Not available | Create custom images for Python (with and without GPU), R, JupyterLab (with and without GPU), RStudio, and SPSS environments. Requires the Watson Studio Runtimes service and other applicable services. |
Anaconda Repository | Not available | Use to create custom environments and custom images |
Hadoop integration | Not available | Build and train models, and run Data Refinery flows on a Hadoop cluster. Requires the Execution Engine for Apache Hadoop service. |
Decision Optimization | Available | Requires the Decision Optimization service. |
SPSS Modeler | Available | Requires the SPSS Modeler service. |
Orchestration Pipelines | Available | Requires the Orchestration Pipelines service. |
Watson Machine Learning
The following Watson Machine Learning features are effectively the same on Cloud Pak for Data as a Service and Cloud Pak for Data software, versions 5.0, 4.8, and 4.7:
- Collaboration in projects and deployment spaces
- Deploy models
- Deploy functions
- Watson Machine Learning REST APIs
- Watson Machine Learning Python client (see the sketch after this list)
- Create online deployments
- Scale and update deployments
- Define and use custom components
- Use Federated Learning to train a common model with separate and secure data sources
- Monitor deployments across spaces
- Updated forms for testing online deployment
- Use nested pipelines
- AutoAI data imputation
- AutoAI fairness evaluation
- AutoAI time series supporting features
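As an illustration of the common Python client and online deployments, the following minimal sketch creates and scores an online deployment with the ibm-watson-machine-learning client. The credentials, space ID, model ID, and field names are placeholders; on Cloud Pak for Data software, the credentials dictionary uses your cluster URL with a username and API key or password instead of an IBM Cloud API key.

```python
# Minimal sketch: create and score an online deployment with the
# Watson Machine Learning Python client (ibm-watson-machine-learning).
# Credentials, IDs, and field names are placeholders.
from ibm_watson_machine_learning import APIClient

wml_credentials = {
    "url": "https://us-south.ml.cloud.ibm.com",  # as-a-service endpoint; use your cluster URL on software
    "apikey": "<ibm_cloud_api_key>",
}
client = APIClient(wml_credentials)
client.set.default_space("<space_id>")  # deployment space that contains the stored model

# Create an online (real-time) deployment for a stored model.
deployment = client.deployments.create(
    artifact_uid="<model_id>",
    meta_props={
        client.deployments.ConfigurationMetaNames.NAME: "sample online deployment",
        client.deployments.ConfigurationMetaNames.ONLINE: {},
    },
)
deployment_id = client.deployments.get_id(deployment)

# Score the deployment with a small payload.
payload = {"input_data": [{"fields": ["feature_1", "feature_2"], "values": [[1.0, 2.0]]}]}
print(client.deployments.score(deployment_id, payload))
```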
This table describes the differences in features between the Watson Machine Learning service on the as-a-service and software deployment environments, differences between offering plans, and whether additional services are required. For details about functionality differences between offering plans on Cloud Pak for Data as a Service, see Watson Machine Learning offering plans.
Feature | As a service | Software |
---|---|---|
AutoAI training input | Current supported data sources | Supported data sources change by release |
AutoAI experiment compute configuration | Different sizes available | Different sizes available |
AutoAI limits on data size and number of prediction targets | Set limits | Limits differ by compute configuration |
AutoAI incremental learning | Not available | Available |
Deploy using popular frameworks and software specifications | Check for latest supported versions | Supported versions differ by release |
Connect to databases for batch deployments | Check for support by deployment type | Check for support by deployment type and by version |
Deploy and score Python scripts | Available via Python client | Create scripts in JupyterLab or Python client, then deploy |
Deploy and batch score R Scripts | Not available | Available |
Deploy Shiny apps | Not available | Create and deploy Shiny apps. Deploy from a code package. |
Evaluate jobs for fairness or drift | Requires the watsonx.governance service | Requires the Watson OpenScale or watsonx.governance service |
Evaluate online deployments in a space for fairness, drift, or explainability | Not available | Available starting in 4.7. Requires the Watson OpenScale or watsonx.governance service. |
Evaluate deployed prompt templates in a space | Available | |
Evaluate detached prompt templates in a space | Not available | Available starting in 5.0 |
Control space creation | No restrictions by role | Use permissions to control who can view and create spaces |
Import from GIT project to space | Not available | Available |
Code package automatically created when importing from Git project to space | Not available | Available |
Update RShiny app from code package | Not available | Available |
Track model details in a model inventory | Register models to view factsheets with lifecycle details. Requires the IBM Knowledge Catalog service. | Available. Requires the AI Factsheets or watsonx.governance service. |
Create and use custom images | Not available | Create custom images for Python or SPSS |
Notify collaborators about Pipeline events | Not available | Use Send Mail to notify collaborators |
Deep Learning Experiments | Not available | Requires the Watson Machine Learning Accelerator service |
Provision and manage IBM Cloud service instances | Add instances for Watson Machine Learning or Watson OpenScale | Services are provisioned on the cluster by the administrator |
watsonx.governance
Feature | As a service | Software |
---|---|---|
Evaluate machine learning models | Yes | Yes |
Evaluate prompt templates | Requires watsonx. Dallas region only. | Yes |
Integrate with Governance console | Manual integration. Requires IBM OpenPages. | Yes |
Integrate with AWS (Amazon SageMaker) | Manual integration. Requires IBM OpenPages. | Yes |
Store AI use cases in IBM Knowledge Catalog | Cloud Pak for Data as a Service only | Not available |
Store AI use cases in the Platform assets catalog | watsonx | Yes |
IBM Knowledge Catalog
The following features are effectively the same for IBM Knowledge Catalog on Cloud Pak for Data as a Service and on Cloud Pak for Data software, versions 5.0, 4.8, and 4.7:
- Collaboration in projects and catalogs
- AI-powered search and recommendations in catalogs
- Rating and reviewing assets in catalogs
- Data Refinery tool in projects
- Categories with collaborator roles
- Predefined and custom classifications
- Predefined and custom data classes
- Governance rules
- Policies
- Data protection rules
- Manual profiling of individual relational data assets in a project or a catalog
- Automatic profiling of relational data assets added to a governed catalog
- Custom asset types, custom properties for assets, and custom relationships between assets in catalogs
- Monitor workflow tasks
- Deliver masked data sets in projects with masking flows
This table describes the differences in features between the IBM Knowledge Catalog service on the as-a-service and software deployment environments, differences between offering plans, and whether additional services are required. For more information about feature differences between offering plans on Cloud Pak for Data as a Service, see IBM Knowledge Catalog offering plans.
Starting in Cloud Pak for Data version 5.0, you can install the IBM Knowledge Catalog Premium Cartridge or the IBM Knowledge Catalog Standard Cartridge instead of the IBM Knowledge Catalog service. IBM Knowledge Catalog Premium provides the same features as the IBM Knowledge Catalog service plus semantic and generative AI features. IBM Knowledge Catalog Standard provides a subset of IBM Knowledge Catalog features plus semantic and generative AI features.
Feature | As a service | Software |
---|---|---|
Metadata import tool in projects - discovery | Import data assets into projects or catalogs. Support for a subset of project and catalog connections. See Supported data sources for curation and data quality. | Import different types of assets: • Import data assets into projects or catalogs. Most supported connections are the same in both deployment environments. • Import business intelligence reports, assets with their associated transformation scripts, ETL jobs, or data models into catalogs. Requires installation of MANTA Automated Data Lineage without a license key. Support for a subset of catalog connections. See Supported data sources for curation and data quality. |
Metadata import tool in projects - lineage | Not available. | • Import lineage of data assets into catalogs. • Capture and access lineage of ETL jobs in MANTA Automated Data Lineage (starting in 4.7) Requires installation of MANTA Automated Data Lineage with a license key. Support for a subset of catalog connections. See Supported data sources for curation and data quality. |
Legacy UI tools | Not available. Use tools in projects instead. | Not available starting in 4.7. Use tools in projects instead. |
Metadata enrichment tool in projects | Run profiling, term assignment, quality analysis, and key or relationship analysis on large sets of data assets. | Available. |
Enhanced enrichment using semantic capabilities and generative AI | Available. | Not available. Starting in 5.0, install IBM Knowledge Catalog Premium or IBM Knowledge Catalog Standard instead. |
Data quality scores | Data quality scores are shown in: • Data quality information for assets in projects and catalogs • Metadata enrichment results | Data quality scores are shown in: • Data quality information for assets in projects and catalogs • Metadata enrichment results • Asset profiles in projects and catalogs. Not available in 4.7 and later. • Quick scan results with the legacy UI. Not available in 4.7 and later. • Data quality projects with the legacy UI. Not available in 4.7 and later. |
Detailed data quality information | Data quality page in projects and catalogs, and as part of metadata enrichment results | Available starting in 4.7. |
Data quality rules in projects | Available. Requires the DataStage service. | Available. Requires the DataStage service. |
Data quality SLA rules | Not available. | Monitor data quality and report violations. SLA compliance reports are shown on a data asset's Data quality page in projects. Available starting in 4.7.3. |
Remediation workflows for data quality issues | Not available. | Available starting in 4.7.3. |
Add multiple assets to a catalog with a file | Not available. | Available starting in 4.7.3. |
Asset activities | Requires a paid plan. Available in projects and catalogs. | Available in projects and catalogs. |
Business lineage | Not available | Available. |
Technical data lineage | Not available | Available. Requires that a licensed version of MANTA Automated Data Lineage for IBM Cloud Pak for Data is installed. Generated by running the metadata import tool. Can be accessed from catalogs. |
Business terms | Limits for some plans. | Available. |
Predefined business terms | Predefined business terms and the Knowledge Accelerator Sample Personal Data category that includes them are available only if you create an IBM Knowledge Catalog service instance with a Lite or Standard plan after 7 October 2022. | Not available. |
Reference data sets | Limits per plan. | Available. |
Custom relationships for artifacts | Requires a paid plan. | Available. |
Knowledge Accelerators | Requires an Enterprise plan. Download from Resource hub. | Provided with the platform. |
Custom workflow configurations for governance artifacts and requests | Available for governance artifacts. | Available. |
Custom category roles | Limits per plan. | Available. |
Export and import data protection rules | To export data protection rules from any system and import the rules into the same system or a different system, you can use APIs. For details, see Migrating data protection rules. | To export data protection rules from any system and import the rules into the same system or a different system, you can use either APIs or cpd-cli commands. For details, see Migrating data protection rules. |
Administrative reports | Requires a paid plan. | Available. |
Migrate data from InfoSphere Information Server | Not available. | Available starting in 4.8. |
Relationship explorer | Not available. | Available starting in 5.0. Requires installing the optional knowledge graph component with Cloud Pak for Data or IBM Knowledge Catalog Premium Cartridge. |
DataStage
The following table describes differences in features between DataStage on Cloud Pak for Data as a Service and DataStage on Cloud Pak for Data software, versions 5.0, 4.8, and 4.7.
Feature | As a service | Software |
---|---|---|
PX instance management | You can provision instances from a set of pre-defined sizes. | You can provision instances more flexibly by using Cloud Pak for Data Instance administration. |
Job compilation | | |
Job runtime | You can submit as many jobs as you want, subject to queueing. | |
Asset management | For files of type .xls, .xlsx, .xml, and .json, only simple structures are supported. Multi-level/nested schemas may not be parsed. | Full support of files of type .csv, .txt, .xls, .xlsx, .xml, and .json is available. |
Storage | | |
Java Integration stage | Available with DataStage-aaS Anywhere | Available |
Java library component | Available with DataStage-aaS Anywhere | Available |
Generic JDBC connection | Available with DataStage-aaS Anywhere | Available |
Excel | Available with DataStage-aaS Anywhere | Available |
AVI | Available with DataStage-aaS Anywhere | Available |
External Source stage | Available with DataStage-aaS Anywhere | Available |
External Target stage | Available with DataStage-aaS Anywhere | Available |
Hierarchical stage | | Available |
SMP | S, M, and L are single-node SMP configurations. Use a remote runtime engine to set up an alternative configuration. | Parallel workloads are managed through logical partitions, which are configured with the APT_CONFIG_FILE option (see the sketch after this table). |
SAP Bulk Extract connection | Not available | Available |
SAP Delta Extract connection | Not available | Available |
Wrapped stage | Available with DataStage-aaS Anywhere | Available |
SAP HANA connection | Not available | Available |
Text data source in ODBC connection | Not available | Available |
Build stage | Available with DataStage-aaS Anywhere | Available |
Send reports by using before/after-job subroutines | Available with DataStage-aaS Anywhere | Available |
Custom stage | Available with DataStage-aaS Anywhere | Available |
Apache HBase connection | Available with DataStage-aaS Anywhere | Available |
Kerberos authentication for Apache Hive connections | Not available | Available |
User-defined functions | Available with DataStage-aaS Anywhere | Available |
User-created APT_CONFIG_FILEs | Available with DataStage-aaS Anywhere | Available |
Before/after-job properties | Available with DataStage-aaS Anywhere | Available |
Data service connector | Not available | Available |
Db2 database sequence in Slowly Changing Dimension stage, Surrogate Key Generator stage, and Transformer stage | Available with DataStage-aaS Anywhere | Available |
Use the Apache Hive connection as a target. (Available when Use DataStage properties is selected in the connector.) | Available with DataStage-aaS Anywhere | Available |
Parameterize properties with local connections | Not available | Available |
Operational Decision Manager stage | Available with DataStage-aaS Anywhere | Available |
Deployment spaces | Not available | Available starting in 4.7.0 |
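For reference, the APT_CONFIG_FILE mentioned in the SMP row is the standard DataStage parallel engine configuration file. A minimal two-node sketch follows; the host name and resource paths are placeholders and depend on your engine installation.

```
{
  node "node1"
  {
    fastname "engine-host"
    pools ""
    resource disk "/opt/ibm/datastage/datasets" {pools ""}
    resource scratchdisk "/opt/ibm/datastage/scratch" {pools ""}
  }
  node "node2"
  {
    fastname "engine-host"
    pools ""
    resource disk "/opt/ibm/datastage/datasets" {pools ""}
    resource scratchdisk "/opt/ibm/datastage/scratch" {pools ""}
  }
}
```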
Watson OpenScale
The following Watson OpenScale features are effectively the same on Cloud Pak for Data as a Service and Cloud Pak for Data software, versions 5.0, 4.8, and 4.7:
- Evaluate deployments for fairness
- Evaluate the quality of deployments
- Monitor deployments for drift
- View and compare model results in an Insights dashboard
- Add deployments from the machine learning provider of your choice
- Set alerts to trigger when evaluations fall below a specified threshold
- Evaluate deployments in a user interface or notebook (see the sketch after this list)
- Custom evaluations and metrics
- View details about evaluations in model factsheets
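For example, evaluations can be accessed from a notebook through the Watson OpenScale Python client (ibm-watson-openscale). The following is a minimal sketch for the as-a-service deployment; the API key is a placeholder, and on Cloud Pak for Data software you authenticate against your cluster URL instead.

```python
# Minimal sketch: connect to Watson OpenScale and list the deployments that are
# being evaluated (subscriptions) with the ibm-watson-openscale Python client.
# The API key is a placeholder.
from ibm_cloud_sdk_core.authenticators import IAMAuthenticator
from ibm_watson_openscale import APIClient

client = APIClient(authenticator=IAMAuthenticator(apikey="<ibm_cloud_api_key>"))

# Each subscription represents a deployment that is being evaluated.
subscriptions = client.subscriptions.list().result.subscriptions
for subscription in subscriptions:
    print(subscription.metadata.id, subscription.entity.asset.name)
```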
This table describes the differences in features between the Watson OpenScale service on the as-a-service and software deployment environments, differences between offering plans, and whether additional services are required.
Feature | As a service | Software |
---|---|---|
Upload pre-scored test data | Not available | Available |
IBM SPSS Collaboration and Deployment Services | Not available | Available |
Batch processing | Not available | Available |
Support access control by user groups | Not available | Available |
Free database and Postgres plans | Available | Postgres available starting in 4.8 |
Set up multiple instances | Not available | Available |
Integration with OpenPages | Available with manual integration | Available |
Evaluation of foundation model assets | Not available | Available |
Watson Query
On Cloud Pak for Data as a Service, data virtualization functionality is provided by the Watson Query service. The following data virtualization functionality is effectively the same on Cloud Pak for Data as a Service and Cloud Pak for Data 5.0, 4.8, and 4.7.
- Connecting to supported data sources
- Virtualizing data
- Governing virtual data using policies and data protection rules
- Monitoring and exploring the service
- Using the SQL interface
- Caching
- Column masking
- Explore view and reloading of tables
- Data sampling in statistics collection
- Metadata enrichment
The following data virtualization functionality appears different in the user interface but provides the same basic functionality:
- Publishing virtual data to catalogs
- Managing access to virtual objects
- Administering users and roles
- Scaling the service
- Collecting statistics in the web client in Watson Query
This table describes the differences in features between Watson Query on Cloud Pak for Data as a Service and Data Virtualization (formerly Watson Query) on Cloud Pak for Data software.
Feature | As a service | Software |
---|---|---|
Service name | Watson Query | In Cloud Pak for Data 5.0, the service is called Data Virtualization.
Use the Cloud Pak for Data Data Source Definitions (DSD) to enforce IBM Knowledge Catalog data protection rules | Not applicable for SaaS | Available starting in 5.0 |
Query data in REST API data sources | Not applicable for SaaS | Available starting in 5.0 |
Query tables from previous Presto and Databricks catalogs with multiple catalog support | Not applicable for SaaS | Available starting in 5.0 |
Automatically scale service instances | Not applicable for SaaS | Available starting in 5.0 |
Mask multibyte characters for enhanced privacy of sensitive data | Not applicable for SaaS | Available starting in 5.0 |
View the data protection rules that are applied to a user | Not applicable for SaaS | Available starting in 5.0 |
Enhanced security for profiling results in Data Virtualization views | Not applicable for SaaS | Available starting in 5.0 |
Data Virtualization connections in catalogs now reference the platform connection | Not applicable for SaaS | Available starting in 5.0 |
Enhanced security for the Admin role: The Admin role does not have default access to all data. | Not applicable for SaaS | Available starting in 4.8 |
IBM Knowledge Catalog data protection rules are always enabled for Watson Query data | Not applicable for SaaS | Available starting in 4.8 |
Secure your ungoverned objects: With IBM Knowledge Catalog data protection rules in Watson Query, virtualized objects that are not published in a governed catalog follow the Default data access convention setting from your rule settings. | Not applicable for SaaS | Available starting in 4.8 |
Query Presto data: You can create a connection to Presto to access and query data in Presto. | Not applicable for SaaS | Available starting in 4.8 |
Audit logging to monitor user activity and data access | Available | Available starting in 4.7 |
Integration with IBM Knowledge Catalog | Required | Optional |
Group-based authorization and object-level access for groups | Not available | Available |
Support for remote connectors | Not applicable for SaaS | Available |
Support for file system based data sources, except in Cloud Object Storage | Not applicable for SaaS | Available |
Connecting to data sources that require an uploaded JDBC driver, for example, SAP HANA, Generic JDBC | Not applicable for SaaS | Available |
Collecting statistics in the user interface | Not available | Available |
Automatic statistics collection during object virtualization | Not available | Available |
Access management for multiple groups | Not available | Available |
Support for CSV or TSV files in Cloud Object Storage | Not applicable for SaaS | Available |
Credentials in vaults for connections in Cloud Object Storage | Not applicable for SaaS | Available |
Learn more
- Services for Cloud Pak for Data as a Service
- Services for Cloud Pak for Data 5.0
- Cloud deployment environment options for Cloud Pak for Data 5.0
Parent topic: Cloud Pak for Data as a Service