Feature differences between Cloud Pak for Data deployments

Cloud Pak for Data as a Service and Cloud Pak for Data 4.7, 4.6, and 4.5 software have some differences in features and implementation. Cloud Pak for Data as a Service is a set of IBM Cloud services. Cloud Pak for Data 4.7 is offered as software that you must install and maintain. Services that are available in both deployments also differ in features between Cloud Pak for Data as a Service and Cloud Pak for Data 4.7, 4.6, and 4.5.

Platform differences

Cloud Pak for Data as a Service and Cloud Pak for Data 4.7 share a common code base. However, they differ in the following key ways:

Platform differences
Features | Cloud Pak for Data as a Service | Cloud Pak for Data
Software, hardware, and installation | Cloud Pak for Data as a Service is fully managed by IBM on IBM Cloud. Software updates are automatic. Scaling of compute resources and storage is automatic. You sign up at https://dataplatform.cloud.ibm.com. | You provide and maintain hardware. You install, maintain, and upgrade Cloud Pak for Data software. See Software requirements.
Storage | You provision an IBM Cloud Object Storage service instance to provide storage. See IBM Cloud Object Storage. | You provide persistent storage on a Red Hat OpenShift cluster. See Storage requirements.
Compute resources for running workloads | Users choose the appropriate runtime for their jobs. Compute usage is billed based on the rate for the runtime environment and the duration of the job. See Monitor account resource usage. | You set up the number of Red Hat OpenShift nodes with the appropriate number of vCPUs. See Hardware requirements and Monitoring the platform.
Cost | You buy each service that you need at the appropriate plan level. Many services bill for compute resource consumption. See each service page in the IBM Cloud catalog or in the services catalog on Cloud Pak for Data as a Service, by selecting Services > Services catalog from the navigation menu. | You buy a Cloud Pak for Data license based on the services that you need. For example, the Enterprise Edition license includes entitlement to services such as Watson Studio or Watson Knowledge Catalog. See Cloud Pak for Data.
Security, compliance, and isolation | The data security, network security, security standards compliance, and isolation of Cloud Pak for Data as a Service are managed by IBM Cloud. You can set up extra security and encryption options. See Security of Cloud Pak for Data as a Service. | Red Hat OpenShift Container Platform provides basic security features. Cloud Pak for Data is assessed for various privacy and compliance regulations and provides features that you can use in preparation for various privacy and compliance assessments. You are responsible for additional security features, encryption, and network isolation. See Security considerations.
Available services | Most data fabric services are available in both deployment environments. See Services for Cloud Pak for Data as a Service. | Includes many other services. See Services for Cloud Pak for Data 4.7.
User management | You add users and user groups and manage their account roles and permissions with IBM Cloud Identity and Access Management. See Add users to the account. You can also set up SAML federation on IBM Cloud. See IBM Cloud docs: What is IBM Cloud Identity and Access Management? | You can add users and create user groups from the Administration menu. You can use the Identity and Access Management Service or use your existing SAML SSO or LDAP provider for identity and password management. You can create dynamic, attribute-based user groups. See User management.
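
The compute billing rule for Cloud Pak for Data as a Service in the table above (the rate for the runtime environment multiplied by the duration of the job) can be sketched in a few lines of Python. This is only an illustration of the arithmetic: the runtime names, the hourly rates, and the `JobRun` type are hypothetical placeholders, not IBM pricing or an IBM API. For real rates, see the service pages in the IBM Cloud catalog.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class JobRun:
    """One job run: which runtime it used and for how long (hypothetical type)."""
    runtime: str
    hours: float


# Hypothetical rates in billing units per hour; not actual IBM pricing.
RATES = {"small": 0.5, "large": 2.0}


def usage(run: JobRun, rates: dict = RATES) -> float:
    """Billed usage for one run: runtime rate multiplied by job duration."""
    return rates[run.runtime] * run.hours


def total_usage(runs: list, rates: dict = RATES) -> float:
    """Sum of billed usage across several job runs."""
    return sum(usage(run, rates) for run in runs)


if __name__ == "__main__":
    runs = [JobRun("small", 2.0), JobRun("large", 1.5)]
    print(total_usage(runs))
```

On Cloud Pak for Data software there is no equivalent per-job charge in this table; capacity is set by the number of Red Hat OpenShift nodes and vCPUs that you provide.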

Common features across services

The following features that are provided with the platform are effectively the same for services on Cloud Pak for Data as a Service, Cloud Pak for Data 4.7, 4.6, and 4.5:

  • Global search for assets and artifacts across the platform
  • The Platform assets catalog for sharing connections across the platform
  • Role-based user management within collaborative workspaces across the platform
  • Common infrastructure for assets and workspaces
  • A services catalog for adding services
  • Compute usage monitoring from the Administration menu

The following table describes differences in features across services between Cloud Pak for Data as a Service and Cloud Pak for Data 4.7, 4.6, and 4.5.

Differences in common features across services
Feature Cloud Pak for Data as a Service Cloud Pak for Data
Manage all projects Users with the Manage projects permission from the IAM service access Manager role for the IBM Cloud Pak for Data service can join any project with the Admin role and then manage or delete the project. Users with the Manage projects permission can join any project with the Admin role and then manage or delete the project.
Connections to remote data sources Most supported data sources are common to both deployment environments.
See Supported connections.
See Supported data sources.
Connection credentials that are personal or shared Connections in projects and catalogs can require personal credentials or allow shared credentials. Shared credentials can be disabled at the account level. Platform connections can require personal credentials or allow shared credentials. Shared credentials can be disabled at the platform level.
Connection credentials from secrets in a vault Not available
Kerberos authentication Not available Available for some services and connections
Sample assets and projects from the Gallery Not available

Watson Studio

The following Watson Studio features are effectively the same on Cloud Pak for Data as a Service and Cloud Pak for Data 4.7, 4.6, and 4.5:

  • Collaboration in projects and deployment spaces
  • Project import and export by using a project ZIP file
  • Jupyter notebooks
  • Job scheduling
  • Data Refinery

This table describes the feature differences between the Watson Studio service on multiple deployment environments, differences between offering plans, and whether additional services are required. For more information about feature differences between offering plans on Cloud Pak for Data as a Service, see Watson Studio offering plans.

Differences in Watson Studio
Feature Cloud Pak for Data as a Service Cloud Pak for Data
Create project Create:
• An empty project
• A project from a sample in the Gallery
• A project from file
Create:
• An empty project
• A project from file
• A project with Git integration
Git integration • Publish notebooks on GitHub
• Publish notebooks as gist
• Integrate a project with Git
• Sync assets to a repository in one project and use those assets in another project
Project terminal for advanced Git operations Not available Available in projects with default Git integration
JupyterLab Not available Available in projects with Git integration
Visual Studio Code integration Not available Available in projects with Git integration starting in 4.6.
RStudio Cannot integrate with Git Can integrate with Git. Requires an RStudio Server Runtimes service.
Python scripts Not available Work with Python scripts in JupyterLab. Requires a Watson Studio Runtimes service.
Access project assets programmatically Use project-lib for Python and R Use ibm-watson-studio-lib for Python and R (successor of project-lib)
Generate code to load data to a notebook by using the Flight service Not available
Watson Natural Language Processing Available for Python Available for Python
Manage notebook lifecycle Not available Use CPDCTL for notebook lifecycle management
Code package assets (set of dependent files in a folder structure) Not available Use CPDCTL to create code package assets in a deployment space
Promote notebooks to spaces Not available Available manually from the project's Assets page or programmatically by using CPDCTL
Python with GPU Support available for a single GPU type only (Nvidia K80) Support available for multiple Nvidia GPU types. Requires a Watson Studio Runtimes service.
Create and use custom images Not available Create custom images for Python (with and without GPU), R, JupyterLab (with and without GPU), RStudio, and SPSS environments. Requires the Watson Studio Runtimes service and other applicable services.
Anaconda Repository Not available Use to create custom environments and custom images
Hadoop integration Not available Build and train models, and run Data Refinery flows on a Hadoop cluster. Requires the Execution Engine for Apache Hadoop service.
Decision Optimization Requires the Decision Optimization service.
SPSS Modeler Requires the SPSS Modeler service.
Watson Pipelines Beta release Available starting in 4.6. Requires the Watson Pipelines service.
Dashboards Requires the Cognos Dashboard Embedded service. Requires the Cognos Dashboards service.

Watson Machine Learning

The following Watson Machine Learning features are effectively the same on Cloud Pak for Data as a Service, Cloud Pak for Data 4.5, 4.6, and 4.7:

  • Collaboration in projects and deployment spaces
  • Deploy models
  • Deploy functions
  • Watson Machine Learning REST APIs
  • Watson Machine Learning Python client
  • Create online deployments
  • Scale and update deployments
  • Define and use custom components
  • Use Federated Learning to train a common model with separate and secure data sources
  • Monitor deployments across spaces

This table describes the differences in features between the Watson Machine Learning service on multiple deployment environments, differences between offering plans, and whether additional services are required. For details about functionality differences between offering plans on Cloud Pak for Data as a Service, see Watson Machine Learning offering plans.

Feature differences between Watson Machine Learning deployments
Feature Cloud Pak for Data as a Service Cloud Pak for Data
AutoAI training input Current supported data sources Supported data sources change by release
AutoAI experiment compute configuration 8 CPU and 32 GB Different sizes available
AutoAI limits on data size
and number of prediction targets
Set limits Limits differ by compute configuration
AutoAI data imputation Available starting in 4.5.2
AutoAI fairness evaluation Available starting in 4.5.2
AutoAI time series supporting features Available starting in 4.5.3
AutoAI incremental learning Not available Available starting in 4.6.0
Deploy using popular frameworks
and software specifications
Check for latest supported versions Supported versions differ by release
Connect to databases for batch deployments Check for support by deployment type Check for support by deployment type
and by version
Deploy and score Python scripts Available via Python client Create scripts in JupyterLab or Python client, then deploy
Deploy and batch score R Scripts Not available Available
Deploy Shiny apps Not available Create and deploy Shiny apps
Deploy from code package starting in 4.5
Evaluate jobs for fairness or drift Requires Watson OpenScale Requires Watson OpenScale
Evaluate online deployments in a space
for fairness, drift or explainability
Not available Available starting in 4.7
Requires a Watson OpenScale instance
Control space creation No restrictions by role Use permissions to control who can view and create spaces
Updated forms for testing online deployment Available starting in 4.5.3
Import from Git project to space Not available Available starting in 4.5
Code package automatically created when importing
from Git project to space
Not available Available starting in 4.5
Update RShiny app from code package Not available Available starting in 4.6
Track model details in a model inventory Register models to view factsheets with lifecycle details. Requires the Watson Knowledge Catalog service. Available starting in 4.5. Requires the AI Factsheets service.
Create and use custom images Not available Create custom images for Python or SPSS
Notify collaborators about Pipeline events Not available Use Send Mail to notify collaborators, starting in 4.5
Use nested pipelines Available starting in 4.5.2
Import project or space file into a nonempty space Not available Available starting in 4.0.6
Deep Learning Experiments Not available Requires Watson Machine Learning Accelerator service
Provision and manage IBM Cloud service instances Add instances for Watson Machine Learning
or Watson OpenScale
Services are provisioned on the cluster
by the administrator
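
Many rows in the table above are gated on a release ("Available starting in 4.5.2", "Available starting in 4.6.0", and so on). As a minimal sketch, the following Python compares an installed Cloud Pak for Data version against such a threshold. The function names and the `STARTING_IN` dictionary are illustrative assumptions, not part of any IBM tool. The three entries are copied from the table, and a short version such as 4.7 is assumed to mean 4.7.0.

```python
def parse_version(text: str) -> tuple:
    """Turn a dotted version such as '4.5.2' into a tuple of integers."""
    return tuple(int(part) for part in text.split("."))


def is_available(installed: str, starting_in: str) -> bool:
    """True if the installed version is at or past the 'starting in' version."""
    a, b = parse_version(installed), parse_version(starting_in)
    width = max(len(a), len(b))
    # Pad with zeros so that '4.6' and '4.6.0' compare as equal.
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


# A few 'Available starting in' values copied from the table above.
STARTING_IN = {
    "AutoAI data imputation": "4.5.2",
    "AutoAI incremental learning": "4.6.0",
    "Evaluate online deployments in a space": "4.7",
}


def available_features(installed: str) -> list:
    """Names of the listed features that the installed version includes."""
    return sorted(
        name for name, version in STARTING_IN.items()
        if is_available(installed, version)
    )


if __name__ == "__main__":
    print(available_features("4.6.3"))
```

Comparing integer tuples rather than raw strings avoids ordering mistakes with multi-digit components, for example treating 4.10 as older than 4.9.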

Watson Knowledge Catalog

The following Watson Knowledge Catalog features are effectively the same on Cloud Pak for Data as a Service, Cloud Pak for Data 4.7, 4.6, and 4.5:

  • Collaboration in projects and catalogs
  • AI powered search and recommendations in catalogs
  • Rating and reviewing assets in catalogs
  • Data Refinery tool in projects
  • Categories with collaborator roles
  • Predefined and custom classifications
  • Predefined and custom data classes
  • Governance rules
  • Policies
  • Data protection rules
  • Manual profiling of individual relational data assets in a project or a catalog
  • Automatic profiling of relational data assets added to a governed catalog
  • Custom asset types, custom properties for assets, and custom relationships between assets in catalogs

This table describes the differences in features between the Watson Knowledge Catalog service on multiple deployment environments, differences between offering plans, and whether additional services are required. For more information about feature differences between offering plans on Cloud Pak for Data as a Service, see Watson Knowledge Catalog offering plans.

Differences in Watson Knowledge Catalog
Feature Cloud Pak for Data as a Service Cloud Pak for Data
Profiling of unstructured data Automatic profiling of individual assets that are added to a project or a catalog. Not available.
Metadata import tool in projects - discovery Import data assets into projects or catalogs. Support for a subset of project and catalog connections. See Supported data sources for metadata import and metadata enrichment. Import different types of assets:
• Import data assets into projects or catalogs. Most supported connections are the same in both deployment environments.
• Available starting in 4.5: Import business intelligence reports, assets with their associated transformation scripts, or data models (starting in 4.5.2) into catalogs. Requires installation of MANTA Automated Data Lineage without a license key. Support for a subset of catalog connections.

See Supported data sources for metadata import and metadata enrichment.
Metadata import tool in projects - lineage Not available. • Import lineage of data assets into catalogs.
• Capture and access lineage of ETL jobs in MANTA Automated Data Lineage (starting in 4.7)
Requires installation of MANTA Automated Data Lineage with a license key. Support for a subset of catalog connections. See Supported data sources for metadata import and metadata enrichment.
Legacy UI tools Not available. Use tools in projects instead. Not available starting in 4.7. Use tools in projects instead.
Available in 4.6, 4.5, 4.0:
• Metadata import
• Automated discovery
• Data quality analysis
• Information assets view
Available in 4.0:
• QuickScan
Metadata enrichment tool in projects Run profiling, term assignment, and quality analysis on large sets of data assets. Available starting in 4.5.
Data quality scores Data quality scores are shown in:
• Data quality information for assets in projects and catalogs
• Metadata enrichment results
Data quality scores are shown in:
• Data quality information for assets in projects and catalogs
• Metadata enrichment results
• Asset profiles in projects and catalogs. Not available in 4.7.
• Quick scan results with the legacy UI. Not available in 4.7.
• Data quality projects with the legacy UI. Not available in 4.7.
Detailed data quality information Data quality page in projects and catalogs, and as part of metadata enrichment results Available starting in 4.7.
Data quality rules in projects
Requires the DataStage service.
Available starting in 4.5.
Requires the DataStage service.
Asset activities Requires a paid plan.
Available in projects and catalogs.
Available in projects and catalogs.
Data lineage Not available Available starting in 4.5.
Technical data lineage Not available Available starting in 4.5. Requires that a licensed version of MANTA Automated Data Lineage for IBM Cloud Pak for Data is installed. Generated by running the metadata import tool. Can be accessed from catalogs.
Business terms Limits for some plans.
Predefined business terms Predefined business terms and the Knowledge Accelerator Sample Personal Data category that includes them are available only if you create a Watson Knowledge Catalog service instance with a Lite or Standard plan after 7 October 2022. Not available.
Deliver masked data sets in projects with Data Privacy
Reference data sets Limits per plan.
Custom properties for artifacts, categories Create from the Administration menu starting in 4.6.
Custom relationships for artifacts Requires a paid plan. Create from the Administration menu starting in 4.6.
Knowledge Accelerators Requires an Enterprise plan.
Download from Gallery.
Provided with the platform starting in 4.5.
Custom workflow configurations for governance artifacts and requests Available for governance artifacts.
Monitor workflow tasks Available starting in 4.5
Custom category roles Limits per plan.
Administrative reports Requires a paid plan. Available starting in 4.5.
Migrate assets from InfoSphere Information Server Not available.
Not available starting in 4.7.

DataStage

The following table describes differences in features between DataStage on Cloud Pak for Data as a Service and DataStage on Cloud Pak for Data 4.0.2 and later.

Differences in DataStage
Feature Cloud Pak for Data as a Service Cloud Pak for Data 4.0.2 and later
PX instance management You can provision instances from a set of pre-defined sizes. You can provision instances more flexibly by using Cloud Pak for Data Instance administration.
Job compilation
  • OSH is generated during compilation.
  • Transformer is compiled at runtime.
  • OSH is generated during compilation.
  • Transformer is compiled during compilation time and is made available to the /ds-storage mount.
  • Compilation is done synchronously.
Job runtime Each instance can run only one job at a time to ensure proper isolation.
  • Concurrent job runs are supported.
  • Concurrency is determined by instance capacity and the settings in the /px-storage/config/wlm.config.properties file.
Asset management For files of type .xls, .xlsx, .xml, and .json, only simple structures are supported. Full support of files of type .csv, .txt, .xls, .xlsx, .xml, and .json is available.
Storage
  • POSIX-type file-based real storage is not available.
  • Storage is emulated by the use of a Cloud Object Storage project bucket.
Java Integration stage Not available
Java library component Not available Available starting in 4.6
Generic JDBC connection Not available
Excel Not available
AVI Not available
External Source stage Not available
External Target stage Not available
Hierarchical stage
  • Single file or File set option for XML Parser and JSON Parser is not available.
  • Single file, File set, and Large Object option for XML Composer and JSON Composer are not available.
MPP and SMP The S, M, and L sizes are single-node SMP configurations. Parallel workloads are managed through logical partitions, which are configured with the APT_CONFIG_FILE option.
SAP Bulk Extract connection Not available Available starting in 4.5
SAP Delta Extract connection Not available Available starting in 4.5
Wrapped stage Not available Available starting in 4.5
SAP HANA connection Not available
Text data source in ODBC connection Not available
Build stage Not available Available starting in 4.0.9
Send reports by using before/after-job subroutines Not available Available starting in 4.5.2
Nested sequence job migration to IBM Watson Pipelines Not available Available starting in 4.5.2
Custom stage Not available Available starting in 4.5.2
Apache HBase connection Not available Available starting in 4.5.2
Kerberos authentication for Apache Hive connections Not available Available starting in 4.5.2
User-defined functions Not available
Before/after-job properties Not available
Data service connector Not available
Complex flat file connector Not available
Match designer Not available
Survive stage Not available Available starting in 4.6.3
Db2 database sequence in Slowly Changing Dimension stage, Surrogate Key Generator stage, and Transformer stage Not available Available starting in 4.6
Use the Apache Hive connection as a target. (Available when Use DataStage properties is selected in the connector.) Not available Available starting in 4.6.1
Parameterize properties with local connections Not available Available starting in 4.6.1
Operational Decision Manager stage Not available Available starting in 4.6.1
Deployment spaces Not available Available starting in 4.7.0

Watson OpenScale

The following Watson OpenScale functionality is effectively the same on Cloud Pak for Data as a Service, Cloud Pak for Data 4.5, 4.6, and 4.7:

  • Evaluate deployments for fairness
  • Evaluate the quality of deployments
  • Monitor deployments for drift
  • View and compare model results in an Insights dashboard
  • Add deployments from the machine learning provider of your choice
  • Set alerts to trigger when evaluations fall below a specified threshold
  • Evaluate deployments in a user interface or notebook

This table describes the differences in features between the Watson OpenScale service on multiple deployment environments, differences between offering plans, and whether additional services are required.

Differences in Watson OpenScale
Feature Cloud Pak for Data as a Service Cloud Pak for Data
Upload pre-scored test data Not available Available starting in 4.0.7
View details about evaluations in model factsheets Available starting in 4.5
IBM SPSS Collaboration and Deployment Services Not available
Batch processing Not available
Support access control by user groups Not available
Free database and Postgres plans Not available
Set up multiple instances Not available
Custom evaluations and metrics Available starting in 4.5
Integration with OpenPages Not available

Watson Query

On Cloud Pak for Data as a Service, Data virtualization functionality is provided by the Watson Query service. On Cloud Pak for Data, the Data Virtualization service was renamed to Watson Query in version 4.6. The following Data virtualization functionality is effectively the same on Cloud Pak for Data as a Service and Cloud Pak for Data 4.7.

  • Connecting to supported data sources
  • Virtualizing data
  • Governing virtual data using policies and data protection rules
  • Monitoring and exploring the service
  • Using the SQL interface
  • Caching

The following Data virtualization functionality appears to be different in the user interface but provides the same basic functionality:

This table describes the differences in features between Watson Query on Cloud Pak for Data as a Service and Watson Query (formerly Data Virtualization) on Cloud Pak for Data.

Differences in Watson Query
Feature Cloud Pak for Data as a Service Cloud Pak for Data
Query mode (Max Pushdown and Max Consistency) Not applicable for SaaS Available starting in 4.7
Advanced data masking on virtualized data Not applicable for SaaS Available starting in 4.7
Data source connection access restrictions Not applicable for SaaS Available starting in 4.7
Format and save formatted access plans for performance tuning Not applicable for SaaS Available starting in 4.7
Audit logging to monitor user activity and data access Available starting in 4.7
Integration with Watson Knowledge Catalog Required Optional
Group-based authorization and object-level access for groups Not available
Support for remote connectors Not applicable for SaaS
Support for file system based data sources, except in Cloud Object Storage Not applicable for SaaS
Connecting to data sources that require an uploaded JDBC driver, for example, SAP HANA, Generic JDBC Not applicable for SaaS
Collecting statistics in the user interface Not available
Automatic statistics collection during object virtualization Not available
Column masking
Explore view and reloading of tables Available starting in 4.5
Data sampling in statistics collection Available starting in 4.5
Support for governing data using row level policies Not available Available starting in 4.5
Metadata enrichment Available starting in 4.5
Caching of data protection rules Not available Available starting in 4.5
Access management for multiple groups Not available Available starting in 4.5
Support for CSV or TSV files in Cloud Object Storage Not applicable for SaaS Available starting in 4.6
Credentials in vaults for connections in Cloud Object Storage Not applicable for SaaS Available starting in 4.6
