Frequently asked questions

Find answers to frequently asked questions about Cloud Pak for Data as a Service.

Accounts and setup questions

Cloud Pak for Data as a Service questions


See Service plan changes and deprecations.

Project questions

IBM Cloud Object Storage questions

IBM Knowledge Catalog questions

Notebook questions

Security and reliability questions

Sharing and collaboration questions

Machine learning questions

Watson OpenScale questions

Accounts and setup

How do I sign up for Cloud Pak for Data as a Service?

Go to Cloud Pak for Data as a Service.

For the URLs in other regions, see Regional availability.

Can I try Cloud Pak for Data as a Service for free?

Yes, when you sign up for Cloud Pak for Data as a Service, you automatically provision Lite versions of some of the services, which are free. Many services have free Lite plans. Go to Cloud Pak for Data as a Service.

How do I get a free version of Watson Studio?

If you are signed up for Cloud Pak for Data as a Service, on the home page, find the Quick start section and click Build and manage ML models. Then, click Provision Watson Studio. If you don't see the option to provision Watson Studio, you already provisioned it.

You can also go to the services catalog to provision a Lite plan. Open the navigation menu and select Services > Services catalog, and then select Watson Studio. If you don't see the Create button to create a Lite plan, you already have a Lite plan.

You can provision only one Lite plan of Watson Studio. See Watson Studio plans.

You also need appropriate access rights to the resources for the account, as described in IBM Cloud docs: Managing access to resources.

Can I provide feedback?

Yes, we encourage feedback as we continue to develop this platform. From the navigation menu, select Support > Share an idea.

Why is the Create button disabled when I try to provision Watson Studio?

The Create button is unavailable if you have an existing Watson Studio Lite instance in your account or you haven't selected the license agreements checkbox.

If you have a Lite plan of Watson Studio, you can create only one instance of the service. You can see your existing services in the IBM Cloud console on the Resource list page. Alternatively, from Cloud Pak for Data as a Service, open the navigation menu and choose Services > Service instances.

See Watson Studio plans.

Why am I unable to access Watson Studio?

If you are unable to access Watson Studio, check that you meet the following conditions:

  1. You are logged in to an IBM Cloud account.

  2. For that account, the Watson Studio service is provisioned through the IBM Cloud catalog or the Cloud Pak for Data as a Service catalog. The Watson Studio service instance is listed under IBM Cloud resources or under the Service instances in Cloud Pak for Data as a Service. If Watson Studio is not listed, then provision a new instance.

  3. The Watson Studio service instance might be listed under another IBM Cloud account. If you are a member of multiple IBM Cloud accounts, switch accounts to check for the service instance under a different account.

  4. You have the correct permissions to access Watson Studio. Your account administrator grants the required permissions. For a description of the roles and permissions, see Roles in Cloud Pak for Data as a Service.

  5. You access Watson Studio by using the link for the region where your service was provisioned. Check Regional limitations for information about features that are not currently available in your region. For the link to Cloud Pak for Data as a Service in each region, see Regional availability.

Why can't I see all my projects and catalogs across regions?

For some offering plans, you can provision Watson Studio and IBM Knowledge Catalog services in multiple IBM Cloud service regions. However, your projects, catalogs, and data are specific to the region in which they were saved and can be accessed only from your services in that region. You must switch your region to see the projects, catalogs, and data from that region.

How do I upgrade?

When you're ready to upgrade any of the services that you created in Cloud Pak for Data as a Service, you can upgrade in place without losing any of your work or data.

You must be the owner or administrator of the IBM Cloud account for a service to upgrade it. See Upgrade Cloud Pak for Data as a Service and services.

How can I get the IAM Editor role so I can provision service instances?

If you try to provision an instance of a service, such as the Watson OpenScale service, you might get this error message:

You do not have the required permission to create an instance. You must be assigned the IAM Editor role or Operator role or higher. Contact the account owner to update your access.

To get the IAM Editor role:

  1. Find your IBM Cloud account owner or administrator.
  2. Ask to be assigned the IAM Editor role for the resource group.

How can I get the most runtime from my Watson Studio Lite plan?

The Watson Studio Lite plan allows for 10 capacity unit-hours (CUH) per month. You can make your available CUH last longer by setting your assets to use environments with lower CUH rates. For example, you can change your notebook environment. To see the available environments and the required CUH, go to the Services catalog page for Watson Studio.
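
As a rough illustration of that trade-off, the sketch below divides the monthly CUH allowance by an environment's hourly CUH rate. The environment names and rates are hypothetical placeholders, not actual Watson Studio rates; check the Services catalog page for the real values.

```python
def runtime_hours(cuh_allowance: float, cuh_per_hour: float) -> float:
    """Hours of runtime that a CUH allowance covers at a given hourly CUH rate."""
    if cuh_per_hour <= 0:
        raise ValueError("cuh_per_hour must be positive")
    return cuh_allowance / cuh_per_hour


# Hypothetical environment rates (CUH consumed per hour of runtime).
example_rates = {"small environment": 0.5, "larger environment": 2.0}

for name, rate in example_rates.items():
    print(f"{name}: {runtime_hours(10, rate):.1f} hours per month")
```

With these assumed rates, the same 10 CUH last four times longer in the cheaper environment.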

How do I find my IBM Cloud account owner?

If you have an enterprise account or work in an IBM Cloud account that you don't own, you might need to ask an account owner to give you access to a workspace or another role.

To find your IBM Cloud account owner:

  1. From the navigation menu, choose Administration > Access (IAM).
  2. From the avatar menu, make sure you're in the right account, or switch accounts, if necessary.
  3. Click Users, and find the username with the word owner next to it.

To understand roles, see Roles for Cloud Pak for Data as a Service. To determine your roles, see Determine your roles.

Cloud Pak for Data as a Service

What is Cloud Pak for Data as a Service?

Cloud Pak for Data as a Service provides a single, unified interface for a set of core IBM Cloud services and their related services. The core services are Watson Studio, Watson Machine Learning, Watson OpenScale, IBM Knowledge Catalog, Watson Query, DataStage, Match 360, and Cognos Dashboard Embedded. You can add other services to store your data or develop Watson applications.

See Overview of Cloud Pak for Data as a Service.

Why did my product name change to Cloud Pak for Data as a Service?

Your product name changed to Cloud Pak for Data as a Service because you have Watson Studio, Watson Machine Learning, or IBM Knowledge Catalog, plus another service in the Cloud Pak for Data as a Service services catalog, such as DataStage. The features, plans, and costs of your services did not change.

See Relationships between the Watson Studio and IBM Knowledge Catalog services and Cloud Pak for Data as a Service.

What's the difference between Watson Studio and Cloud Pak for Data as a Service?

Watson Studio is a single service, while Cloud Pak for Data as a Service is a platform for a set of services, which includes Watson Studio as one of its core services. The features of Watson Studio are the same in both cases.

See Overview of Cloud Pak for Data as a Service.

What's the difference between Cloud Pak for Data 4.x and Cloud Pak for Data as a Service?

Cloud Pak for Data 4.x is software that you must install and maintain, while Cloud Pak for Data as a Service is a set of IBM Cloud services that are fully managed by IBM. Cloud Pak for Data 4.x has scheduled releases and distinct versions. Cloud Pak for Data as a Service is automatically updated each week and does not have a version number.

See Feature differences between Cloud Pak for Data deployments.

Does Cloud Pak for Data as a Service have a subscription plan?

Yes, Cloud Pak for Data as a Service has a subscription plan. See Upgrading to a Cloud Pak for Data as a Service subscription account.

What connections does Cloud Pak for Data as a Service support?

Cloud Pak for Data as a Service supports many data sources. See Connectors.

Projects

Where do I start a new project for Watson Studio?

Log in at Cloud Pak for Data as a Service to go to your home page. Click the Create a project link.

You can see all your projects by opening the navigation menu and selecting View all projects in the Projects section.

Watch the video about creating a project to see how to create both a blank project and a project from a file.

Why can't I create a project from an exported .zip project file?

If you are seeing an error that says that the .zip file doesn't contain a project, you might be trying to import a .zip file from a different platform.

You can import a project from a file on your local system only if the .zip file that you select was exported from a Cloud Pak for Data as a Service project as a compressed file. You cannot import a compressed file that was exported from a project in IBM Cloud Pak for Data.

See Importing a project.

How do I load very large files to my project?

You can't load data files larger than 5 GB to your project. If your files are larger, you must use the Cloud Object Storage API and load the data in multiple parts. See the curl commands for working with Cloud Object Storage directly on IBM Cloud.

See Adding very large objects to a project's Cloud Object Storage.

How do I choose which tool to use?

The tool that you need depends on your type of data, what you want to do with your data, and how much automation you want. To find the right tool, see Choosing a tool.

IBM Cloud Object Storage

What is saved in IBM Cloud Object Storage for workspaces?

When you create a project, deployment space, or catalog, you specify an IBM Cloud Object Storage instance and create a bucket that is dedicated to that workspace. These types of objects are stored in the IBM Cloud Object Storage bucket for the workspace:

  • Files for data assets that you uploaded into the workspace.
  • Files associated with assets that run in tools, such as notebooks, dashboards, and models.
  • Metadata about assets, such as the asset description, tags, and comments or reviews.

Do I need to upgrade IBM Cloud Object Storage when I upgrade other services?

You must upgrade your IBM Cloud Object Storage instance only when you run out of storage space. Other services can use any IBM Cloud Object Storage plan and you can upgrade any service or your IBM Cloud Object Storage service independently.

Why am I unable to add storage to an existing project or to see the IBM Cloud Object Storage selection in the New Project dialog?

IBM Cloud Object Storage requires an extra step for users who do not have administrative privileges for it. The account administrator must enable nonadministrative users to create projects.

If you have administrator privileges and do not see the latest IBM Cloud Object Storage, try again later because server-side caching might cause a delay in rendering the latest values.

IBM Knowledge Catalog

What is IBM Knowledge Catalog?

IBM Knowledge Catalog is a cloud-based enterprise metadata repository that lets you catalog your knowledge and analytics assets, including structured and unstructured data wherever they reside, so that they can be easily accessed and used to fuel data science and AI. For selected source types, IBM Knowledge Catalog can automatically discover and register data assets at the provided connection. As assets are added to the catalog, they are automatically indexed and classified, making it easy for users such as data engineers, data scientists, data stewards, and business analysts to find, understand, share, and use the assets. AI-powered search and recommendations guide users to the most relevant assets in the catalog based on understanding of relationships between assets, how those assets are used, and social connections between users.

IBM Knowledge Catalog also provides an intelligent and robust governance framework that lets you define and enforce data and access policies to ensure that the right data goes to the right people.

Through IBM Knowledge Catalog's business terms, users can create a common business vocabulary and associate its terms with your assets, policies, and rules, providing the bridge between the business domain and your technical assets.

What is the difference between a catalog and a project?

A catalog is where you share assets across the enterprise. A project is where you work with assets within smaller teams. An enterprise catalog can have thousands of assets that are shared with hundreds of users. Projects are designed for a team of collaborators to work with a few assets for a specific goal, such as developing an artificial intelligence model or preparing data, by using Watson Studio.

What data sources and asset types are supported?

IBM Knowledge Catalog supports over 50 connectors to cloud and on-premises data source types. See Connectors.

IBM Knowledge Catalog also supports other asset types, such as structured data, unstructured data, models, and notebooks.

How do I load very large files to my catalog?

You can't load data files larger than 5 GB to your catalog from IBM Knowledge Catalog. To add a file that is larger than 5 GB to a catalog, upload the file to IBM Cloud Object Storage and then add it as a connected data asset.

Do I need to move my data into IBM Knowledge Catalog?

No, you can keep all your data in its existing repositories, or you can upload local files to the IBM Cloud Object Storage instance that is associated with the catalog. The choice is yours.

IBM Knowledge Catalog stores and manages only the metadata of your assets.

What is the maximum number of assets that I can have in catalogs?

The number of assets you can have across all catalogs depends on your plan:

  • Lite plan: 50 assets other than connections and unlimited connection assets
  • Standard plan (starting 02 May 2022): unlimited assets
  • Enterprise Bundle plan (starting 02 May 2022): unlimited assets

If you provisioned a plan before 02 May 2022, you have these limits:

  • Legacy Standard plan: 500 assets
  • Legacy Enterprise plan: unlimited assets
  • Legacy Professional plan: unlimited assets

See IBM Knowledge Catalog offering plans.

Does IBM Knowledge Catalog provide policy services?

IBM Knowledge Catalog includes an automated policy enforcement engine that determines outcomes based on the policies and the action that is taken. With IBM Knowledge Catalog, you can set up your policies within the system and restrict access to data based on the defined policies.

Does IBM Knowledge Catalog provide classification services?

For governed catalogs that are created with data protection rule enforcement, IBM Knowledge Catalog automatically classifies the columns in your relational data assets when they are added to the catalog. Over 160 data classes for columns are provided, including names, emails, postal addresses, credit card numbers, driver's licenses, government identification numbers, date of birth, demographic information, DUNS number, and more. For ungoverned catalogs that do not enforce data protection rules, a user can choose to classify, or profile, a relational data asset, but assets are not automatically classified. Catalogs also profile unstructured data assets. See Profile assets.
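
To illustrate the general idea of value-based column classification, here is a toy sketch that assigns a column the data class whose pattern matches the largest share of its non-empty values, provided that share reaches a threshold. The two regular expressions, the class names, and the 80 percent threshold are hypothetical; this is not the IBM Knowledge Catalog classification algorithm or API.

```python
import re
from typing import Optional

# Hypothetical data classes for illustration, each defined by a regular expression.
DATA_CLASSES = {
    "Email address": re.compile(r"^[\w.+-]+@[\w-]+(\.[\w-]+)+$"),
    "US ZIP code": re.compile(r"^\d{5}(-\d{4})?$"),
}


def classify_column(values: list, threshold: float = 0.8) -> Optional[str]:
    """Return the data class that matches the largest share of non-empty values,
    or None if no class reaches the threshold."""
    cleaned = [v.strip() for v in values if v and v.strip()]
    if not cleaned:
        return None
    best_class, best_share = None, 0.0
    for name, pattern in DATA_CLASSES.items():
        share = sum(1 for v in cleaned if pattern.match(v)) / len(cleaned)
        if share > best_share:
            best_class, best_share = name, share
    return best_class if best_share >= threshold else None


print(classify_column(["ada@example.com", "alan@test.org", "", "grace@navy.mil"]))
```

The threshold is what lets a column with a few malformed values still be classified, while a mixed column stays unclassified.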

Does IBM Knowledge Catalog have data wrangling capabilities?

Yes, data preparation capabilities are available in Data Refinery, which is part of IBM Knowledge Catalog. Data Refinery provides built-in operations to discover, cleanse, and transform your data, as well as profiling and visualization tools, such as charts and graphs, to help you interact with and understand your data.

Data access and transform policies that are defined in IBM Knowledge Catalog are also enforced in Data Refinery to ensure that sensitive data that originated from governed catalogs remains protected.

Can I set up access groups for people in different lines of business and roles?

You can set up access groups through your IBM Cloud account in the Identity and Access Management (IAM) area.

After you set up the access groups, on the Access Control page of a catalog, you can add the access group so that all members of the access group can access the catalog with the same permissions. See Add access groups.

Does IBM Knowledge Catalog use Apache Atlas for its metadata repository?

IBM Knowledge Catalog uses its own local store for metadata.

IBM Knowledge Catalog runs on a cloud native persistence store that can meet the platform needs for performance, up-time, and scalability.

When you add assets from a catalog to a project, or publish assets from a project to a catalog, both the project and the catalog must satisfy these criteria:

  • You must be a member of the same Cloud Pak for Data as a Service account in IBM Cloud as the catalog owner, or, if your company set up SAML federation on IBM Cloud, you must be in the same company as the catalog owner.
  • If you want to add catalog assets to the project, you must choose to restrict who can be a collaborator in the project. If you want to publish assets to a catalog, you don't need to restrict the project.
  • You must choose IBM Cloud Object Storage when you create a project. You must be the owner of the IBM Cloud Object Storage instance, or the IBM Cloud Object Storage instance must be configured to allow project creation.

On the catalog page, when you add assets to a project, the target project list shows only the projects that satisfy all of these criteria.

When I create data protection rules, which catalogs will they apply to?

Data protection rules are scoped to the IBM Cloud account and will be enforced on assets in all governed catalogs that belong to the same IBM Cloud account as the data protection rules.

Do data protection rules affect data in external data sources?

No, IBM Knowledge Catalog is a data catalog for searching for data.

Data protection rules affect only how data appears within the catalog. Data protection rules do not affect users who access external data sources directly.

Why can't I add policies or other governance artifacts?

You must have special permissions to create governance artifacts, such as policies, business terms, data classes, rules, and reference data sets. You must also be a member of a category with a role that provides permission to create artifacts in that category. See Managing governance artifacts.

Notebooks

Can I install libraries or packages to use in my notebooks?

You can install Python libraries and R packages through a notebook, and those libraries and packages will be available to all your notebooks that use the same environment template. For instructions, see Import custom or third-party libraries. If you get an error about missing operating system dependencies when you install a library or package, notify IBM Support. To see the preinstalled libraries and packages and the libraries and packages that you installed, from within a notebook, run the appropriate command:

  • Python: !pip list
  • R: installed.packages()
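
From plain Python code, you can also list installed distributions without a shell command by using the standard library's `importlib.metadata` module (Python 3.8 or later). A minimal sketch; the output is similar to, but not formatted exactly like, `pip list`:

```python
from importlib.metadata import distributions


def installed_packages():
    """Map each installed distribution name to its version."""
    packages = {}
    for dist in distributions():
        name = dist.metadata.get("Name")
        if name:  # skip entries with missing metadata
            packages[name] = dist.version
    return packages


for name, version in sorted(installed_packages().items(), key=lambda kv: kv[0].lower()):
    print(f"{name}=={version}")
```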

Can I call functions that are defined in one notebook from another notebook?

There is no way to call one notebook from another notebook on the platform. However, you can put your common code into a library outside of the platform and then install it.

Can I add arbitrary notebook extensions?

No, you can't extend your notebook capabilities by adding arbitrary extensions as a customization because all notebook extensions must be preinstalled.

How do I access the data from a CSV file in a notebook?

After you load a CSV file into object storage, load the data by clicking the Code snippets icon in an opened notebook, clicking Read data, and selecting the CSV file from the project. Then, click in an empty code cell in your notebook and insert the generated code.
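
The generated code depends on the asset and the language, so prefer it in practice. The sketch below only approximates the idea for a CSV file in Cloud Object Storage, assuming a client created with `ibm_boto3` (from `ibm-cos-sdk`) and placeholder bucket and key names. The parsing helper uses the standard `csv` module, so it works on any binary file-like object.

```python
import csv
import io


def read_csv_rows(body, encoding="utf-8"):
    """Parse a binary, file-like CSV object (for example, a COS object body) into dicts."""
    text = body.read().decode(encoding)
    return list(csv.DictReader(io.StringIO(text, newline="")))


def read_csv_from_cos(cos_client, bucket, key):
    """Fetch an object with an ibm_boto3 S3 client and parse it as CSV."""
    # get_object returns a dict whose "Body" entry is a streaming, file-like object.
    body = cos_client.get_object(Bucket=bucket, Key=key)["Body"]
    return read_csv_rows(body)
```

For example, `read_csv_from_cos(cos, "my-project-bucket", "data.csv")` would return one dictionary per row; both names are placeholders. If pandas is available, `pandas.read_csv(io.BytesIO(body.read()))` gives you a DataFrame instead.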

How do I access the data from a compressed file in a notebook?

After you load the compressed file to object storage, get the file credentials by clicking the Code snippets icon in an opened notebook, clicking Read data, and selecting the compressed file from the project. Then, click in an empty code cell in your notebook and load the credentials to the cell. Alternatively, click to copy the credentials to the clipboard and paste them into your notebook.
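
For a .zip file, one approach is to read the object body into memory and open it with the standard `zipfile` module. A minimal sketch, assuming the archive fits in memory and that `body` is the file-like object returned by a Cloud Object Storage `get_object` call made with the credentials you inserted:

```python
import io
import zipfile


def read_zip_members(body):
    """Return the bytes of every file in a .zip archive, keyed by member name.

    `body` is any binary file-like object, such as the "Body" of a COS get_object call.
    The whole archive is read into memory.
    """
    with zipfile.ZipFile(io.BytesIO(body.read())) as archive:
        return {
            info.filename: archive.read(info)
            for info in archive.infolist()
            if not info.is_dir()
        }
```

For a single gzip-compressed file, `gzip.decompress(body.read())` from the standard `gzip` module plays the same role.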

Security and reliability

How secure is Cloud Pak for Data as a Service?

The Cloud Pak for Data as a Service platform is designed to be secure and resilient. For details, see Security of Cloud Pak for Data as a Service.

Are my data and notebooks protected from being shared outside of my collaborators?

The data that is loaded into your project and notebooks is secure. Only the collaborators in your project can access your data or notebooks. Each platform account acts as a separate tenant of the Spark and IBM Cloud Object Storage services, and tenants cannot access other tenants' data.

If you want to share your notebook with the public, first hide your data service credentials in your notebook. For Python and R notebooks, add the following syntax to each cell that contains credentials: # @hidden_cell

Be sure to save your notebook immediately after you enter the syntax to hide cells with sensitive data.

Only then should you share your work.
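
For example, a Python cell that holds credentials might look like this sketch, with `# @hidden_cell` as the first line of the cell. The dictionary keys and values are placeholders, not the exact names that the Code snippets panel generates.

```python
# @hidden_cell
# Placeholder values for illustration; replace them with your own credentials.
credentials = {
    "api_key": "<your-api-key>",
    "service_instance_id": "<your-service-instance-id>",
    "endpoint_url": "<your-cos-endpoint-url>",
}
```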

Do I need to back up my notebooks?

No. Your notebooks are stored in IBM Cloud Object Storage, which provides resiliency against outages.

Sharing and collaboration

What are the implications of sharing a notebook?

When you share a notebook, the permalink never changes. Any person with the link can view your notebook. You can stop sharing the notebook by clearing the checkbox to share it. Updates are not automatically shared. When you update your notebook, you can sync the shared notebook by reselecting the checkbox to share it.

How can I share my work outside of RStudio?

One way of sharing your work outside of RStudio is connecting it to a shared GitHub repository that you and your collaborators can work from. Read this blog post for more information.

However, the best way to share your work with the members of a project is to use notebooks that use the R kernel in the project.

RStudio is a great environment to work in for prototyping and working individually on R projects, but it is not yet integrated with projects.

How do I share my SPSS Modeler flow with another project?

By design, modeler flows can be used only in the project where the flow is created or imported. To use a modeler flow in a different project, download the flow from the current project (source project) to your local environment and then import the flow to the other project (target project).

IBM Watson Machine Learning

How do I run an AutoAI experiment?

See Creating an AutoAI experiment from sample data to watch a short video that shows how to create and run an AutoAI experiment, and then follow a tutorial to set up your own sample.

What is available for automated model building?

The AutoAI graphical tool automatically analyzes your data and generates candidate model pipelines that are customized for your predictive modeling problem. These model pipelines are created iteratively as AutoAI analyzes your data set and discovers data transformations, algorithms, and parameter settings that work best for your problem setting. Results are displayed on a leaderboard, showing the automatically generated model pipelines ranked according to your problem optimization objective. For details, see AutoAI overview.
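
The leaderboard idea can be illustrated with a toy: score every candidate pipeline against an objective and sort by that score. The sketch below is a plain exhaustive search over two hypothetical transforms and three decision thresholds on a tiny data set. AutoAI's actual search, algorithms, and metrics are far richer, and none of these names come from its API.

```python
from itertools import product

# Hypothetical candidate building blocks: feature transforms and decision thresholds.
TRANSFORMS = {
    "identity": lambda x: x,
    "square": lambda x: x * x,
}
THRESHOLDS = [0.25, 0.5, 0.75]


def accuracy(transform, threshold, xs, ys):
    """Share of records where the thresholded, transformed feature equals the label."""
    predictions = [1 if transform(x) >= threshold else 0 for x in xs]
    return sum(p == y for p, y in zip(predictions, ys)) / len(ys)


def leaderboard(xs, ys):
    """Score every (transform, threshold) candidate and rank by accuracy, best first."""
    rows = [
        (accuracy(fn, t, xs, ys), f"{name} + threshold {t}")
        for (name, fn), t in product(TRANSFORMS.items(), THRESHOLDS)
    ]
    return sorted(rows, key=lambda row: row[0], reverse=True)


for score, pipeline in leaderboard([0.1, 0.4, 0.6, 0.9], [0, 0, 1, 1]):
    print(f"{score:.2f}  {pipeline}")
```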

What frameworks and libraries are supported for my machine learning models?

You can use popular tools, libraries, and frameworks to train and deploy machine learning models by using IBM Watson Machine Learning. The supported frameworks topic lists supported versions and features, as well as deprecated versions scheduled to be discontinued.

What is an API Key?

API keys let you authenticate when you use the CLI or APIs, and they can be used across multiple services. API keys are confidential because they grant access. Treat all API keys as you would a password, because anyone with your API key can impersonate your service.
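
A common pattern is to exchange an API key for a short-lived IAM access token and to keep the key itself out of your code, for example in an environment variable. The sketch below builds that request with only the standard library, assuming the public IBM Cloud IAM token endpoint and an environment variable named `IBM_CLOUD_API_KEY`, which is a hypothetical name that you choose.

```python
import json
import os
import urllib.parse
import urllib.request

IAM_TOKEN_URL = "https://iam.cloud.ibm.com/identity/token"


def build_token_request(api_key: str) -> urllib.request.Request:
    """Build a POST request that exchanges an API key for an IAM access token."""
    data = urllib.parse.urlencode({
        "grant_type": "urn:ibm:params:oauth:grant-type:apikey",
        "apikey": api_key,
    }).encode("ascii")
    return urllib.request.Request(
        IAM_TOKEN_URL,
        data=data,
        headers={
            "Content-Type": "application/x-www-form-urlencoded",
            "Accept": "application/json",
        },
        method="POST",
    )


def get_access_token() -> str:
    # Read the key from the environment instead of hard-coding it in a notebook.
    api_key = os.environ["IBM_CLOUD_API_KEY"]
    with urllib.request.urlopen(build_token_request(api_key), timeout=30) as resp:
        return json.load(resp)["access_token"]
```

The returned token is typically sent as an `Authorization: Bearer <token>` header. Because the token expires, request a new one when it does rather than storing it.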

Watson OpenScale

What is Watson OpenScale?

IBM Watson OpenScale tracks and measures outcomes from your AI models and helps ensure that they remain fair, explainable, and compliant, wherever your models were built or are running. Watson OpenScale also detects and helps correct drift in accuracy when an AI model is in production.

How is Watson OpenScale priced?

The Standard pricing plan charges a flat fee per model, with no restrictions on the number of payload rows, feedback rows, or Explainability transactions. For up-to-date pricing, see the IBM Cloud catalog.

Is there a free trial for Watson OpenScale?

Watson OpenScale offers a free trial plan. To sign up, see the Watson OpenScale web page and click Get started now. The free plan is subject to usage limits that reset every month.

Is Watson OpenScale available on IBM Cloud Pak for Data?

Watson OpenScale is one of the included services for IBM Cloud Pak for Data.

How do I convert a prediction column from an integer data type to a categorical data type?

For fairness monitoring, the prediction column allows only an integer numerical value even though the prediction label is categorical. How do I configure a categorical feature that is not an integer? Is a manual conversion required?

The training data might have class labels such as “Loan Denied” and “Loan Granted”. The prediction values that the IBM Watson Machine Learning scoring endpoint returns are numbers such as “0.0” and “1.0”. The scoring endpoint also has an optional column that contains the text representation of the prediction. For example, if prediction=1.0, the predictionLabel column might have the value “Loan Granted”. If such a column is available, specify the string values “Loan Granted” and “Loan Denied” when you configure the favorable and unfavorable outcomes for the model. If such a column is not available, specify the numeric values instead, such as 1.0 and 0.0, for the favorable and unfavorable classes.

IBM Watson Machine Learning has a concept of output schema that defines the schema of the output of the IBM Watson Machine Learning scoring endpoint and the role of each column. The roles identify which column contains the prediction value, which column contains the prediction probability, which column contains the class label value, and so on. The output schema is set automatically for models that are created by using model builder. It can also be set by using the IBM Watson Machine Learning Python client. You can use the output schema to define a column that contains the string representation of the prediction: set the modeling_role for that column to ‘decoded-target’. To understand the output schema, read the documentation for the Watson Machine Learning Python client library and search for “OUTPUT_DATA_SCHEMA”. To set the schema, use the store_model API, which accepts OUTPUT_DATA_SCHEMA as a parameter.
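
As a rough sketch, an output schema that marks a string column as the decoded target might look like the dictionary below. Only the 'decoded-target' role comes from the text above; the field names, the other role strings, and the dictionary layout are assumptions. Check the Watson Machine Learning Python client documentation for the format that your client version expects before you pass it to store_model.

```python
from typing import Optional

# Hypothetical output schema; field names and layout are illustrative assumptions.
OUTPUT_DATA_SCHEMA = {
    "id": "output_schema",  # illustrative identifier
    "fields": [
        {"name": "prediction", "type": "double",
         "metadata": {"modeling_role": "prediction"}},
        {"name": "probability", "type": "array",
         "metadata": {"modeling_role": "probability"}},
        {"name": "predictedLabel", "type": "string",
         "metadata": {"modeling_role": "decoded-target"}},
    ],
}


def decoded_target_column(schema: dict) -> Optional[str]:
    """Return the name of the column whose modeling_role is 'decoded-target', if any."""
    for field in schema.get("fields", []):
        if field.get("metadata", {}).get("modeling_role") == "decoded-target":
            return field["name"]
    return None


# With a configured APIClient (not shown), the schema would be passed as a meta property:
# client.repository.store_model(
#     model=trained_model,
#     meta_props={
#         client.repository.ModelMetaNames.OUTPUT_DATA_SCHEMA: OUTPUT_DATA_SCHEMA,
#         # ... plus the other meta properties that your model type requires
#     },
# )
```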

Why does Watson OpenScale need access to training data?

You must either provide Watson OpenScale access to training data that is stored in Db2 or IBM Cloud Object Storage, or you must run a Notebook to access the training data.

Watson OpenScale needs access to your training data for the following reasons:

  • To generate contrastive explanations: To create explanations, access to statistics, such as median value, standard deviation, and distinct values from the training data is required.
  • To display training data statistics: To populate the bias details page, Watson OpenScale must have training data from which to generate statistics.
  • To build a drift detection model: The Drift monitor uses training data to create and calibrate drift detection.

In the Notebook-based approach, you are expected to upload the statistics and other information when you configure a deployment in Watson OpenScale. In this case, Watson OpenScale does not have access to the training data outside of the Notebook, which runs in your environment. It has access only to the information that is uploaded during the configuration.
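
To make the kind of statistics mentioned above concrete, here is a sketch that computes the median and standard deviation of numeric columns and the distinct values of other columns by using only the standard library. The column names and the output layout are hypothetical; this is not the format that the Watson OpenScale notebook produces or uploads.

```python
import statistics


def training_stats(rows):
    """Summarize training data: median and standard deviation for numeric columns,
    sorted distinct values for all other columns."""
    if not rows:
        return {}
    summary = {}
    for column in rows[0]:
        values = [row[column] for row in rows if row.get(column) is not None]
        numeric = values and all(
            isinstance(v, (int, float)) and not isinstance(v, bool) for v in values
        )
        if numeric:
            summary[column] = {
                "median": statistics.median(values),
                "stdev": statistics.stdev(values) if len(values) > 1 else 0.0,
            }
        else:
            summary[column] = {"distinct_values": sorted({str(v) for v in values})}
    return summary


# Hypothetical training records.
rows = [
    {"age": 30, "income": 40000, "purpose": "car"},
    {"age": 40, "income": 60000, "purpose": "home"},
    {"age": 50, "income": 50000, "purpose": "car"},
]
print(training_stats(rows))
```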

What does it mean if the fairness score is greater than 100 percent?

Depending on your fairness configuration, your fairness score can exceed 100 percent. It means that your monitored group is getting relatively more “fair” outcomes as compared to the reference group. Technically, it means that the model is unfair in the opposite direction.
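
For intuition, assume that the fairness score is a disparate impact ratio: the rate of favorable outcomes for the monitored group divided by the rate for the reference group, expressed as a percentage. That definition is stated here as an assumption; see the Watson OpenScale fairness documentation for the exact metric. Under it, a score above 100 percent means that the monitored group receives favorable outcomes more often than the reference group.

```python
def fairness_score(fav_monitored: int, total_monitored: int,
                   fav_reference: int, total_reference: int) -> float:
    """Disparate impact ratio as a percentage (favorable rate monitored / reference)."""
    monitored_rate = fav_monitored / total_monitored
    reference_rate = fav_reference / total_reference
    if reference_rate == 0:
        raise ValueError("reference group has no favorable outcomes")
    return 100 * monitored_rate / reference_rate


# Hypothetical counts: 60 of 100 monitored records and 50 of 100 reference
# records received the favorable outcome.
print(round(fairness_score(60, 100, 50, 100), 1))  # 120.0
```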

How is model bias mitigated by using Watson OpenScale?

The debiasing capability in Watson OpenScale is enterprise grade. It is robust, scalable, and can handle a wide variety of models. Debiasing in Watson OpenScale consists of a two-step process:

  1. Learning phase: Learning the customer model's behavior to understand when it acts in a biased manner.
  2. Application phase: Identifying whether the customer's model acts in a biased manner on a specific data point and, if needed, fixing the bias.

For more information, see Debiasing options.

Is it possible to check for model bias on sensitive attributes, such as race and sex, even when the model is not trained on them?

Yes. Watson OpenScale provides a feature called “Indirect Bias detection.” Use it to detect whether the model exhibits bias indirectly for sensitive attributes, even though the model is not trained on these attributes.

Is it possible to mitigate bias for regression-based models?

Yes. You can use Watson OpenScale to mitigate bias on regression-based models. No additional configuration is needed to use this feature. Bias mitigation for regression models is done out of the box when the model exhibits bias.

What are the different methods of debiasing in Watson OpenScale?

You can use both Active Debiasing and Passive Debiasing for debiasing. For more information, see Debiasing options.

Configuring a model requires information about the location of the training data and the options are Cloud Object Storage and Db2. If the data is in Netezza, can Watson OpenScale use Netezza?

Use this Watson OpenScale Notebook to read the data from Netezza and generate the training statistics and also the drift detection model.

Why doesn't Watson OpenScale see the updates that were made to the model?

Watson OpenScale works on a deployment of a model, not on the model itself. You must create a new deployment and then configure this new deployment as a new subscription in Watson OpenScale. With this arrangement, you are able to compare the two versions of the model.

What are the various kinds of risks associated with using a machine learning model?

Machine learning models carry multiple kinds of risk. For example, any change in input data, also known as drift, can cause the model to make inaccurate decisions that affect business predictions. Training data can be cleaned to be free from bias, but runtime data might still induce biased behavior in the model.

Traditional statistical models are simpler to interpret and explain, but being unable to explain the outcome of a machine learning model can pose a serious threat to the usage of the model.

For more information, see Manage model risk.

Must I keep monitoring the Watson OpenScale dashboard to make sure that my models behave as expected?

No, you can set up email alerts for your production model deployments in Watson OpenScale. You receive an email alert whenever a risk evaluation test fails, and then you can check the issues and address them.

In Watson OpenScale, what data is used for Quality metrics computation?

Quality metrics are calculated from manually labeled feedback data and the monitored deployment's responses for that data.

In Watson OpenScale, can the threshold be set for a metric other than 'Area under ROC' during configuration?

No, currently, the threshold can be set only for the 'Area under ROC' metric.
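
To see what the 'Area under ROC' metric measures on labeled feedback data, here is a sketch that computes it as the probability that a randomly chosen positive record scores higher than a randomly chosen negative one (ties count half), and then compares it with a threshold. The pairwise computation and the 0.8 threshold are illustrative assumptions, not Watson OpenScale's implementation or defaults.

```python
def area_under_roc(labels, scores):
    """AUC via pairwise comparison of positive and negative scores (O(n^2), fine for a sketch)."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical feedback data: true labels and the deployment's probability scores.
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
auc = area_under_roc(labels, scores)
threshold = 0.8  # illustrative threshold
print(auc, "below threshold" if auc < threshold else "meets threshold")
# 0.75 below threshold
```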
