Frequently asked questions

Cloud Pak for Data as a Service

Accounts and setup



IBM Cloud Object Storage

IBM Watson Knowledge Catalog


Security and reliability

Sharing and collaboration

IBM Watson Machine Learning

Watson OpenScale

Cloud Pak for Data as a Service

What is Cloud Pak for Data as a Service?

Cloud Pak for Data as a Service provides a single, unified interface for a set of core IBM Cloud services and their related services. The core services are Watson Studio, Watson Machine Learning, and Watson Knowledge Catalog. You can add other services to supplement Watson Studio, store your data, or develop Watson applications.

See Overview of Cloud Pak for Data as a Service.

Why did my product name change to Cloud Pak for Data as a Service?

Your product name changed to Cloud Pak for Data as a Service because you have Watson Studio, Watson Machine Learning, or Watson Knowledge Catalog, plus another service in the Cloud Pak for Data as a Service services catalog, such as Cognos Dashboards Embedded. The features, plans, and costs of your services did not change.

See Relationships between the Watson Studio and Watson Knowledge Catalog services and Cloud Pak for Data as a Service.

What’s the difference between Watson Studio and Cloud Pak for Data as a Service?

Watson Studio is a single service, while Cloud Pak for Data as a Service is a set of services, which includes Watson Studio as one of its core services. The features of Watson Studio are the same in both cases.

See Overview of Cloud Pak for Data as a Service.

What’s the difference between Cloud Pak for Data 3.5 and Cloud Pak for Data as a Service?

Cloud Pak for Data 3.5 is software that you must install and maintain, while Cloud Pak for Data as a Service is a set of IBM Cloud services that are fully managed by IBM.

See Comparison between Cloud Pak for Data deployments.

How do I sign up for Cloud Pak for Data as a Service?

Go to

For the URLs in other regions, see Which regions can I provision Cloud Pak for Data as a Service in?.

Can I try Cloud Pak for Data as a Service for free?

Yes, Cloud Pak for Data as a Service has a Starter plan that’s free. Go to

Does Cloud Pak for Data as a Service have a subscription plan?

Yes, Cloud Pak for Data as a Service has a subscription plan. See Upgrading to a Cloud Pak for Data as a Service subscription account.

Accounts and setup

How do I start Watson Studio from the console?

After you log in to the console, select Watson Studio from your Resources list, then click the Get Started button.

How do I create a Watson Studio Lite instance?

You can provision one Watson Studio Lite instance per account. See Signing up for Watson Studio. If you’re ready to provision, go to the Watson Studio catalog page to create an instance.

Why is the Create button disabled when I try to add the Watson Studio service?

If you have a Lite plan, you can create only one instance of the service. The button is dimmed or disabled if you have an existing instance in your account. To verify, check the Resource list in the console for an existing Services entry.

How can I get the IAM Editor role so I can provision service instances?

If you try to provision an instance of a service, for example, the Visual Recognition service, you might get this error message:

You do not have the required permission to create an instance. You must be assigned the IAM Editor role or Operator role or higher. Contact the account owner to update your access.

To get the IAM Editor role:

  1. Find your IBM Cloud account owner or administrator.
  2. Ask to be assigned the IAM Editor role for the resource group.

Which regions can I provision Cloud Pak for Data as a Service in?

Currently, you can provision Cloud Pak for Data as a Service and its core services, Watson Studio, Watson Machine Learning, and Watson Knowledge Catalog, in these IBM Cloud regions:

See IBM Cloud overview: Regions.

You can provision other services to use with Watson Studio in any region. See Add and manage services.

Why can’t I see all my projects and catalogs across regions?

For some offering plans, you can provision Watson Studio and Watson Knowledge Catalog services in multiple IBM Cloud service regions. However, your projects, catalogs, and data are specific to the region in which they were saved and can be accessed only from your services in that region. You must switch your region to see the projects, catalogs, and data from that region.

Which web browsers can I use with Cloud Pak for Data as a Service?

You can use the latest versions of these web browsers:

  • Chrome
  • Safari
  • Firefox
    Tip for Firefox on Mac users: Horizontal scrolling within the UI can be interpreted by your Mac as an attempt to swipe between pages. If this behavior is undesired or if you experience browser crashes after the service prompts you to stay on the page, consider disabling the Swipe between pages gesture in Launchpad > System Preferences > Trackpad > More Gestures.
  • Firefox ESR

How do I upgrade?

When you’re ready to upgrade Cloud Pak for Data as a Service or any of the services that you created in Cloud Pak for Data as a Service, you can upgrade in place without losing any of your work or data.

You must be the owner or administrator of the IBM Cloud account for a service to upgrade it.

How do I find my IBM Cloud account owner?

If you have an enterprise account or work in an IBM Cloud account that you don’t own, you might need to ask an account owner to give you the Watson Knowledge Catalog service Admin role or the IBM Cloud account administrator role.

To find your IBM Cloud account owner:

  1. From Cloud Pak for Data as a Service, choose Administer > Account > Users. From IBM Cloud, choose Manage > Account > Users.
  2. From the avatar menu, make sure you’re in the right account, or switch accounts, if necessary.
  3. On the Users page, find the user name with the word owner next to it.

To understand roles, see Roles for Cloud Pak for Data as a Service. To determine your roles, see Determine your roles.


Deprecation of Python 2.7 and 3.5

Python 2.7 and 3.5 are being deprecated and will no longer be available after August 28, 2019. The default version of Python in Watson Studio is now 3.6. When you switch from Python 3.5 or 2.7 to Python 3.6, you might need to update your code if the versions of open source libraries that you use are different in Python 3.6. See Changing the environment.

Read this blog post: Python version upgrade in Watson Studio Cloud

Deprecation of Apache Spark Lite service

You can no longer associate an Apache Spark Lite service with a project. Apache Spark Lite services will be deleted on June 28, 2019. Read this blog post: Deprecation of Apache Spark (Lite Plan).

If you currently use Apache Spark as a Service in any of the following ways, you must switch to using built-in Spark environments:

  • Batch Deployments - Use built-in Spark environments instead. 
  • Model Builder - Use built-in Spark environments instead.
  • Modeler Flows with Spark Runtime - Use built-in Spark environments instead.
  • Notebooks - Use built-in Spark environments instead.
  • Realtime streaming prediction - Deployments using Apache Spark as a Service will no longer work after June 28, 2019.

To learn more about processing and runtime usage costs, see Monitoring account resource usage.

Removal of SparkML modeler and Neural Network Modeler

The beta Neural Network Modeler and the beta SparkML modeler tools are removed from Watson Studio on 31 July 2020.


Where do I start a new project for Watson Studio?

Log in at to go to your home page, then click New Project. Watch the video about creating a project to see how to create both a blank project and a project from a file.

What is the difference between using Spark in a Spark environment and Spark in IBM Analytics Engine?

Spark environments are provided under Watson Studio. A Spark environment offers Spark kernels as a service (SparkR, PySpark, and Scala) and is based on Armada/Kubernetes. The underlying Armada is shared across multiple users. However, each kernel gets a dedicated Spark cluster and Spark executors. You can change the Spark configurations, and can specify the size of the executors and the number of executors per kernel. A Spark environment is more serverless in nature.

In contrast, IBM Analytics Engine offers Hortonworks Data Platform on IBM Cloud. You get one VM per cluster node and your own local HDFS. You get Spark and the entire Hadoop ecosystem. You are given shell access and can also create notebooks.

Is environment runtime sharing and billing the same for Spark environments as for Anaconda environments?

No. Compute, data resources, and billing can’t be shared in a Spark environment.

Whereas you can open a notebook with an Anaconda environment, stop the kernel of the notebook, then start a second notebook with the same environment and share the runtime without stopping it, you cannot do this with a Spark environment.

Spark environment runtimes can’t be shared. Every notebook kernel has its own dedicated Spark cluster. If you create two notebooks that use the same environment definition, two runtimes, each with its own kernel, are started, which means that two clusters, each with its own set of Spark executors, are created.

How do I load very large files to my project?

You can’t load data files larger than 5 GB to your project from Watson Studio. If your files are larger, you must use the Cloud Object Storage API and load the data in multiple parts. See the curl commands for working with Cloud Object Storage directly on IBM Cloud.
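As a rough illustration of what loading the data in multiple parts involves, the following sketch plans the byte ranges for a multipart upload. It only computes part boundaries and uploads nothing. The function name plan_parts and the 100 MB default part size are assumptions made for this example, not values taken from the Cloud Object Storage documentation.

```python
def plan_parts(file_size, part_size=100 * 1024 * 1024):
    """Return (part_number, offset, length) tuples that cover file_size bytes.

    part_size defaults to 100 MB, which is an illustrative assumption.
    """
    if file_size < 0 or part_size <= 0:
        raise ValueError("file_size must be >= 0 and part_size must be > 0")

    parts = []
    offset = 0
    part_number = 1  # assumption: parts are numbered from 1, as in S3-style APIs
    while offset < file_size:
        # The last part is usually shorter than the others.
        length = min(part_size, file_size - offset)
        parts.append((part_number, offset, length))
        offset += length
        part_number += 1
    return parts


if __name__ == "__main__":
    # A 6 GB file split into 1 GB parts.
    for number, start, size in plan_parts(6 * 1024**3, 1024**3):
        print(f"part {number}: bytes {start}..{start + size - 1}")
```

Each tuple describes one request in a multipart upload: read length bytes starting at offset and send them as the given part number.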

Why is the machine learning job I submitted still in pending state?

The machine learning job you submitted is still in the “Pending” state because it is waiting for enough resources to start running. This can happen if resources are currently in high demand or if you submitted a large number of concurrent requests and your newer requests are waiting for the ones submitted earlier to complete execution.

How do I choose which tool to use?

The tool you need depends on your type of data, what you want to do with your data, and how much automation you want. To find the right tool, see Choosing a tool.

Why does my newly created IBM Analytics Engine service not have a resource key?

When you create an IBM Analytics Engine service from Watson Studio and try to associate the service with your Watson Studio project, a message appears telling you that the selected Analytics Engine service doesn’t have a resource key.

Follow these steps to create a resource key to enable associating an Analytics Engine service with a project:

  1. Create a wdp-writer service credential in IBM Cloud for your Analytics Engine service:

    1. Select your Analytics Engine service from the resource list on your dashboard in IBM Cloud.
    2. Click Service credentials and then New credential. Name the new credential wdp-writer, give it the Writer role, and click Add.
  2. Reset the cluster password by clicking Manage and Reset under cluster credentials. Copy the displayed password and save it somewhere. The user name is clsadmin by default.
  3. Associate the Analytics Engine service with your Watson Studio project. Click the Settings tab of your project and Add service under the associated services section.
  4. Select your service on the Existing tab and provide the user name and password.

Now you can select the service in your project, for example to run a notebook.

Note: If an admin resets the cluster password, you will need to delete the associated service from all the projects, reset the cluster password, and then re-associate the service.

IBM Cloud Object Storage

What is saved in IBM Cloud Object Storage for projects and catalogs?

When you create a project or catalog, you specify an IBM Cloud Object Storage instance and create a bucket that is dedicated to that project or catalog. These types of objects are stored in the IBM Cloud Object Storage bucket for the project or catalog:

  • Files for data assets that you uploaded into the project or catalog.
  • Files associated with analytic assets, such as notebooks, dashboards, and models.
  • Metadata about assets, such as the asset description, tags, and comments or reviews.

Do I need to upgrade IBM Cloud Object Storage when I upgrade core services?

You must upgrade your IBM Cloud Object Storage instance only when you run out of storage space. Core services can use any IBM Cloud Object Storage plan and you can upgrade any core service or your IBM Cloud Object Storage service independently.

Why am I unable to add storage to an existing project or to see the IBM Cloud Object Storage selection in the New Project dialog?

IBM Cloud Object Storage requires an extra step for users who do not have administrative privileges for it. The account administrator must enable non-administrative users to create projects.

If you have administrator privileges and do not see the latest IBM Cloud Object Storage, try again later because server-side caching might cause a delay in rendering the latest values.

Watson Knowledge Catalog

What is Watson Knowledge Catalog?

Watson Knowledge Catalog is a cloud-based enterprise metadata repository that lets you catalog your knowledge and analytics assets, including structured and unstructured data wherever they reside, so that they can be easily accessed and used to fuel data science and AI. For selected source types, Watson Knowledge Catalog can automatically discover and register data assets at the provided connection. As assets are added to the catalog, they are automatically indexed and classified, making it easy for users such as data engineers, data scientists, data stewards, and business analysts to find, understand, share, and use the assets. AI-powered search and recommendations guide users to the most relevant assets in the catalog based on understanding of relationships between assets, how those assets are used, and social connections between users.

Watson Knowledge Catalog also provides an intelligent and robust governance framework that lets you define and enforce data and access policies to ensure that the right data go to the right people.

Through the Watson Knowledge Catalog business glossary, users can create a common business vocabulary and associate it with assets, policies, and rules, providing the bridge between the business domain and your technical assets.

What is the difference between a catalog and a project?

A catalog is where you share assets across the enterprise. A project is where you work with assets within smaller teams. An enterprise catalog can have thousands of assets shared with hundreds of users. Projects are designed for a team of collaborators to work with a small number of assets for a specific goal, such as developing an artificial intelligence model or data preparation, using Watson Studio.

What data sources and asset types are supported?

Watson Knowledge Catalog supports over 30 connectors to cloud or on-premises data source types. See Connection types.

Watson Knowledge Catalog also supports other asset types, such as structured data, unstructured data, models, and notebooks.

How do I load very large files to my catalog?

You can’t load data files larger than 5 GB to your catalog from Watson Knowledge Catalog. To add a file that is larger than 5 GB to a catalog, upload the file to IBM Cloud Object Storage and then add it as a connected data asset.

Do I need to move my data into Watson Knowledge Catalog?

No, you can keep all your data in their existing repositories or you can upload local files to the IBM Cloud Object Storage associated with the catalog. The choice is yours.

Watson Knowledge Catalog stores and manages only the metadata of your assets.

What is the maximum number of assets I can have in Watson Knowledge Catalog?

The number of assets you can have in Watson Knowledge Catalog depends on your plan:

  • Lite plan: 50 assets
  • Standard plan: 500 assets
  • Professional plan: unlimited assets

See Watson Knowledge Catalog offering plans.

Does Watson Knowledge Catalog provide policy services?

Watson Knowledge Catalog includes an automated policy enforcement engine that determines outcomes based on the policies and the action being taken. You can set up your policies within the system and restrict access to data based on the defined policies.

Does Watson Knowledge Catalog provide classification services?

For governed catalogs that are created with policy enforcement, Watson Knowledge Catalog automatically classifies the columns in your relational data assets when they are added to the catalog. Over 160 data classes for columns are provided, including names, emails, postal addresses, credit card numbers, driver’s licenses, government identification numbers, date of birth, demographic information, DUNS number, and more. For ungoverned catalogs that do not enforce policies, a user can choose to classify, or profile, a relational data asset, but assets are not automatically classified. Catalogs also profile unstructured data assets and extract metadata from content, such as categories, concepts, sentiment, and emotion. See Profile assets.

Does Watson Knowledge Catalog have data wrangling capabilities?

Yes, data preparation capabilities are available in Data Refinery, which is part of Watson Knowledge Catalog. Data Refinery provides a rich set of capabilities that not only allow you to discover, cleanse, and transform your data with built-in operations, but it also comes with powerful profiling and visualization tools such as charts and graphs to help you interact with and understand your data.

Data access and transform policies defined in Watson Knowledge Catalog are also enforced in Data Refinery to ensure that sensitive data that originated from governed catalogs remain protected.

Can I set up access groups for people in different lines of business and roles?

You can set up access groups through your IBM Cloud account in the Identity and Access Management (IAM) area.

After you set up the access groups, on the Access Control page of a catalog, you can add the access group so that all members of the access group can access the catalog with the same permissions. See Add access groups.

Does Watson Knowledge Catalog use Apache Atlas for its metadata repository?

Watson Knowledge Catalog uses its own local store for metadata.

Watson Knowledge Catalog runs on a cloud native persistence store that can meet the platform needs for performance, up-time, and scalability.

When adding assets to a project from a catalog, why don’t I see all my projects listed in the dropdown for target project?

When you add assets from a catalog or project, or publish assets from a project to a catalog, both the project and the catalog must satisfy these criteria:

  • You must be a member of the same Cloud Pak for Data as a Service account in IBM Cloud as the catalog owner, or, if your company set up SAML federation on IBM Cloud, you must be in the same company as the catalog owner.
  • If you want to add catalog assets to the project, you must choose to restrict who can be a collaborator in the project. If you just want to publish assets to a catalog, you don’t need to restrict the project.
  • You must choose IBM Cloud Object Storage when you create a project. You must be the owner of the IBM Cloud Object Storage instance or the IBM Cloud Object Storage instance must be configured to allow project creation.

In the catalog screen, the dropdown for the target project when you add assets to a project lists only the projects that satisfy all these criteria.

When I create policies, which catalogs will they apply to?

Policies are scoped to the IBM Cloud account and will be enforced on assets in all catalogs with policies enforced that belong to the same IBM Cloud account as the policies.

Do policies affect data in external data sources?

No, Watson Knowledge Catalog is a data catalog for searching for data.

Policies affect only how data appears within the catalog. Policies do not affect users who access external data sources directly.

Why can’t I add policies?

Only users with the Watson Knowledge Catalog service Admin role can add policies, business terms, and catalogs, and view the data dashboard. If you do not have access to the UI control to add policies, then you are assigned the default Watson Knowledge Catalog service Viewer role, which limits you to viewing existing policies and business terms. Ask your Cloud Pak for Data as a Service administrator to give you the Admin role for the Watson Knowledge Catalog service.


Can I install libraries or packages to use in my notebooks?

You can install Python and Scala libraries and R packages through a notebook, and those libraries and packages will be available to all your notebooks that use the same environment definition. For instructions, see Import custom or third-party libraries. If you get an error about missing operating system dependencies when you install a library or package, notify IBM by clicking the chat icon. To see the preinstalled libraries and packages and the libraries and packages that you installed, from within a notebook, run the appropriate command:

  • Python: !pip list
  • R: installed.packages()
  • Scala: Click the Notebook info icon and then click Environment.
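For Python, the same kind of listing that !pip list prints can also be collected programmatically, which is handy when you want to record the package versions alongside your results. The following is a minimal sketch that uses the standard library module importlib.metadata; it assumes a Python 3.8 or later runtime, and the function name installed_packages is made up for this example.

```python
from importlib.metadata import distributions


def installed_packages():
    """Return a {name: version} dict of the packages visible to this kernel."""
    packages = {}
    for dist in distributions():
        try:
            name = dist.metadata.get("Name")
            version = dist.metadata.get("Version", "unknown")
        except Exception:
            # Skip installations whose metadata cannot be read.
            continue
        if name:
            packages[name] = version
    return packages


if __name__ == "__main__":
    for name, version in sorted(installed_packages().items()):
        print(f"{name}=={version}")
```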

Can I call functions defined in one notebook from another notebook?

No, there is no way to call one notebook from another notebook in Watson Studio. However, you can put your common code into a library outside of Watson Studio and then install it.

Can I add arbitrary notebook extensions?

No, you can’t extend your notebook capabilities by adding arbitrary extensions as a customization because all notebook extensions must be preinstalled. The only notebook extension which is preinstalled is the Esri ArcGIS extension, which you can select when you create a runtime environment definition and select the Python 3.5 software configuration. This selection enables widgetsnbextension for ipywidgets.

How do I access the data from a CSV file in a notebook?

After you load a CSV file into object storage, choose one of the options to create a DataFrame or other data structure from the Insert to code menu under the file name. For instructions, see Load and access data.

How do I access the data from a compressed file in a notebook?

After you load the compressed file to object storage, get the file credentials by using the Insert to code menu under the file name. Then use this function to save the file from object storage in GPFS. The credentials argument is the dictionary that was inserted to code in your notebook.
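A minimal sketch of such a helper follows. It is not the original function: it assumes that the file is gzip-compressed and that the code inserted into your notebook gives you a readable binary stream for the object. The name save_decompressed and its parameters are hypothetical.

```python
import gzip
import shutil
from pathlib import Path


def save_decompressed(stream, target_path):
    """Decompress a gzip stream and write the result to target_path.

    stream is any readable binary file-like object (assumption: obtained
    from the code that was inserted into the notebook).
    Returns the path of the decompressed file.
    """
    target = Path(target_path)
    # gzip.open accepts an existing file object as well as a file name.
    with gzip.open(stream, "rb") as source, open(target, "wb") as destination:
        shutil.copyfileobj(source, destination)
    return target
```

After the file is saved locally, you can read it with the usual tools, for example by passing the returned path to a CSV reader.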

Security and reliability

How secure is Cloud Pak for Data as a Service?

Cloud Pak for Data as a Service is very secure and resilient. See Security of Cloud Pak for Data as a Service.

Are my data and notebook protected from sharing outside of my collaborators?

The data that is loaded into your Spark service and notebooks is secure. Only the collaborators in your project can access your data or notebooks. Each Watson Studio account acts as a separate tenant of the Spark and Object Storage services. Tenants cannot access other tenants’ data.

If you want to share your notebook with the public, then hide your data service credentials in your notebook. For the Python, R, and Scala languages, enter the following syntax: # @hidden_cell

Be sure to save your notebook immediately after you enter the syntax to hide cells with sensitive data.

Only then should you share your work.
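For example, a Python cell that holds credentials might look like the following sketch. It assumes that the marker comment goes on the first line of the cell; the keys and values shown are placeholders, not real credentials or a required format.

```python
# @hidden_cell
# This cell is marked as hidden so that its contents are left out when
# the notebook is shared. The values below are placeholders only.
credentials = {
    "apikey": "REPLACE_WITH_YOUR_API_KEY",
    "endpoint": "https://example.invalid",
}
```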

Do I need to back up my notebooks?

No. Your notebooks are stored in IBM Cloud Object Storage, which provides resiliency in case of an outage.

Sharing and collaboration

What are the implications of sharing a notebook?

When you share a notebook, the permalink never changes. Any person with the link can view your notebook. You can unshare the notebook by clearing the check box to share it. Updates are not automatically shared. When you update your notebook, you can sync the shared notebook by reselecting the check box to share it.

How can I share my work outside of RStudio in Watson Studio?

One way of sharing your work outside of RStudio in Watson Studio is connecting it to a shared GitHub repository that you and your collaborators can work from. Read this blog post for more information.

However, the best method to share your work with the members of a project in Watson Studio is to use notebooks in the project using the R kernel.

RStudio is a great environment to work in for prototyping and working individually on R projects, but it is not yet integrated with Watson Studio projects.

How do I share my SPSS Modeler flow with another project?

By design, modeler flows can be used only in the project where the flow is created or imported. If you need to use a modeler flow in a different project, you must download the flow from the current project (source project) to your local environment and then import the flow to another project (target project).

IBM Watson Machine Learning

How do I run an AutoAI experiment?

Go to Creating an AutoAI experiment from sample data to watch a short video to see how to create and run an AutoAI experiment and then follow a tutorial to set up your own sample.

What is available for automated model building?

The AutoAI graphical tool in Watson Studio automatically analyzes your data and generates candidate model pipelines customized for your predictive modeling problem.  These model pipelines are created iteratively as AutoAI analyzes your dataset and discovers data transformations, algorithms, and parameter settings that work best for your problem setting.  Results are displayed on a leaderboard, showing the automatically generated model pipelines ranked according to your problem optimization objective. For details, see AutoAI overview.

What frameworks and libraries are supported for my machine learning models?

You can use popular tools, libraries, and frameworks to train and deploy machine learning models using IBM Watson Machine Learning. The supported frameworks topic lists supported versions and features, as well as deprecated versions scheduled to be discontinued.

What is an API Key?

API keys allow you to authenticate when you use the CLI or APIs, and they can be used across multiple services. API keys are confidential because they are used to grant access. Treat every API key as you would a password, because anyone with your API key can impersonate your service.
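One common way to treat an API key like a password is to keep it out of your source code and read it from the environment instead. The following is a minimal sketch of that idea; the environment variable name IBM_CLOUD_API_KEY and both function names are assumptions for illustration, not names that any IBM tool requires.

```python
import os


def load_api_key(env_var="IBM_CLOUD_API_KEY"):
    """Read the API key from an environment variable (hypothetical name)."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable first")
    return key


def mask(secret, visible=4):
    """Return a version of the secret that is safe to show in logs."""
    if len(secret) <= visible:
        return "*" * len(secret)
    return "*" * (len(secret) - visible) + secret[-visible:]
```

With this approach, a notebook or script can print mask(load_api_key()) for troubleshooting without exposing the full key.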

Can I provide feedback?

Yes, we encourage feedback as we continue to develop this exciting array of services. Click the chat icon, type a comment, and press Return.

Watson OpenScale

What is Watson OpenScale?

IBM Watson OpenScale tracks and measures outcomes from your AI models, and helps ensure they remain fair, explainable, and compliant wherever your models were built or are running. Watson OpenScale also detects and helps correct the drift in accuracy when an AI model is in production.

How is Watson OpenScale priced?

There’s a Standard pricing plan that includes monitoring of up to 24 deployed models, with no restrictions on the number of payload rows, feedback rows, or transactions for Explainability. Up-to-date information is available in the IBM Cloud catalog.

Is there a free trial for Watson OpenScale?

Watson OpenScale offers a free trial plan. To sign up, see the Watson OpenScale web page and click Get started now. You can use the free plan if you want, subject to usage limits that refresh every month.

Does Watson OpenScale work with Microsoft Azure ML engine?

Watson OpenScale supports both Microsoft Azure ML Studio and Microsoft Azure ML Service engines. For more information, see Microsoft Azure ML Studio frameworks and Microsoft Azure ML Service frameworks.

Does Watson OpenScale work with Amazon SageMaker ML engine?

Watson OpenScale supports Amazon SageMaker ML engine. For more information, see Amazon SageMaker frameworks.

Is Watson OpenScale available on IBM Cloud Pak for Data?

Watson OpenScale is one of the included services for IBM Cloud Pak for Data.

To run Watson OpenScale on my own servers, how much computer processing power is required?

There are specific guidelines for hardware configuration for three-node and six-node configurations. Your IBM Technical Sales team can also help you with sizing your specific configuration. Because Watson OpenScale runs as an add-on to IBM Cloud Pak for Data, you need to consider the requirements for both software products.

How do I convert a prediction column from an integer data type to a categorical data type?

For fairness monitoring, the prediction column allows only an integer numerical value even though the prediction label is categorical. How do I configure a categorical feature that is not an integer? Is a manual conversion required?

The training data might have class labels such as “Loan Denied” and “Loan Granted”. The prediction value that is returned by the IBM Watson Machine Learning scoring endpoint has values such as “0.0” and “1.0”. The scoring endpoint also has an optional column that contains the text representation of the prediction. For example, if prediction=1.0, the predictionLabel column might have the value “Loan Granted”. If such a column is available, when you configure the favorable and unfavorable outcomes for the model, specify the string values “Loan Granted” and “Loan Denied”. If such a column is not available, then you need to specify the integer or double values, such as 1.0 and 0.0, for the favorable and unfavorable classes.
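To make the relationship between the numeric prediction and its text label concrete, the following sketch decodes prediction values into labels. The mapping of 1.0 to “Loan Granted” follows the example above; the mapping of 0.0 to “Loan Denied” and the function name decode_predictions are assumptions, because which number stands for which label depends on your model.

```python
# Assumed mapping between numeric predictions and class labels.
LABELS = {0.0: "Loan Denied", 1.0: "Loan Granted"}


def decode_predictions(predictions, labels=LABELS):
    """Translate numeric predictions (numbers or numeric strings) to labels."""
    return [labels[float(value)] for value in predictions]


if __name__ == "__main__":
    print(decode_predictions(["1.0", "0.0", 1.0]))
```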

IBM Watson Machine Learning has a concept of output schema that defines the schema of the output of the IBM Watson Machine Learning scoring endpoint and the role of the different columns. The roles are used to identify which column contains the prediction value, which column contains the prediction probability, which column contains the class label value, and so on. The output schema is automatically set for models that are created by using model builder. It can also be set by using the IBM Watson Machine Learning Python client. Users can use the output schema to define a column that contains the string representation of the prediction. Set the modeling_role for the column to ‘decoded-target’. The documentation for the IBM Watson Machine Learning Python client is available at: Search for “OUTPUT_DATA_SCHEMA” to understand the output schema. The API to use is the store_model API, which accepts OUTPUT_DATA_SCHEMA as a parameter.
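As an illustration only, an output schema with a decoded-target column might be expressed as a Python dictionary like the one below. Only the modeling_role value ‘decoded-target’ and the predictionLabel column name come from the text above; the overall layout, the other field names, and the other role values are assumptions, so check the Python client documentation for the exact format before you pass a schema to store_model. The helper function is hypothetical.

```python
# Assumed layout of an output schema; verify against the client documentation.
output_data_schema = {
    "type": "struct",
    "fields": [
        {"name": "prediction", "type": "double", "nullable": True,
         "metadata": {"modeling_role": "prediction"}},
        {"name": "probability", "type": "array", "nullable": True,
         "metadata": {"modeling_role": "probability"}},
        # The column that holds the string representation of the prediction.
        {"name": "predictionLabel", "type": "string", "nullable": True,
         "metadata": {"modeling_role": "decoded-target"}},
    ],
}


def column_with_role(schema, role):
    """Return the name of the first field with the given modeling_role, or None."""
    for field in schema["fields"]:
        if field.get("metadata", {}).get("modeling_role") == role:
            return field["name"]
    return None
```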

Why does Watson OpenScale need access to training data?

You must either provide Watson OpenScale access to training data that is stored in Db2 or IBM Cloud Object Storage, or you must run a Notebook to access the training data.

Watson OpenScale needs access to your training data for the following reasons:

  • To generate contrastive explanations: To create explanations, access to statistics, such as median value, standard deviation, and distinct values from the training data is required.
  • To display training data statistics: To populate the bias details page, Watson OpenScale must have training data from which to generate statistics.
  • To build a drift detection model: The Drift monitor uses training data to create and calibrate drift detection.
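The kind of per-column statistics mentioned above can be sketched with the Python standard library. This is a simplified illustration, not the computation that Watson OpenScale performs: the function name column_statistics is hypothetical, and the choice of population standard deviation is an assumption.

```python
import statistics


def column_statistics(values):
    """Summarize one training-data column: median, spread, and distinct values."""
    return {
        "median": statistics.median(values),
        # Assumption: population standard deviation; a sample standard
        # deviation (statistics.stdev) would give a slightly larger value.
        "stddev": statistics.pstdev(values),
        "distinct": sorted(set(values)),
    }


if __name__ == "__main__":
    print(column_statistics([1, 2, 2, 3, 4]))
```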

In the Notebook-based approach, you are expected to upload the statistics and other information when you configure a deployment in Watson OpenScale. Watson OpenScale no longer has access to the training data outside of the Notebook, which is run in your environment. It has access only to the information uploaded during the configuration.

What internet browser does Watson OpenScale support?

The Watson OpenScale service requires the same level of browser software as is required by IBM Cloud. See the IBM Cloud Prerequisites topic for details.

Is there a command-line tool to use?

Yes! There is a ModelOps CLI tool, whose official name is the Watson OpenScale CLI model operations tool. Use it to run tasks related to the lifecycle management of machine learning models. This tool is complementary to the IBM Cloud CLI tool, augmented with the machine learning plug-in.

What version of Python can I use with Watson OpenScale?

Because Watson OpenScale is independent of your model-creation process, it supports whatever Python versions your machine learning provider supports. The Watson OpenScale Python client is a Python library that works directly with the Watson OpenScale service on IBM Cloud. For the most up-to-date version information, see the Requirements section. You can use the Python client, instead of the Watson OpenScale client UI, to directly configure a logging database, bind your machine learning engine, and select and monitor deployments. For examples of using the Python client in this way, see the Watson OpenScale sample Notebooks.

What does it mean if the fairness score is greater than 100 percent?

Depending on your fairness configuration, your fairness score can exceed 100 percent. It means that your monitored group is getting relatively more “fair” outcomes as compared to the reference group. Technically, it means that the model is unfair in the opposite direction.

Configuring a model requires information about the location of the training data and the options are Cloud Object Storage and Db2. If the data is in Netezza, can Watson OpenScale use Netezza?

Use this Watson OpenScale Notebook to read the data from Netezza and generate the training statistics and also the drift detection model.

Why doesn’t Watson OpenScale see the updates that were made to the model?

Watson OpenScale works on a deployment of a model, not on the model itself. You must create a new deployment and then configure this new deployment as a new subscription in Watson OpenScale. With this arrangement, you are able to compare the two versions of the model.

How is fairness calculated in Watson OpenScale?

In Watson OpenScale, fairness is calculated by using disparate impact ratio and by perturbing monitored groups and reference groups. For more information, see Fairness metrics overview.
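The following sketch shows the commonly used definition of the disparate impact ratio: the rate of favorable outcomes for the monitored group divided by the rate for the reference group. It also shows how the ratio can exceed 100 percent. The group data and function names are hypothetical, the perturbation step is not shown, and the exact computation in Watson OpenScale might differ.

```python
def favorable_rate(outcomes, favorable="approved"):
    """Fraction of outcomes that are favorable."""
    return sum(1 for o in outcomes if o == favorable) / len(outcomes)


def disparate_impact(monitored, reference, favorable="approved"):
    """Ratio of the monitored group's favorable rate to the reference group's."""
    return favorable_rate(monitored, favorable) / favorable_rate(reference, favorable)


reference_group = ["approved"] * 8 + ["denied"] * 2  # 8 of 10 favorable
monitored_group = ["approved"] * 6 + ["denied"] * 4  # 6 of 10 favorable
favored_group = ["approved"] * 9 + ["denied"] * 1    # 9 of 10 favorable

# Below 1.0: the monitored group receives fewer favorable outcomes.
print(disparate_impact(monitored_group, reference_group))
# Above 1.0: the monitored group receives more favorable outcomes than the reference group.
print(disparate_impact(favored_group, reference_group))
```

A ratio above 1.0 corresponds to a fairness score greater than 100 percent.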

How is model bias mitigated by using Watson OpenScale?

The debiasing capability in Watson OpenScale is enterprise grade. It is robust, scalable, and can handle a wide variety of models. Debiasing in Watson OpenScale consists of a two-step process:

  • Learning Phase: Learning customer model behavior to understand when it acts in a biased manner.
  • Application Phase: Identifying whether the customer’s model acts in a biased manner on a specific data point and, if needed, fixing the bias.

For more information, see Understanding how debiasing works and Debiasing options.

Is it possible to check for model bias on sensitive attributes, such as race and sex, even when the model is not trained on them?

Yes. Recently, Watson OpenScale delivered a ground-breaking feature called “Indirect Bias detection.” Use it to detect whether the model is exhibiting bias indirectly for sensitive attributes, even though the model is not trained on these attributes. For more information, see Understanding how debiasing works.

Is it possible to mitigate bias for regression-based models?

Yes. You can use Watson OpenScale to mitigate bias on regression-based models. No additional configuration is needed from you to use this feature. Bias mitigation for regression models is done out of the box when the model exhibits bias.

What are the different methods of debiasing in Watson OpenScale?

You can use both Active Debiasing and Passive Debiasing for debiasing. For more information, see Debiasing options.

Can I configure model fairness through an API?

Yes, it is possible with the Watson OpenScale SDK. For more information, see the IBM Watson OpenScale Python SDK documentation.

What are the various model frameworks that are supported in Watson OpenScale?

For the list of supported machine learning engines, frameworks, and models, see the Watson OpenScale documentation Supported machine learning engines, frameworks, and models.

What are the supported machine learning providers for Watson OpenScale?

For the list of supported machine learning engines, frameworks, and models, see the Watson OpenScale documentation Supported machine learning engines, frameworks, and models.

What are the various kinds of risks that are associated with using a machine learning model?

Multiple kinds of risk are associated with machine learning models. For example, any change in input data, also known as drift, can cause the model to make inaccurate decisions, which impacts business predictions. Training data can be cleaned to be free from bias, but runtime data might still induce biased behavior in the model.

Traditional statistical models are simpler to interpret and explain. With a machine learning model, being unable to explain the outcome can pose a serious threat to the usage of the model.

For more information, see Manage model risk.

What are the monitors that are available in Watson OpenScale?

In Watson OpenScale, machine learning models can be monitored for fairness, quality, and drift (both model and data drift), and their transactions can be explained. In addition, Watson OpenScale provides the capability to plug in custom monitors that customers develop and connect to Watson OpenScale.

Does Watson OpenScale detect drift in accuracy and drift in data?

Watson OpenScale detects both drift in accuracy and drift in data:

  • Drift in accuracy estimates the drop in accuracy of the model at run time. Model accuracy drops when there is an increase in transactions that are similar to those that the model did not evaluate correctly in the training data. This type of drift is calculated for structured binary and multi-class classification models only.
  • Drift in data estimates the drop in consistency of the data at run time as compared to the characteristics of the data at training time.

What are the types of explanations shown in Watson OpenScale?

Watson OpenScale provides two types of explanations: local explanation based on LIME, and contrastive explanation. For more information, see Understanding the difference between contrastive explanations and LIME.

How do I interpret a Local/LIME explanation in Watson OpenScale?

In Watson OpenScale, LIME reveals which features played the most important role in the model prediction for a specific data point. The relative importance of each feature is also shown.

How do I interpret a contrastive explanation in Watson OpenScale?

Contrastive explanation in Watson OpenScale shows the minimum change to be made to the input data point that would give a different model prediction than the input data point.

What is what-if analysis in Watson OpenScale?

The explanations UI also provides the ability to test what-if scenarios, in which you change the feature values of the input data point and check the impact on the model prediction and probability.

In Watson OpenScale, for which models is Local/LIME explanation supported?

Local explanation is supported for regression and classification models that use structured data, and for classification models that use unstructured text or unstructured image data.

In Watson OpenScale, for which models are contrastive explanation and what-if analysis supported?

Contrastive explanations and what-if analyses are supported for models that use structured data and are of problem type classification only.

What are controllable features in Watson OpenScale explainability configuration?

With controllable features, you can lock some features of the input data point so that they do not change when the contrastive explanation is generated and cannot be changed in what-if analysis. Set the features that should not change as non-controllable, or NO, in the explainability configuration.
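To make the effect of locking a feature concrete, here is a minimal brute-force sketch of a contrastive search: for each feature, it looks for the smallest change that gives a different prediction, and it skips any feature that is marked as non-controllable. The toy_model function, the feature names, and the candidate steps are all hypothetical, and this is not the algorithm that Watson OpenScale uses.

```python
def toy_model(point):
    """Hypothetical stand-in for a deployed classifier."""
    score = point["income"] / 1000 - point["debt"] / 500 + point["age"] / 2
    return "approved" if score >= 50 else "denied"


def contrastive_changes(point, candidate_steps, controllable):
    """For each controllable feature, find the smallest step that flips the prediction."""
    original = toy_model(point)
    flips = {}
    for feature, steps in candidate_steps.items():
        if not controllable[feature]:
            continue  # locked feature: never changed
        for step in sorted(steps, key=abs):
            changed = {**point, feature: point[feature] + step}
            if toy_model(changed) != original:
                flips[feature] = step
                break
    return flips


point = {"income": 40000, "debt": 5000, "age": 30}
candidate_steps = {
    "income": [2000, 4000, 6000, 8000],
    "debt": [-1000, -2000, -3000, -4000],
    "age": [5, 10, 15],
}

all_controllable = {"income": True, "debt": True, "age": True}
age_locked = {"income": True, "debt": True, "age": False}

print(contrastive_changes(point, candidate_steps, all_controllable))
print(contrastive_changes(point, candidate_steps, age_locked))
```

With age locked, the search never proposes a change to age, which mirrors setting that feature to NO in the explainability configuration.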

While configuring the machine learning providers in Watson OpenScale, what is the difference between pre-production and production subscriptions?

Before a model is put into production use, a model validator typically configures and validates it with a pre-production service provider. Watson OpenScale supports this workflow: you can configure a machine learning provider as pre-production, perform all of the risk evaluations, and, after the model meets your quality standards, put the model into production use.

Must I keep monitoring the Watson OpenScale dashboard to make sure that my models behave as expected?

No, you can set up email alerts for your production model deployments in Watson OpenScale, so that you receive email alerts whenever a risk evaluation test fails, and then you can come and check the issues and address them.

How are IBM OpenPages and Watson OpenScale related in the overall model risk management arena?

IBM offers an end-to-end model risk management solution with IBM Watson OpenScale and IBM OpenPages with Watson. IBM OpenPages MRG offers model risk governance to store and manage a comprehensive model inventory. IBM Watson OpenScale monitors and measures outcomes from AI models across their lifecycle and validates models.

For more information, see Configure model governance with IBM OpenPages MRG.

In a pre-production environment that uses Watson OpenScale, after the model is evaluated for risk and approved for usage, must I reconfigure all the monitors again in the production environment?

No, Watson OpenScale provides a way to copy the configuration of a pre-production subscription to a production subscription.

In Watson OpenScale, can I compare my model deployments in pre-production with a benchmark model to see how good or bad it is?

Yes, Watson OpenScale provides the option to compare two model deployments or subscriptions, with a side-by-side comparison of the behavior of the two models on each of the configured monitors. To compare, go to the model summary page on the Watson OpenScale dashboard and select Actions -> Compare.

Which Quality metrics are supported by Watson OpenScale?

Watson OpenScale supports ‘Area under ROC’, ‘Area under Precision-Recall (PR)’, ‘Proportion explained variance’, ‘Mean absolute error’, ‘Mean squared error’, ‘R squared’, ‘Root of mean squared error’, ‘Accuracy’, ‘Weighted True Positive Rate’, ‘True positive rate’, ‘Weighted False Positive Rate’, ‘False positive rate’, ‘Weighted recall’, ‘Recall’, ‘Weighted precision’, ‘Precision’, ‘Weighted F1-Measure’, ‘F1-Measure’, ‘Logarithmic loss’.

Where can I find more information about the respective quality metrics that are monitored by Watson OpenScale?

You can find more about the metrics here: Supported quality metrics.

In Watson OpenScale, what data is used for Quality metrics computation?

Quality metrics are calculated by using manually labeled feedback data and the monitored deployment's responses for this data.
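As a small worked example, the following sketch compares manually labeled feedback records with the responses of a deployment and computes a few of the metrics by using their standard textbook definitions. The labels, the responses, and the quality_metrics function are hypothetical. Watson OpenScale computes these metrics for you.

```python
# Manually assigned labels for the feedback records (hypothetical values).
feedback_labels = ["Risk", "No Risk", "Risk", "Risk", "No Risk", "No Risk", "Risk", "No Risk"]
# What the monitored deployment predicted for the same records.
model_responses = ["Risk", "No Risk", "No Risk", "Risk", "Risk", "No Risk", "Risk", "No Risk"]


def quality_metrics(labels, predictions, positive="Risk"):
    """Compute accuracy, precision, recall, and F1 for a binary classifier."""
    pairs = list(zip(labels, predictions))
    tp = sum(1 for y, p in pairs if y == positive and p == positive)
    fp = sum(1 for y, p in pairs if y != positive and p == positive)
    fn = sum(1 for y, p in pairs if y == positive and p != positive)
    tn = len(pairs) - tp - fp - fn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


metrics = quality_metrics(feedback_labels, model_responses)
print(metrics)
```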

In Watson OpenScale, can the threshold be set for a metric other than ‘Area under ROC’ during configuration?

No, currently, the threshold can be set only for the ‘Area under ROC’ metric.

In Watson OpenScale, why are some of the configuration tabs disabled?

Some tabs are enabled only when certain conditions are met. To see why a tab is not enabled, hover your mouse over the circle icon on the tab.

How can I set up alerts for production models to send emails when a threshold violation is found?

Before setting up alerts, you must configure an SMTP server in Cloud Pak for Data. For more information, see Enabling email notifications.

Why is the error “Training complete with errors” shown on the UI when configuring drift?

This error means that your drift model is partially configured. For more information, read the message that is shown on the UI by clicking the information icon in the Drift Model tile.

What are the different kinds of drift that IBM Watson OpenScale detects?

Watson OpenScale detects both drift in model accuracy and drift in data.

What is model accuracy drift?

Watson OpenScale estimates the drop in accuracy of the model at run time. Model accuracy drops if there is an increase in transactions that are similar to those that the model did not evaluate correctly in the training data.

This type of drift is calculated for structured binary and multi-class classification models only.

What is data drift?

Watson OpenScale estimates the drop in consistency of the data at runtime as compared to the characteristics of the data at training time. This drop in consistency of data is also termed as data drift. This type of drift is calculated for all structured models.

Why should one be concerned about model accuracy drift or data drift?

A drop in either model accuracy or data consistency leads to a negative impact on the business outcomes that are associated with the model and must be addressed by retraining the model.

Are there any limitations for the drift monitor in IBM Watson OpenScale?

The following limitations apply to the drift monitor:

  • Drift is supported for structured data only.
  • Although classification models support both data and accuracy drift, regression models support only data drift.
  • Drift is not supported for Python functions.

How is the drop in accuracy, that is, model accuracy drift, calculated in Watson OpenScale?

Watson OpenScale learns the behavior of the model by creating a proxy model, also known as a drift detection model. It looks at the training data and how the model is making predictions on the training data.

For more information, see Drift detection.
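The proxy-model idea can be sketched as follows. Each training record is tagged with whether the monitored model predicted it correctly, and a simple proxy then guesses, for each runtime transaction, whether the model is likely to be correct. This sketch uses a one-nearest-neighbor lookup as a stand-in proxy. That choice, the toy data, and the function names are assumptions, and they do not describe the drift detection model that Watson OpenScale actually builds.

```python
import math

# Each training record: (features, whether the monitored model predicted it correctly).
training = [
    ((1.0, 1.0), True),
    ((1.2, 0.9), True),
    ((0.9, 1.1), True),
    ((5.0, 5.0), False),
    ((5.2, 4.8), False),
]


def likely_correct(features):
    """Stand-in proxy: copy the correctness flag of the closest training record."""
    nearest = min(training, key=lambda record: math.dist(record[0], features))
    return nearest[1]


def estimated_accuracy(payload):
    """Share of runtime transactions that the proxy expects the model to get right."""
    return sum(likely_correct(x) for x in payload) / len(payload)


training_accuracy = sum(ok for _, ok in training) / len(training)

# Runtime transactions that mostly resemble the records the model got wrong.
payload = [(1.1, 1.0), (5.1, 5.0), (4.9, 5.1), (5.3, 4.9)]

drift = training_accuracy - estimated_accuracy(payload)
print(training_accuracy, estimated_accuracy(payload), drift)
```

Because most of the payload resembles records that the model got wrong in training, the estimated accuracy falls below the training accuracy, and the difference is reported as drift.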

How is the drop in data consistency calculated in IBM Watson OpenScale?

IBM Watson OpenScale learns single and two-column constraints or boundaries on the training data at the time of configuration. It then analyzes all payload transactions to determine which transactions are causing drop in data consistency. For more information, see .
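A minimal sketch of this idea follows. It learns a value range and a set of known categories as single-column constraints, learns a per-category range as a two-column constraint, and then flags the payload transactions that violate them. The columns, data, and function names are hypothetical, and the constraints that Watson OpenScale learns are richer than these.

```python
training_rows = [
    {"job": "clerk", "income": 38000},
    {"job": "clerk", "income": 42000},
    {"job": "engineer", "income": 60000},
    {"job": "engineer", "income": 75000},
]


def learn_constraints(rows):
    """Learn single-column and two-column boundaries from the training data."""
    incomes = [r["income"] for r in rows]
    single = {
        "income": (min(incomes), max(incomes)),  # numeric range
        "job": {r["job"] for r in rows},         # known categories
    }
    pair = {}  # income range per job (two-column constraint)
    for r in rows:
        low, high = pair.get(r["job"], (r["income"], r["income"]))
        pair[r["job"]] = (min(low, r["income"]), max(high, r["income"]))
    return single, pair


def violations(row, single, pair):
    """List the constraints that a payload transaction breaks."""
    found = []
    low, high = single["income"]
    if not low <= row["income"] <= high:
        found.append("income out of range")
    if row["job"] not in single["job"]:
        found.append("unknown job")
    elif not pair[row["job"]][0] <= row["income"] <= pair[row["job"]][1]:
        found.append("income unusual for job")
    return found


single, pair = learn_constraints(training_rows)
payload = [
    {"job": "clerk", "income": 40000},
    {"job": "clerk", "income": 70000},
    {"job": "pilot", "income": 90000},
]
for row in payload:
    print(row, violations(row, single, pair))
```

The second transaction passes both single-column checks but breaks the two-column constraint, which is the kind of inconsistency that single-column checks alone would miss.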

Can Watson OpenScale detect drift in my classification model?

Yes, Watson OpenScale can detect both drop in model accuracy and drop in data consistency for structured classification models.

Can Watson OpenScale detect drift in my regression model?

Watson OpenScale can detect a drop in data consistency only for structured regression models.

Can Watson OpenScale detect drift in my model that is trained on text corpus?

Watson OpenScale cannot detect drift in text-based models as of now.

Can Watson OpenScale detect drift in my model that is trained on image data?

Watson OpenScale cannot detect drift in image-based models as of now.

Can Watson OpenScale detect drift in my Python function that is deployed on IBM Watson Machine Learning?

Watson OpenScale cannot detect drift in Python functions as of now.