Known issues and limitations
The following limitations and known issues apply to Cloud Pak for Data as a Service.
- Regional limitations
- IBM Knowledge Catalog
- Masking flow
- Watson Query
- Watson Studio
- Data Refinery
- Visualizations
- Watson Machine Learning
- Watson OpenScale
- SPSS Modeler
- Connections
- Orchestration Pipelines
- Cloud Object Storage issues
List of IBM Knowledge Catalog issues
- Unable to preview assets when row filters are applied in data protection rules
- Only the data class filter in metadata enrichment results is case-sensitive
- Enrichment details for an asset might not reflect the settings applied on latest enrichment run
- Can't access individual pages in a metadata enrichment asset directly
- In some cases, you might not see the full log of a metadata enrichment job run in the UI
- Schema information might be missing when you filter enrichment results
- Issues with search on the Assets tab of a metadata enrichment asset
- Writing metadata enrichment output to an earlier version of Apache Hive than 3.0.0
- Business terms filter in enrichment results might not immediately reflect assignment changes
- For assets from SAP OData sources, the metadata enrichment results do not show the table type
- Connection asset type doesn't get permanently deleted after the removal
List of Masking flow issues
List of Watson Query issues
List of notebooks issues
- Duplicating a notebook doesn't create a unique name in the new projects UI
- Can't create assets in older accounts
- Error during login
- 500 internal server error received when launching Watson Studio
- Failure to export a notebook to HTML in the Jupyter Notebook editor
- Manual installation of some tensor libraries is not supported
- Connection to notebook kernel is taking longer than expected after running a code cell
- Using the predefined sqlContext object in multiple notebooks causes an error
- Connection failed message
- Hyperlinks to notebook sections don't work in preview mode
- Can't connect to notebook kernel
- Insufficient resources available error when opening or editing a notebook
List of Data Refinery limitations
List of Visualizations issues
List of machine learning issues
- Region requirements
- Accessing links if you create a service instance while associating a service with a project
- Deployment issues
- Federated Learning assets cannot be searched in All assets, search results, or filter results in the new projects UI
- Data module not found in IBM Federated Learning
- Previewing masked data assets is blocked in deployment space
- Batch deployment jobs that use large inline payload might get stuck in starting or running state
List of machine learning limitations
List of Watson OpenScale issues
List of SPSS Modeler issues
List of connection issues
Issues with Cloud Object Storage
- List of machine learning issues
- Error with assets using Watson Machine Learning in projects specifying Cloud Object Storage with Key Protect enabled.
- Auto AI
- Federated Learning
- Pipelines
- List of SPSS Modeler issues
- Unable to save model to project specifying Cloud Object Storage with Key Protect enabled.
- List of notebooks issues
- Unable to save model to project specifying Cloud Object Storage with Key Protect enabled.
IBM Knowledge Catalog
If you use IBM Knowledge Catalog, you might encounter these known issues and restrictions when you use catalogs.
Connection asset type doesn't get permanently deleted after the removal
An asset of type Connection is not deleted immediately after removal and is still shown in the trash, even if the asset removal configuration is set to Purge assets automatically upon removal in the catalog UI.
Catalog asset search doesn't support special characters
If search keywords contain any of the following special characters, the search filter doesn't return the most accurate results.
Search keywords:
. + - && || ! ( ) { } [ ] ^ " ~ * ? : \
Workaround: To obtain the most accurate results, search only for the keyword after the special character. For example, instead of AUTO_DV1.SF_CUSTOMER, search for SF_CUSTOMER.
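The workaround above can be sketched as a small helper that keeps only the part of a search term that follows the last special character. This is a minimal illustration, not part of the product: the function name `keyword_after_special` is hypothetical, and the character set is copied from the list above.

```python
# Special characters listed in this section (the && and || operators are
# covered by the single characters & and |).
SPECIAL_CHARS = set('.+-&|!(){}[]^"~*?:\\')


def keyword_after_special(term):
    """Return the part of `term` that follows the last special character.

    Hypothetical helper: if `term` has no special character, it is
    returned unchanged; if it ends with one, the result is empty.
    """
    last = max(
        (i for i, ch in enumerate(term) if ch in SPECIAL_CHARS),
        default=-1,
    )
    return term[last + 1:]


print(keyword_after_special("AUTO_DV1.SF_CUSTOMER"))
```

With the example from the workaround, the helper reduces AUTO_DV1.SF_CUSTOMER to SF_CUSTOMER before you paste it into the search field.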
Unable to preview assets when row filters are applied in data protection rules
If the column names of the assets include spaces and you apply data protection rules with row filtering, you cannot preview the asset and the following error displays:
Unable to show preview
Preview failed to load. Try again or contact your system administrator.
Workaround: To apply and preview row filters for assets in data protection rules, avoid specifying spaces in the column names. For example, use the column name First_name (instead of First name).
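If you prepare the source data yourself, one way to follow this workaround is to rename the columns before the data is added to the catalog. The sketch below assumes that you have the column names as a list of strings; the function name `underscore_column_names` is hypothetical.

```python
def underscore_column_names(columns):
    """Replace each run of whitespace in a column name with one underscore.

    Hypothetical helper: it does not check whether two different names
    collapse to the same result, so review the output for duplicates.
    """
    return ["_".join(name.split()) for name in columns]


print(underscore_column_names(["First name", "Last  name", "Age"]))
```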
Masked data is not supported in data visualizations
Masked data is not supported in data visualizations. If you attempt to work with masked data while generating a chart in the Visualizations tab of a data asset in a project, the following error message is displayed: Bad Request: Failed to retrieve data from server. Masked data is not supported.
Data is not masked in some project tools
When you add a connected data asset that contains masked columns from a catalog to a project, the columns remain masked when you view the data and when you refine the data in the Data Refinery tool. However, other tools in projects do not preserve masking when they access data through a connection. For example, when you load connected data in a Notebook, a DataStage flow, a dashboard, or other project tools, you access the data through a direct connection and bypass masking.
Predefined governance artifacts might not be available
If you don't see any predefined classifications or data classes, reinitialize your tenant by using the following API call:
curl -X POST "https://api.dataplatform.cloud.ibm.com/v3/glossary_terms/admin/initialize_content" -H "Authorization: Bearer $BEARER_TOKEN" -k
Add collaborators with lowercase email addresses
When you add collaborators to the catalog, enter email addresses with all lowercase letters. Mixed-case email addresses are not supported.
Object Storage connection restrictions
When you look at a Cloud Object Storage (S3 API) or Cloudant connection, the folder itself is listed as a child asset.
Multiple concurrent connection operations might fail
An error might be encountered when multiple users are running connection operations concurrently. The error message can vary.
Can't enable data protection rule enforcement after catalog creation
You cannot enable the enforcement of data protection rules after you create a catalog. To apply data protection rules to the assets in a catalog, you must enable enforcement during catalog creation.
Assets are blocked if evaluation fails
The following restrictions apply to data assets in a catalog with policies enforced: File-based data assets that have a header can't have duplicate column names, a period (.), or single quotation mark (') in a column name.
If evaluation fails, the asset is blocked to all users except the asset owner. All other users see an error message that the data asset cannot be viewed because evaluation failed and the asset is blocked.
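To catch these header problems before an asset is added to a catalog with policies enforced, you could run a check like the following. It is a sketch under the assumption that the header row is already available as a list of column names; `header_problems` is a hypothetical name, and the three rules are taken from the restriction above.

```python
from collections import Counter


def header_problems(columns):
    """Return a list of messages for header names that break the rules above."""
    problems = []
    counts = Counter(columns)
    for name, count in counts.items():
        if count > 1:
            problems.append(f"duplicate column name: {name!r}")
        if "." in name:
            problems.append(f"period in column name: {name!r}")
        if "'" in name:
            problems.append(f"single quotation mark in column name: {name!r}")
    return problems


for message in header_problems(["id", "id", "unit.price", "customer's name"]):
    print(message)
```

An empty result means that none of the three restrictions is violated; it does not guarantee that evaluation succeeds for other reasons.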
Only the data class filter in metadata enrichment results is case-sensitive
When you filter metadata enrichment results on the Column tab, only the Data class entries are case-sensitive. The entries in the Business terms, Schemas, and Assets filters are all lowercase regardless of the actual casing of the value.
Enrichment details for an asset might not reflect the settings applied on latest enrichment run
After you edit the enrichment options for a metadata enrichment that was run at least once, the asset details might show the updated options instead of the options applied in the latest enrichment run.
Can't access individual pages in a metadata enrichment asset directly
If the number of assets or columns in a metadata enrichment asset spans several pages, you can't go to a specific page directly. The page number drop-down list is disabled. Use the Next page and Previous page buttons instead.
In some cases, you might not see the full log of a metadata enrichment job run in the UI
If the list of errors in a metadata enrichment run is exceptionally long, only part of the job log might be displayed in the UI.
Workaround: Download the entire log and analyze it in an external editor.
Schema information might be missing when you filter enrichment results
When you filter assets or columns in the enrichment results on source information, schema information might not be available.
Workaround: Rerun the enrichment job and apply the Source filter again.
Issues with search on the Assets tab of a metadata enrichment asset
When you search for an asset on the Assets tab of a metadata enrichment asset, no results might be returned. Consider these limitations:
- Search is case sensitive.
- The result contains only records that match the exact search phrase or start with the phrase.
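The two limitations can be modeled as a case-sensitive prefix match. The sketch below only illustrates the behavior described above, so that you can predict which search phrases return results; it is not the product's implementation, and `matches_search` is a hypothetical name.

```python
def matches_search(asset_name, phrase):
    """Model of the described behavior: case-sensitive, exact or prefix match."""
    # str.startswith is case-sensitive and is also true for an exact match.
    return asset_name.startswith(phrase)


print(matches_search("CUSTOMER_ORDERS", "CUSTOMER"))  # prefix, same case
print(matches_search("CUSTOMER_ORDERS", "customer"))  # different case
print(matches_search("CUSTOMER_ORDERS", "ORDERS"))    # not at the start
```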
Writing metadata enrichment output to an earlier version of Apache Hive than 3.0.0
If you want to write data quality output generated by metadata enrichment to an Apache Hive database at an earlier software version than 3.0.0, set the following configuration parameters in your Apache Hive Server:
set hive.support.concurrency=true;
set hive.exec.dynamic.partition.mode=nonstrict;
set hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;
set hive.enforce.bucketing=true; # (not required for version 2)
set hive.compactor.initiator.on=true;
set hive.compactor.cleaner.on=true; # might not be available depending on the version
set hive.compactor.worker.threads=1;
For more information, see Hive Transactions.
Business terms filter in enrichment results might not immediately reflect assignment changes
When you assign or unassign business terms manually, the business terms filter might not immediately reflect these changes.
Workaround: Refresh the page by clicking the Refresh icon.
For assets from SAP OData sources, the metadata enrichment results do not show the table type
In general, metadata enrichment results show for each enriched data asset whether the asset is a table or a view. This information cannot be retrieved for data assets from SAP OData data sources and is thus not shown in the enrichment results.
Masking flow
If you use Masking flow, you might encounter these known issues and restrictions when you are privatizing data.
Masking flow jobs might fail
During a masking flow job, Spark might attempt to read all of a data source into memory. Errors might occur when there isn't enough memory to support the job. The largest volume of data that can fit into the largest deployed Spark processing node is approximately 12 GB.
Notebook issues
You might encounter some of these issues when getting started with and using notebooks.
Duplicating a notebook doesn't create a unique name in the new projects UI
When you duplicate a notebook in the new projects UI, the duplicate notebook is not created with a unique name.
Can't create assets in older accounts
If you're working in an instance of Watson Studio that was activated before November 2017, you might not be able to create operational assets, like notebooks. If the Create button stays gray and disabled, you must add the Watson Studio service to your account from the Services catalog.
500 internal server error received when launching Watson Studio
Rarely, you may receive an HTTP internal server error (500) when launching Watson Studio. This might be caused by an expired cookie stored for the browser. To confirm the error was caused by a stale cookie, try launching Watson Studio in a private browsing session (incognito) or by using a different browser. If you can successfully launch in the new browser, the error was caused by an expired cookie. You have a choice of resolutions:
- Exit the browser application completely to reset the cookie. You must close and restart the application, not just close the browser window. Restart the browser application and launch Watson Studio to reset the session cookie.
- Clear the IBM cookies from the browsing data and launch Watson Studio. Look in the browsing data or security options in the browser to clear cookies. Note that clearing all IBM cookies may affect other IBM applications.
If the 500 error persists after performing one of these resolutions, check the status page for IBM Cloud incidents affecting Watson Studio. Additionally, you may open a support case at the IBM Cloud support portal.
Error during login
You might get this error message while trying to log in to Watson Studio: "Access Manager WebSEAL could not complete your request due to an unexpected error." Try to log in again. Usually the second login attempt works.
Failure to export a notebook to HTML in the Jupyter Notebook editor
When you are working with a Jupyter Notebook created in a tool other than Watson Studio, you might not be able to export the notebook to HTML. This issue occurs when the cell output is exposed.
Workaround
- In the Jupyter Notebook UI, go to Edit and click Edit Notebook Metadata.
- Remove the following metadata:
"widgets": { "state": {}, "version": "1.1.2" }
- Click Edit.
- Save the notebook.
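If you have many notebooks to fix, the same metadata can be removed from the notebook file outside the editor. The following is a minimal sketch, assuming that the widgets entry sits under the top-level metadata object of the .ipynb JSON, which is where the Edit Notebook Metadata dialog operates; `strip_widgets_metadata` is a hypothetical name.

```python
import json


def strip_widgets_metadata(notebook_json):
    """Return the notebook JSON text without the notebook-level "widgets" entry."""
    notebook = json.loads(notebook_json)
    # Assumption: "widgets" is a key of the top-level "metadata" object.
    notebook.get("metadata", {}).pop("widgets", None)
    return json.dumps(notebook, indent=1)


example = json.dumps({
    "cells": [],
    "metadata": {"widgets": {"state": {}, "version": "1.1.2"}},
    "nbformat": 4,
    "nbformat_minor": 2,
})
print(strip_widgets_metadata(example))
```

To apply it to a file, read the file's text, pass it to the function, and write the result to a copy so that the original notebook is kept until the export works.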
Manual installation of some tensor libraries is not supported
Some TensorFlow libraries are preinstalled, but if you try to install additional TensorFlow libraries yourself, you get an error.
Connection to notebook kernel is taking longer than expected after running a code cell
If you try to reconnect to the kernel and immediately run a code cell (or if the kernel reconnection happened during code execution), the notebook doesn't reconnect to the kernel and no output is displayed for the code cell. You need to manually reconnect to the kernel by clicking Kernel > Reconnect. When the kernel is ready, you can try running the code cell again.
Using the predefined sqlContext object in multiple notebooks causes an error
You might receive an Apache Spark error if you use the predefined sqlContext object in multiple notebooks. Create a new sqlContext object for each notebook. See this Stack Overflow explanation.
Connection failed message
If your kernel stops, your notebook is no longer automatically saved. To save it, click File > Save manually, and you should get a Notebook saved message in the kernel information area, which appears before the Spark version. If you get a message that the kernel failed, to reconnect your notebook to the kernel click Kernel > Reconnect. If nothing you do restarts the kernel and you can't save the notebook, you can download it to save your changes by clicking File > Download as > Notebook (.ipynb). Then you need to create a new notebook based on your downloaded notebook file.
Hyperlinks to notebook sections don't work in preview mode
If your notebook contains sections that you link to, for example from an introductory section at the top of the notebook, the links to these sections do not work if the notebook was opened in view-only mode in Firefox. However, if you open the notebook in edit mode, these links work.
Can't connect to notebook kernel
If you try to run a notebook and you see the message Connecting to Kernel, followed by Connection failed. Reconnecting and finally by a connection failed error message, the reason might be that your firewall is blocking the notebook from running.
If Watson Studio is installed behind a firewall, you must add the WebSocket connection wss://dataplatform.cloud.ibm.com to the firewall settings. Enabling this WebSocket connection is required when you're using notebooks and RStudio.
Data Refinery limitations
Data column headers cannot contain special characters
Data with column headers that contain special characters might cause Data Refinery jobs to fail, and give the error Supplied values don't match positional vars to interpolate.
Workaround: Remove the special characters from the column headers.
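The workaround does not say which characters count as special. The sketch below assumes, conservatively, that anything other than ASCII letters, digits, and the underscore should be removed; adjust the pattern if your data needs other characters. `clean_header` is a hypothetical name.

```python
import re


def clean_header(name):
    """Remove every character that is not an ASCII letter, digit, or underscore.

    Assumption: this character set is a conservative guess, not a rule
    stated by Data Refinery. Spaces are removed as well.
    """
    return re.sub(r"[^A-Za-z0-9_]", "", name)


print([clean_header(h) for h in ["price($)", "order_id", "discount%"]])
```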
Visualizations issues
You might encounter some of these issues when working with the Visualization tab in a Data asset in a project.
The column-level profile information for a connected data asset with a column of type DATE does not show rows
In the column-level profile information for a connected data asset with a column of type DATE, no rows are displayed when you click show rows in the tabs Data Classes, Format or Types.
Machine learning issues
You might encounter some of these issues when working with machine learning tools.
Region requirements
You can only associate a Watson Machine Learning service instance with your project when the Watson Machine Learning service instance and the Watson Studio instance are located in the same region.
Accessing links if you create a service instance while associating a service with a project
While you are associating a Watson Machine Learning service with a project, you have the option of creating a new service instance. If you choose to create a new service, the links on the service page might not work. To access the service terms, APIs, and documentation, right-click the links to open them in new windows.
Federated Learning assets cannot be searched in All assets, search results, or filter results in the new projects UI
You cannot search Federated Learning assets from the All assets view, the search results, or the filter results of your project.
Workaround: Click the Federated Learning asset to open the tool.
Deployment issues
- A deployment that is inactive (no scores) for a set time (24 hours for the free plan or 120 hours for a paid plan) is automatically hibernated. When a new scoring request is submitted, the deployment is reactivated and the score request is served. Expect a brief delay of 1 to 60 seconds for the first score request after activation, depending on the model framework.
- For some frameworks, such as SPSS Modeler, the first score request for a deployed model after hibernation might result in a 504 error. If this happens, submit the request again; subsequent requests should succeed.
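Client code can absorb the 504 error after hibernation by resubmitting the request a few times. The following is a minimal sketch that does not depend on a particular HTTP library: it assumes that you pass in a callable, here called `send_request`, that performs the scoring call and returns a (status code, body) pair. The function name, the attempt count, and the wait time are hypothetical choices, not documented values.

```python
import time


def score_with_retry(send_request, max_attempts=3, wait_seconds=5.0):
    """Call send_request() again while it returns HTTP status 504.

    Returns the (status, body) pair of the last attempt.
    """
    for attempt in range(1, max_attempts + 1):
        status, body = send_request()
        if status != 504 or attempt == max_attempts:
            return status, body
        time.sleep(wait_seconds)


# Usage with a stand-in for a real scoring call: the first call fails with 504.
responses = iter([(504, None), (200, {"predictions": []})])
print(score_with_retry(lambda: next(responses), wait_seconds=0))
```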
Previewing masked data assets is blocked in deployment space
A data asset preview may fail with this message:
This asset contains masked data and is not supported for preview in the Deployment Space
Deployment spaces currently don't support masking data so the preview for masked assets has been blocked to prevent data leaks.
Batch deployment jobs that use large inline payload might get stuck in starting or running state
If you provide a large asynchronous payload for your inline batch deployment, the runtime manager process can run out of heap memory.
In the following example, 92 MB of payload was passed inline to the batch deployment, which caused the heap to run out of memory.
Uncaught error from thread [scoring-runtime-manager-akka.scoring-jobs-dispatcher-35] shutting down JVM since 'akka.jvm-exit-on-fatal-error' is enabled for ActorSystem[scoring-runtime-manager]
java.lang.OutOfMemoryError: Java heap space
at java.base/java.util.Arrays.copyOf(Arrays.java:3745)
at java.base/java.lang.AbstractStringBuilder.ensureCapacityInternal(AbstractStringBuilder.java:172)
at java.base/java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:538)
at java.base/java.lang.StringBuilder.append(StringBuilder.java:174)
...
This could result in concurrent jobs getting stuck in starting or running state. The starting state can only be cleared once the deployment is deleted and a new deployment is created. The running state can be cleared without deleting the deployment.
As a workaround, use data references instead of inline data for large payloads that are provided to batch deployments.
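Before you submit a batch job, you can estimate how large the inline payload would be and switch to a data reference when it is too big. The sketch below only measures the serialized size; the 10 MB threshold is a hypothetical value chosen for illustration (the only figure given above is the 92 MB payload that failed), and both function names are assumptions.

```python
import json

# Hypothetical threshold for illustration only; no documented limit is implied.
INLINE_LIMIT_MB = 10


def inline_payload_size_mb(payload):
    """Size of the payload, in MB, when serialized as UTF-8 encoded JSON."""
    return len(json.dumps(payload).encode("utf-8")) / (1024 * 1024)


def should_use_data_reference(payload, limit_mb=INLINE_LIMIT_MB):
    """True if the serialized payload is larger than the chosen threshold."""
    return inline_payload_size_mb(payload) > limit_mb


small_payload = {"fields": ["a", "b"], "values": [[1, 2], [3, 4]]}
print(should_use_data_reference(small_payload))
```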
Watson Machine Learning limitations
AutoAI known limitations
- Currently, AutoAI experiments do not support double-byte character sets. AutoAI only supports CSV files with ASCII characters. Users must convert any non-ASCII characters in the file name or content, and provide input data as a CSV as defined in this CSV standard.
- To interact programmatically with an AutoAI model, use the REST API instead of the Python client. The APIs for the Python client required to support AutoAI are not generally available at this time.
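For the ASCII restriction in the first limitation, a quick check before you upload a file can tell you whether the file name or content needs converting. This is a minimal sketch that assumes the CSV content is already loaded as a string; both function names are hypothetical.

```python
def is_autoai_safe(file_name, content):
    """True if both the file name and the content contain only ASCII characters."""
    return file_name.isascii() and content.isascii()


def non_ascii_positions(text):
    """Return (index, character) pairs for every non-ASCII character in text."""
    return [(i, ch) for i, ch in enumerate(text) if ord(ch) > 127]


print(is_autoai_safe("sales.csv", "id,amount\n1,20\n"))
print(non_ascii_positions("id,caf\u00e9\n"))
```

The positions returned by the second function show where a conversion is needed; how to convert each character depends on your data.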
Data module not found in IBM Federated Learning
The data handler for IBM Federated Learning is trying to extract a data module from the FL library but is unable to find it. You might see the following error message:
ModuleNotFoundError: No module named 'ibmfl.util.datasets'
The issue might result from using an outdated DataHandler. Review and update your DataHandler to conform to the latest spec. See the most recent MNIST data handler, or ensure that your sample versions are up to date.
Setting environment variables in a conda yaml file does not work for deployments
Setting environment variables in a conda yaml file does not work for deployments. This means that you cannot override existing environment variables, for example LD_LIBRARY_PATH, when deploying assets in Watson Machine Learning.
As a workaround, if you're using a Python function, consider setting default parameters. For details, see Deploying Python functions.
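The idea behind this workaround is to pass configuration through the function's default parameters rather than through the environment. The sketch below assumes the closure shape of an outer function that returns an inner `score` function; the parameter name `lib_dir`, its value, and the shape of the returned dictionary are hypothetical and only show where a default would be read. See Deploying Python functions for the required structure.

```python
def deployable_function(params={"lib_dir": "/opt/custom/lib"}):
    """Outer function: the default in `params` stands in for an environment variable."""

    def score(payload):
        # Read the setting from the captured default instead of os.environ.
        lib_dir = params["lib_dir"]
        # Hypothetical response shape, for illustration only.
        return {"predictions": [{"values": [lib_dir]}]}

    return score


score = deployable_function()
print(score({"input_data": []}))
```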
Watson OpenScale issues
You might encounter the following issues in Watson OpenScale:
Drift configuration is started but never finishes
Drift configuration is started but never finishes and continues to show the spinner icon. If you see the spinner run for more than 10 minutes, it is possible that the system is left in an inconsistent state. There is a workaround to this behavior: Edit the drift configuration. Then, save it. The system might come out of this state and complete configuration. If drift reconfiguration does not rectify the situation, contact IBM Support.
SPSS Modeler issues
You might encounter some of these issues when working in SPSS Modeler.
SPSS Modeler runtime restrictions
Watson Studio does not include SPSS functionality in Peru, Ecuador, Colombia and Venezuela.
Timestamp data measured in microseconds
If you have timestamp data that is measured in microseconds, you can use the more precise data in your flow. However, you can import data that is measured in microseconds only from connectors that support SQL pushback. For more information about which connectors support SQL pushback, see Supported data sources for SPSS Modeler.
Merge node and unicode characters
The Merge node treats the following very similar Japanese characters as the same character.
Connection issues
You might encounter this issue when working with connections.
Apache Impala connection does not work with LDAP authentication
If you create a connection to an Apache Impala data source and the Apache Impala server is set up for LDAP authentication, the username and password authentication method in Cloud Pak for Data as a Service will not work.
Workaround: Disable the Enable LDAP Authentication option on the Impala server. See Configuring LDAP Authentication in the Cloudera documentation.
Orchestration Pipelines known issues
These issues pertain to Orchestration Pipelines.
Asset browser does not always reflect count for total numbers of asset type
When selecting an asset from the asset browser, such as choosing a source for a Copy node, you see that some of the assets list the total number of that asset type available, but notebooks do not. That is a current limitation.
Cannot delete pipeline versions
Currently, you cannot delete saved versions of pipelines that you no longer need. All versions will be deleted when the pipeline is deleted.
Deleting an AutoAI experiment fails under some conditions
Using a Delete AutoAI experiment node to delete an AutoAI experiment that was created from the Projects UI does not delete the AutoAI asset. However, the rest of the flow can complete successfully.
Cache appears enabled but is not enabled
If the Copy assets Pipelines node's Copy mode is set to Overwrite, cache is displayed as enabled but remains disabled.
Pipelines cannot save some SQL statements
Pipelines cannot save when SQL statements with parentheses are passed in a script or user variable.
To resolve this issue, replace all instances of parentheses with their respective ASCII codes, ( with #40 and ) with #41, and replace the codes when you set the statement as a user variable.
For example, the statement select CAST(col1 as VARCHAR(30)) from dbo.table in a Run Bash script node will cause an error. Instead, use the statement select CAST#40col1 as VARCHAR#4030#41#41 from dbo.table and replace the instances when setting it as a user variable.
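The substitution in this workaround can be scripted so that long statements are converted consistently. The two helpers below are a minimal sketch with hypothetical names; they assume that the statement does not already contain the literal text #40 or #41, because the reverse step cannot tell such text apart from an encoded parenthesis.

```python
def encode_parentheses(sql):
    """Replace ( with #40 and ) with #41, as described in the workaround."""
    return sql.replace("(", "#40").replace(")", "#41")


def decode_parentheses(encoded):
    """Reverse step: turn #40 and #41 back into parentheses."""
    return encoded.replace("#40", "(").replace("#41", ")")


statement = "select CAST(col1 as VARCHAR(30)) from dbo.table"
print(encode_parentheses(statement))
```

For the example statement, the encoded form matches the statement shown in the workaround, and decoding it returns the original statement.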
Orchestration Pipelines abort when limit for annotations is reached
Pipeline expressions require annotations, which have a limit due to the limit for annotations in Kubernetes. If you reach this limit, your pipeline will abort without displaying logs.
Orchestration Pipelines limitations
These limitations apply to Orchestration Pipelines.
- Single pipeline limits
- Input and output size limits
- Batch input limited to data assets
- Bash scripts throw errors with curl commands
Single pipeline limits
These limitations apply to a single pipeline, regardless of configuration.
- Any single pipeline cannot contain more than 120 standard nodes
- Any pipeline with a loop cannot contain more than 600 nodes across all iterations (for example, 60 iterations with 10 nodes each)
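The two limits can be checked with simple arithmetic while you design a pipeline. The sketch below uses hypothetical function names and assumes that every loop iteration runs the same number of nodes, as in the example of 60 iterations with 10 nodes each.

```python
MAX_STANDARD_NODES = 120     # limit for a single pipeline
MAX_LOOP_NODES_TOTAL = 600   # limit across all iterations of a loop


def standard_nodes_ok(node_count):
    """True if the number of standard nodes is within the single pipeline limit."""
    return node_count <= MAX_STANDARD_NODES


def loop_nodes_ok(iterations, nodes_per_iteration):
    """True if iterations multiplied by nodes per iteration stays within the loop limit."""
    return iterations * nodes_per_iteration <= MAX_LOOP_NODES_TOTAL


print(loop_nodes_ok(60, 10))  # 600 nodes in total, exactly at the limit
print(loop_nodes_ok(61, 10))  # 610 nodes in total, over the limit
```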
Input and output size limits
Input and output values, which include pipeline parameters, user variables, and generic node inputs and outputs, cannot exceed 10 KB of data.
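To see whether a value is close to this limit, you can measure its encoded size before you pass it to a node. The sketch below assumes that the limit is counted in bytes of UTF-8 text and that 1 KB means 1024 bytes; neither detail is stated above, and the function name is hypothetical.

```python
# Assumption: 10 KB is interpreted as 10 * 1024 bytes of UTF-8 text.
VALUE_LIMIT_BYTES = 10 * 1024


def fits_value_limit(value):
    """True if the UTF-8 encoded value does not exceed the assumed limit."""
    return len(value.encode("utf-8")) <= VALUE_LIMIT_BYTES


print(fits_value_limit("select * from dbo.table"))
```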
Batch input limited to data assets
Currently, input for batch deployment jobs is limited to data assets. This means that certain types of deployments, which require JSON input or multiple files as input, are not supported. For example, SPSS models and Decision Optimization solutions that require multiple files as input are not supported.
Bash scripts throw errors with curl commands
The Bash scripts in your pipelines might cause errors if you implement curl commands in them. To prevent this issue, set your curl commands as parameters. To save a pipeline that causes an error when saving, try exporting the isx file and importing it into a new project.
Issues with Cloud Object Storage
These issues apply to working with Cloud Object Storage.
Issues with Cloud Object Storage when Key Protect is enabled
Key Protect in conjunction with Cloud Object Storage is not supported for working with Watson Machine Learning assets. If you are using Key Protect, you might encounter these issues when you are working with assets in Watson Studio.
- Training or saving these Watson Machine Learning assets might fail:
- Auto AI
- Federated Learning
- Pipelines
- You might be unable to save an SPSS model or a notebook model to a project
Issues with watsonx.governance
Integration limitation with OpenPages
When AI Factsheets is integrated with OpenPages, the fields created in the field groups MRG-UserFacts-Model or MRG-UserFact-Model and MRG-UserFacts-ModelEntry or MRG-UserFact-ModelUseCase are synced to the modelfacts_user_op and model_entry_user_op asset type definitions. However, when the fields are created from the OpenPages application, avoid specifying the fields as required, and do not specify a range of values. If you mark them as required or assign a range of values, the sync will fail.
Delay showing prompt template deployment data in a factsheet
When a deployment is created for a prompt template, the facts for the deployment are not added to the factsheet immediately. You must first evaluate the deployment or view the lifecycle tracking page to add the facts to the factsheet.
Redundant attachment links in factsheet
A factsheet tracks all of the events for an asset over all phases of the lifecycle. Attachments show up in each stage, creating some redundancy in the factsheet.
Attachments for prompt templates are not saved on import or export
If your AI use case contains attachments for a prompt template, the attachments are not preserved when the prompt template asset is exported from a project or imported into a project or space. You must reattach any files following the import operation.