What's new

Check back each week to learn about new features and updates for Cloud Pak for Data as a Service and services such as Watson Studio, Watson Machine Learning, DataStage, and IBM Knowledge Catalog.

Tip: Occasionally, you must take a specific action after an update. To see all required actions, search this page for “Action required”.

Week ending 12 July 2024

Tracking data protection rule enforcement decisions

9 July 2024

You can now track enforcement decisions as audit events when the Send policy evaluations to audit logs checkbox is selected from the Managing rule settings page.

Week ending 5 July 2024

Connectors grouped by data source type

05 July 2024

When you create a connection, the connectors are now grouped by data source type so that the connectors are easier to find and select. For example, the MongoDB data source type includes the IBM Cloud Databases for MongoDB and the MongoDB connectors.

In addition, a new Recents category shows the six latest connectors that you used to create a connection.

For instructions, see Adding connections to data sources in a project or Adding connections to data sources in a catalog.

Bulk edits for governance artifact properties

05 July 2024

You can now change the primary or secondary category for multiple governance artifacts at once. Bulk edits are also available when updating relationships. For more information, see Managing governance artifacts.

Setting an assignment threshold for results of relationship analyses

05 July 2024

You can now also set a threshold that determines when the results of a relationship analysis are assigned automatically. You can set a project default and override it for each analysis run. For details, see Identifying relationships.

Changes to Cloud Object Storage Lite plans

01 July 2024

Starting on 1 July 2024, the Cloud Object Storage Lite plan that is automatically provisioned when you sign up for a 30-day trial of Cloud Pak for Data as a Service expires after the trial ends. You can upgrade your Cloud Object Storage Lite instance to the Standard plan with the Free Tier option at any time during the 30-day trial.

Existing Cloud Object Storage service instances with Lite plans that you provisioned prior to 1 July 2024 will be retained until 15 December 2024. You must upgrade your Cloud Object Storage service to a Standard plan before 15 December 2024.

See Cloud Object Storage service plans.

Week ending 21 June 2024

Adding catalog assets to projects

20 June 2024

Added an Add catalog assets to projects user permission. Now, to add assets to projects, you must have the Add catalog assets to projects permission, have the Admin, Editor, or Viewer role in the catalog, and be the asset owner or editor. Users who don't have an existing role with the Manage catalogs or Access catalogs permission must be explicitly granted the Add catalog assets to projects permission.

Cognos Dashboard removal postponed

20 June 2024

Any existing dashboards that you created with the Cognos Dashboards Embedded service will now continue working until 30 September 2024. You can no longer provision an instance of the Cognos Dashboards Embedded service. You can use Cognos Analytics on Cloud On-Demand as a replacement for Cognos Dashboards Embedded. For more information, see IBM Cognos Analytics Pricing Plans.

Task credentials will be required for deployment job requests

19 Jun 2024

To improve security for running deployment jobs, the user requesting the job will be required to provide task credentials in the form of an API key. The requirement will be enforced starting 15 August 2024. See Adding task credentials for details on generating the API key.

Screenshot showing how to create task credentials from Profile and settings

Enhanced data enrichment in IBM Knowledge Catalog

20 Jun 2024

In addition to the existing capabilities, metadata enrichment now provides options for semantic and AI-augmented data enrichment:

  • Recommend descriptive names for tables and columns based on the collected metadata and a predefined glossary.
  • Suggest and assign semantic descriptions for the contents of tables and columns based on the surrounding columns and the context of the tables.
  • Complete semantic term assignment for tables and columns.

For details, see Designing metadata enrichments.

These new generative AI-based metadata enrichment features are available only in the Dallas region.

IBM Federated Learning Python client change

20 Jun 2024

Federated Learning's Python client library has been merged with the watsonx.ai library. Your code samples must be updated with the newest Python client. See Connecting to the aggregator.

Connect to a new data source in DataStage: IBM Planning Analytics

14 Jun 2024

You can now include data from an IBM Planning Analytics data source in your DataStage flows.

For the full list of DataStage connectors, see Supported data sources in DataStage.

Week ending 7 June 2024

Bulk edits for governance artifacts

7 Jun 2024

You can now make changes to multiple governance artifacts at once when you want to edit tags or stewards. For more information, see Managing governance artifacts.

Changing parent category for individual artifacts

7 Jun 2024

When viewing artifact details, you can now change the parent category by selecting Move to from the three-dot action menu.

Data protection rules no longer enforced in projects

7 June 2024

Data protection rules are now only enforced either in governed catalogs or by a deep enforcement solution. A deep enforcement solution is a protection solution to enforce rules on data that is outside of Cloud Pak for Data when the data source is integrated with one of these services:

  • IBM Watson Query
  • IBM watsonx.data

Assets that are added into projects from a governed catalog no longer have preview, download or profiling restricted by data protection rules unless you have configured a deep enforcement solution.

You will be reminded of the revised data protection rule enforcement protocols when you:

  • Create a data protection rule.
  • Copy an asset from a governed catalog into a project.

For details, see Accept revised protocol for enforcing data protection rules.

Managing reports settings

6 June 2024

IBM Cloud account owners or administrators can now manage the reports settings on the Account page. For more information, see Managing your account settings.

Week ending 31 May 2024

IBM Watson Pipelines is now IBM Orchestration Pipelines

30 May 2024

The new service name reflects the capabilities for orchestrating parts of the AI lifecycle into repeatable flows.

Tag projects for easy retrieval

31 May 2024

You can now assign tags to projects to make them easier to group or retrieve. Assign tags when you create a new project or from the list of all projects. Filter the list of projects by tag to retrieve a related set of projects. For more information, see Creating a project.

Connect to a new data source: Milvus

31 May 2024

Use the Milvus connection to store and confirm the accuracy of your credentials and connection details to access a Milvus vector store. For information, see Milvus connection.

Week ending 24 May 2024

Asset user and role

24 May 2024

Updated the asset membership roles for catalogs. Now, users can hold the asset owner, asset editor, or asset viewer role. The asset editor role replaced the asset member role. Now, to complete any asset-related actions, you must be an asset owner or asset editor.

Also, assets can now have more than one owner.

You can change asset user roles on the Access control page of an asset by selecting a role from the Role dropdown menu.

Bulk actions on catalog assets

24 May 2024

You can now edit and remove the business terms, owners or tags on up to 20 assets at a time.

Week ending 10 May 2024

New filters for enrichment results

10 May 2024

You can now apply additional filters to your enrichment results:

  • Assigned, suggested, or no business terms
  • Assigned, suggested, or no data class

Name changes for DataStage connections and connectors

10 May 2024

The following DataStage connections and connectors have new names:

  • "Apache Cassandra (optimized)" is now "Apache Cassandra for DataStage".
  • "IBM Db2 (optimized)" is now "IBM Db2 for DataStage".
  • "IBM Netezza Performance Server (optimized)" is now "IBM Netezza Performance Server for DataStage".
  • "Oracle (optimized)" is now "Oracle Database for DataStage".
  • "Salesforce.com (optimized)" is now "Salesforce API for DataStage".
  • "Teradata (optimized)" is now "Teradata database for DataStage".

Your previous settings for the connections, connectors, and their associated jobs remain the same. Only the connection and connector names have changed.

Week ending 26 April 2024

Name change for the IBM Watson Query connection

26 Apr 2024

The "IBM Watson Query" connection has been renamed to "IBM Data Virtualization". Your previous settings for the connection remain the same. Only the connection name has changed.

Name change for the DataStage IBM Watson Query connector

26 Apr 2024

The DataStage "IBM Watson Query" connector name has changed to "IBM Data Virtualization". This change coincides with the connection name change. Your previous settings for the connection, connector, and the associated jobs remain the same. Only the connection and connector names have changed.

Masking watsonx.data in IBM Knowledge Catalog

26 Apr 2024

You can protect sensitive data in watsonx.data by using masking capabilities of IBM Knowledge Catalog. For more information, see Masking watsonx.data assets in IBM Knowledge Catalog.

Week ending 19 April 2024

Enhanced project list view in catalogs

18 Apr 2024

Now, when you are adding assets from a catalog to a project, you can view more than 100 projects in your project list page and add up to 50 assets at a time to your project. For more information, see Add assets from within the catalog.

Evaluate machine learning deployments in spaces

18 Apr 2024

Configure watsonx.governance evaluations in your deployment spaces to gain insights about your machine learning model performance. For example, evaluate a deployment for bias or monitor a deployment for drift. When you configure evaluations, you can analyze evaluation results and model transaction records directly in your spaces.

For more information, see Evaluating deployments in spaces.

Factsheets available from AI use cases on main navigation menu

19 Apr 2024

Factsheets that track lifecycle details for machine learning models are now stored in AI use cases rather than model use cases. AI use cases and external models are displayed on the main navigation menu for easy access.

AI use cases on main navigation menu

Week ending 12 April 2024

Revised data protection rule enforcement protocol across Cloud Pak for Data

12 Apr 2024

A revised version of the data protection rule enforcement protocol is now in place across Cloud Pak for Data. When you're inside of a governed catalog and click Add to project, information about the new data protection rule enforcement protocol appears. You must acknowledge it to continue.

Cognos Dashboards Embedded service is deprecated

11 Apr 2024

You can no longer provision an instance of the Cognos Dashboards Embedded service. However, any existing dashboards that you created with the Cognos Dashboards Embedded service will continue working until 20 June 2024. You can use Cognos Analytics on Cloud On-Demand as a replacement for Cognos Dashboards Embedded. For more information, see IBM Cognos Analytics Pricing Plans.

Week ending 5 April 2024

Use pivot tables to display data aggregated in Decision Optimization experiments

5 Apr 2024

You can now use pivot tables to display both input and output data aggregated in the Visualization view in Decision Optimization experiments. For more information, see Visualization widgets in Decision Optimization experiments.

Access the list of connection API properties from the user interface

05 Apr 2024

Previously the only way to view the connection properties was to open a new web page at https://dataplatform.cloud.ibm.com/connections/docs. Now you can access the same information from Data > Platform connections. Expand Connection resources, and select Connection properties.

Connection properties

You can use these properties to create connections with the connections in the Watson Data API. For example, if you create a connection in a notebook programmatically, you can use this information to identify the properties that you need.

Week ending 22 March 2024

Create dynamic views of connected data

21 March 2024

A new type of connected data asset provides filtered access to data from data sources that support SQL queries so you can access only relevant data. In a project, provide an SQL query to create a view of specific columns or rows from one or more tables. You can use these data assets in metadata enrichment and data quality analysis just like any other connected data asset.

For more information, see Adding a dynamic view of connected data to a project.

Use Delta Lake or Apache Iceberg table formats in the Amazon S3 and the Apache HDFS connectors

22 March 2024

The Amazon S3 and the Apache HDFS connectors now include properties for the Delta Lake and Apache Iceberg table formats. These table formats are integral to data lakes, which provide a centralized repository for managing large data volumes. Data lakes serve as a foundation for collecting and analyzing structured, semi-structured, and unstructured data in its original format for long-term storage and to drive insights and predictions.

The table format property is included in the interaction properties for the supported tools. For example, in DataStage the property appears in the connector Stage properties.

Week ending 23 February 2024

Access data from DataStax Enterprise

23 Feb 2024

You can now work with data from DataStax Enterprise.

Week ending 16 February 2024

Case-sensitive codes in reference data sets in IBM Knowledge Catalog

16 Feb 2024

Reference data values consist of at least two columns: code and value. For all new reference data sets the code column is now case-sensitive. When you add values to a new reference data set, the code is saved exactly as you type it. Note that any reference data sets that were created before this change was introduced remain case-insensitive, and any new values added there will be saved in upper case. These reference data sets are marked with a Case-insensitive tag in the UI. For details, see Case-sensitive code.
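The difference between the two kinds of reference data sets can be illustrated with a minimal Python sketch. The function name and the boolean flag are hypothetical and are not part of any IBM Knowledge Catalog API; the sketch only mirrors the behavior described above.

```python
def stored_code(code: str, case_insensitive_set: bool) -> str:
    """Return the code as it would be saved, per the behavior described above.

    case_insensitive_set is a hypothetical flag that stands for a data set
    created before the change (marked with the Case-insensitive tag).
    """
    if case_insensitive_set:
        # Older sets: new values are saved in upper case.
        return code.upper()
    # New sets: the code is saved exactly as typed.
    return code


print(stored_code("Us-East", case_insensitive_set=False))
print(stored_code("Us-East", case_insensitive_set=True))
```

In a new data set, "Us-East" and "US-EAST" would therefore be two distinct codes, while an older data set would store both as "US-EAST".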

Improved search, filter and sort options for reference data sets in IBM Knowledge Catalog

16 Feb 2024

When you view a list of reference data values, you can use the following methods to find the required values faster:

  • Use a search bar to type a query for a code, value or a custom column value.
  • Use one of the 6 advanced filter options.
  • Use the sorting feature.

The search, filter, and sort options can be combined. For details, see Viewing reference data sets.

Week ending 09 February 2024

New Spark 3.4 environment for running Data Refinery flow jobs

09 Feb 2024

When you select an environment for a Data Refinery flow job, you can now select Default Spark 3.4 & R 4.2, which includes enhancements from Spark.

Data Refinery Spark environments

The Default Spark 3.3 & R 4.2 environment is deprecated and will be removed in a future update.

Update your Data Refinery flow jobs to use the new Default Spark 3.4 & R 4.2 environment. For details, see Compute resource options for Data Refinery in projects.

More task-oriented Decision Optimization documentation

09 Feb 2024

You can now more easily find the right information for creating and configuring Decision Optimization experiments. See Decision Optimization experiments and its subsections.

Pagination view feature to publish assets to a catalog

08 Feb 2024

When you are publishing project assets to a catalog, you can now view 20 catalogs and assets on each page with the pagination view. Previously, you could view your assets only in a list. See Publishing assets to a catalog.

Advanced analysis types in metadata enrichment are available in the Frankfurt region

09 Feb 2024

Advanced primary key and relationship analysis and advanced profiling are now also available in the Frankfurt region, in addition to the Dallas region.

IBM Cloud Data Engine connection is deprecated

08 Feb 2024

The IBM Cloud Data Engine connection is deprecated and will be discontinued in a future release. See Deprecation of Data Engine for important dates and details.

Week ending 02 February 2024

Save your searches for catalog assets

02 Feb 2024

Each user can now save up to 25 searches within each of their catalogs. The user who saves a search in a catalog is the only user who can view, run, edit, and remove the search. For more information, see Saving searches for catalog assets.

IBM Cloud Databases for DataStax connection is discontinued

02 Feb 2024

The IBM Cloud Databases for DataStax connection has been removed from Cloud Pak for Data as a Service.

Dremio connection requires updates

02 Feb 2024

Previously the Dremio connection used a JDBC driver. Now the connection uses a driver based on Arrow Flight.

Important: Update the connection properties. Different changes apply to a connection for a Dremio Software (on-prem) instance or a Dremio Cloud instance.

Dremio Software: Update the port number.

The new default port number that is used by Flight is 32010. You can confirm the port number in the dremio.conf file. See Configuring via dremio.conf for information.

Additionally, Dremio no longer supports connections with IBM Cloud Satellite.

Dremio Cloud: Update the authentication method and hostname.

  1. Log in to Dremio and generate a personal access token. For instructions, see Personal Access Tokens.
  2. In Cloud Pak for Data as a Service in the Create connection: Dremio form, change the authentication type to Personal Access Token and add the token information. (The Username and password authentication can no longer be used to connect to a Dremio Cloud instance.)
  3. Select Port is SSL-enabled.

If you use the default hostname for a Dremio Cloud instance, you need to change it:

  • Change sql.dremio.cloud to data.dremio.cloud
  • Change sql.eu.dremio.cloud to data.eu.dremio.cloud
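If you keep Dremio Cloud connection details in scripts or configuration files, the hostname change can be sketched as a small lookup. This is a minimal illustration in Python; the helper name is hypothetical and it does not call any Cloud Pak for Data or Dremio API. Only the two hostname pairs come from the list above.

```python
# Old default Dremio Cloud hostnames mapped to their replacements,
# taken from the list above.
HOSTNAME_CHANGES = {
    "sql.dremio.cloud": "data.dremio.cloud",
    "sql.eu.dremio.cloud": "data.eu.dremio.cloud",
}


def updated_hostname(hostname: str) -> str:
    """Return the new default hostname, or the input unchanged.

    Hypothetical helper: hostnames that are not one of the old defaults
    (for example, a Dremio Software host) are returned as they are.
    """
    return HOSTNAME_CHANGES.get(hostname.strip().lower(), hostname)


for host in ("sql.dremio.cloud", "sql.eu.dremio.cloud", "dremio.example.com"):
    print(host, "->", updated_hostname(host))
```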

Additional analysis types in metadata enrichment (IBM Knowledge Catalog)

31 Jan 2024

Metadata enrichment now provides these additional analysis options:

  • Primary key analysis to detect primary keys in your data that uniquely identify each record in a data asset.

    Shallow analysis is automatically included when you select the Profile data enrichment option. Advanced analysis can be run on selected assets from the enrichment results.

  • Relationship analysis to identify relationships between data assets or to find overlapping and redundant data in columns.

    Shallow key relationship analysis is run when you select the new Set relationships enrichment option. Advanced analysis can be run on selected assets from the enrichment results.

  • Advanced profiling to get more exact results for certain metrics, such as frequency distribution and uniqueness of values within a column.

    Advanced profiling can be run on selected assets from the enrichment results.

Advanced primary key and relationship analysis and advanced profiling require the DataStage service in addition to the IBM Knowledge Catalog service and are available only in the Dallas region.

For more information, see Creating a metadata enrichment asset, Identifying primary keys, Identifying relationships, and Advanced data profiles.

Week ending 26 January 2024

AutoAI supports ordered data for all experiments

25 Jan 2024

You can now specify ordered data for all AutoAI experiments rather than just time series experiments. Specify if your training data is ordered sequentially, according to a row index. When input data is sequential, model performance is evaluated on newest records instead of a random sampling, and holdout data uses the last n records of the set rather than n random records. Sequential data is required for time series experiments but optional for classification and regression experiments.
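The difference between the two holdout strategies can be sketched in a few lines of Python. This is an illustration of the idea only, not AutoAI's implementation; the function name, the seed parameter, and the use of plain lists are assumptions.

```python
import random


def split_holdout(records, n, ordered, seed=0):
    """Split records into (training, holdout) sets.

    ordered=True  : holdout is the last n records (sequential data).
    ordered=False : holdout is n records chosen at random.
    Hypothetical helper for illustration only.
    """
    if not 0 < n < len(records):
        raise ValueError("n must be between 1 and len(records) - 1")
    if ordered:
        return records[:-n], records[-n:]
    rng = random.Random(seed)
    holdout_idx = set(rng.sample(range(len(records)), n))
    training = [r for i, r in enumerate(records) if i not in holdout_idx]
    holdout = [r for i, r in enumerate(records) if i in holdout_idx]
    return training, holdout


rows = list(range(10))
print(split_holdout(rows, 3, ordered=True))
print(split_holdout(rows, 3, ordered=False))
```

With ordered data, the model is always evaluated on the newest records, which avoids scoring it on rows that precede its training data.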

Set to dark theme

25 Jan 2024

You can now set your Cloud Pak for Data as a Service user interface to dark theme. Click your avatar and select Profile and settings to open your account profile. Then, set the Dark theme switch to on. Dark theme is not supported in RStudio and Jupyter notebooks. For information on managing your profile, see Managing your settings.

Week ending 19 January 2024

View native type information in the details panel for asset columns

19 Jan 2024

Now, you can view both standardized and native data types directly in the column details panel. To view the native type information, click an asset column name from the Overview page of an asset.

New option for rule action precedence (IBM Knowledge Catalog)

18 Jan 2024

Rule action precedence enables you to specify how rules are applied when there are multiple rules with different actions on a data set. You can use the new Hierarchical enforcement option to configure a two-layer evaluation of data protection rules.

  • The first layer evaluates the rules for an Allow or Deny action without considering any masking actions. The decision from this first layer must be to allow access to move to the second layer.
  • The second layer evaluates the rules for a Transform action.
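The two-layer evaluation can be sketched as follows. This is a simplified Python model, not the IBM Knowledge Catalog rule engine: the Rule class is hypothetical, and the sketch assumes that any matching Deny rule blocks access in the first layer and that access is allowed when no rule denies it. The actual outcome also depends on your other rule settings.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Rule:
    name: str
    action: str    # "Allow", "Deny", or "Transform"
    matches: bool  # True if the rule's criteria apply to this request


def hierarchical_decision(rules: List[Rule]) -> str:
    """Return "Deny", "Transform", or "Allow" for one access request."""
    applicable = [r for r in rules if r.matches]

    # Layer 1: consider Allow and Deny rules only; masking rules are ignored.
    # ASSUMPTION: a single matching Deny rule is enough to block access.
    if any(r.action == "Deny" for r in applicable):
        return "Deny"

    # Layer 2: reached only when layer 1 allows access.
    if any(r.action == "Transform" for r in applicable):
        return "Transform"
    return "Allow"


rules = [
    Rule("block-contractors", "Deny", matches=False),
    Rule("mask-email", "Transform", matches=True),
]
print(hierarchical_decision(rules))
```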

You can set this option from the user interface or from the access_decision_precedence API.

For more information, see Managing rule settings.

Store the results of data quality analysis (IBM Knowledge Catalog)

18 Jan 2024

You now have the option to write the output of the predefined data quality checks that are run as part of metadata enrichment to a database. For example, you might want to store this data so that you can use the tables for tracking quality issues and as input to remediation processes. For more information, see Creating a metadata enrichment.

Connect to a new data source in DataStage: Tableau

18 Jan 2024

You can now include data from a Tableau data source in your DataStage flows.

For the full list of DataStage connectors, see Supported data sources in DataStage.

Week ending 12 January 2024

Support for IBM Runtime 22.2 deprecated in Watson Machine Learning

11 Jan 2024

IBM Runtime 22.2 is deprecated and will be removed on 11 April 2024. Beginning 7 March 2024, you cannot create notebooks or custom environments by using the 22.2 runtimes. Also, you cannot train new models with software specifications that are based on the 22.2 runtime. Update your assets and deployments to use IBM Runtime 23.1 before 7 March 2024.

Week ending 15 December 2023

View data source information in the details panel for catalogs

15 Dec 2023

If you click on an asset from the related items grid, you can view data source information directly in the asset details panel.

Create user API keys for jobs and other operations

15 Dec 2023

Certain runtime operations in Cloud Pak for Data as a Service, such as jobs and model training, require an API key as a credential for secure authorization. With user API keys, you can now generate and rotate an API key directly in Cloud Pak for Data as a Service as needed to help ensure your operations run smoothly. The API keys are managed in IBM Cloud, but you can conveniently create and rotate them in Cloud Pak for Data as a Service.

The user API key is account-specific and is created from Profile and settings under your account profile.

For more information, see Managing the user API key.

New login session expiration and sign out due to inactivity

15 Dec 2023

You are now signed out of IBM Cloud when your session expires. Your session can expire due to login session expiration (24 hours by default) or inactivity (2 hours by default). You can change the default durations in the Access (IAM) settings in IBM Cloud. For more information, see Set the login session expiration.
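The two expiration conditions can be expressed as a short check. This Python sketch uses the default durations stated above; the function and parameter names are hypothetical, and IBM Cloud evaluates sessions on its own side.

```python
from datetime import datetime, timedelta

# Default durations from the text above; both can be changed in the
# Access (IAM) settings.
LOGIN_SESSION_LIMIT = timedelta(hours=24)
INACTIVITY_LIMIT = timedelta(hours=2)


def session_expired(login_time, last_activity, now,
                    login_limit=LOGIN_SESSION_LIMIT,
                    inactivity_limit=INACTIVITY_LIMIT):
    """Return True if either expiration condition is met (illustrative only)."""
    too_old = now - login_time >= login_limit
    too_idle = now - last_activity >= inactivity_limit
    return too_old or too_idle


login = datetime(2023, 12, 15, 9, 0)
print(session_expired(login, datetime(2023, 12, 15, 10, 0), datetime(2023, 12, 15, 11, 30)))
print(session_expired(login, datetime(2023, 12, 15, 10, 0), datetime(2023, 12, 15, 12, 0)))
```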

Access the list of connection API properties

15 Dec 2023

You can now view the full list of the connectors with their individual properties at: https://dataplatform.cloud.ibm.com/connections/docs.

You can use these properties to create connections with the connections in the Watson Data API. For example, if you create a connection in a notebook programmatically, you can use this information to identify the properties that you need.

Organize project assets into folders

14 Dec 2023

You can now create folders in your projects to organize assets. An administrator of the project must enable folders, and administrators and editors can create and manage them. Folders are in beta and are not yet supported for use in production environments. For more information, see Organizing assets with folders (beta).

The Assets tab with folders

IBM Cloud Databases for DataStax connector is deprecated

15 Dec 2023

The IBM Cloud Databases for DataStax connector is deprecated and will be discontinued in a future release.

Week ending 08 December 2023

New client properties in Db2 connections for workload management

08 Dec 2023

You can now specify properties in the following fields for monitoring purposes: Application name, Client accounting information, Client hostname, and Client user. These fields are optional and are available for the following connections:

Connect to a new data source in DataStage: Looker

08 Dec 2023

You can now include data from a Looker data source in your DataStage flows. (You can use this connection for source data only.)

For the full list of DataStage connectors, see Supported data sources in DataStage.

New and enhanced features in Watson Query

08 Dec 2023

The following new and enhanced features are available in Watson Query:

Use IBM Knowledge Catalog data protection rules to filter rows in virtualized tables

You might have a data source that has tables with government, enterprise, and retail client data combined. For example, a billing table might have data for all the customers, where some of the rows are for government clients and some are for nongovernment clients. The type of the client is not indicated in the billing table. Now, you can filter the list of client records by using one of the following techniques.

You can use a separate table to identify customers that are government clients. The IDs from this table can be used to filter out rows from the billing table. When you filter out rows, the masked table does not contain the rows with data of government clients.

You can use a table of blocked customer identifiers as a reference table. Any rows in the billing table whose customer identifier is included in the blocked customer set are filtered out of the result set.

Watson Query supports masking columns in virtualized data based on data protection rules that are defined in IBM Knowledge Catalog. Now, you can create data protection rules to include or exclude rows in your virtualized data to avoid exposing sensitive data.
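The effect of filtering rows against a reference table can be illustrated with a small Python sketch. The column name customer_id and the sample data are hypothetical. In Watson Query, the filtering is done by data protection rules, not by client code.

```python
def filter_blocked(billing_rows, blocked_ids):
    """Return the billing rows whose customer is not in the blocked set."""
    blocked = set(blocked_ids)
    return [row for row in billing_rows if row["customer_id"] not in blocked]


billing = [
    {"customer_id": "C001", "amount": 120.0},
    {"customer_id": "G042", "amount": 980.0},  # government client
    {"customer_id": "C007", "amount": 45.5},
]
government_clients = ["G042"]  # IDs from the separate reference table

for row in filter_blocked(billing, government_clients):
    print(row)
```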

For more information, see Governing virtual data with data protection rules in Watson Query.

Use advanced data masking on virtualized data

You can now use the advanced data masking options in Watson Query to avoid exposing sensitive data.

For more information about the updated masking behavior, see Masking virtual data in Watson Query.

Improved query performance and enforcement of data protection rules

Watson Query now stores and caches data protection rules from IBM Knowledge Catalog in a policy enforcement point (PEP) cache to avoid evaluating rules every time an object is queried. This cache improves the performance of previously executed queries by reducing the number of calls to IBM Knowledge Catalog to fetch the rules. However, you might notice a delay of up to 10 seconds before newly added or updated data protection rules are applied to queries. You can use the web client to configure PEP cache settings, such as cache size and cache live time.
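The trade-off between fewer rule lookups and a short delay before rule changes take effect can be sketched with a small time-based cache. This Python sketch is not the Watson Query PEP cache: the class name, its parameters, and the 10-second lifetime are assumptions chosen to mirror the delay mentioned above.

```python
import time
from collections import OrderedDict


class RuleCache:
    """Tiny size-bounded cache whose entries expire after ttl_seconds."""

    def __init__(self, fetch, max_size=100, ttl_seconds=10.0, clock=time.monotonic):
        self._fetch = fetch          # function that loads rules for a key
        self._max_size = max_size    # assumed "cache size" setting
        self._ttl = ttl_seconds      # assumed "cache live time" setting
        self._clock = clock
        self._entries = OrderedDict()  # key -> (stored_at, value)

    def get(self, key):
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry[0] < self._ttl:
            self._entries.move_to_end(key)  # mark as recently used
            return entry[1]
        # Missing or expired: fetch again, which picks up rule changes.
        value = self._fetch(key)
        self._entries[key] = (now, value)
        self._entries.move_to_end(key)
        while len(self._entries) > self._max_size:
            self._entries.popitem(last=False)  # evict least recently used
        return value


cache = RuleCache(fetch=lambda table: f"rules for {table}")
print(cache.get("BILLING"))  # fetched
print(cache.get("BILLING"))  # served from the cache
```

A rule that changes right after it is cached is not seen until the entry expires, which is why updates can take a few seconds to apply.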

For more information, see Enabling enforcement of data protection rules in Watson Query.

Format and save formatted query access plans for performance tuning

You can now format and save formatted access plans for performance tuning in Watson Query. When you run SQL queries in Watson Query, you can use the web client to format how EXPLAIN information appears when you generate query access plans. You can then run the db2exfmt command from the web client to easily generate and download the EXPLAIN output in text files.

Use wildcard characters to filter your data sources

Now when you create a virtualized table, you can use the following wildcard characters to customize filters to find the data sources that you need:

  • % (percent): To represent zero or more characters
  • _ (underscore): To represent a single character
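The matching behavior of the two wildcard characters can be illustrated by translating a filter into a regular expression. This is a minimal Python sketch; the function names are hypothetical, and the sketch assumes case-sensitive matching, which might differ from how Watson Query compares names.

```python
import re


def wildcard_to_regex(pattern: str) -> "re.Pattern[str]":
    """Translate % and _ wildcards into an equivalent regular expression."""
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")   # zero or more characters
        elif ch == "_":
            parts.append(".")    # exactly one character
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.DOTALL)


def matches(pattern: str, name: str) -> bool:
    """Return True if the whole name matches the wildcard pattern."""
    return wildcard_to_regex(pattern).fullmatch(name) is not None


for name in ("DB1", "DB12", "SALESDB"):
    print(name, matches("DB_", name), matches("%DB", name))
```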

For more information, see Filtering data in Watson Query.

Watson Query users can publish their own virtual objects

Users with the User role in Watson Query can now publish virtual objects that they created to governed catalogs.

For more information, see Publishing virtual data to a catalog with Watson Query.

Manage who can access and perform operations on individual data sources

With data source access restrictions, you can explicitly manage access to individual data source connections that use shared credentials. You can assign users and roles as collaborators for a data source connection. Only those collaborators can access the data source connection. You assign specific privileges to the collaborators to manage the actions that they can perform on the data sources. This enables you to separate privileges from roles, so that some users who are assigned a role such as Manager can access and take action on different data source connections than other Manager users.

For more information, see Data source connection access restrictions in Watson Query.

Query data in Generic S3 and Microsoft Azure Data Lake Storage Gen2 data lakes

You can now connect to Generic S3 and Microsoft Azure Data Lake Storage Gen2 data sources. For more information, see Supported data sources in Watson Query.

Choose your query mode to prioritize either performance or consistency

You can now choose between running queries in Max Pushdown mode or in Max Consistency mode.

  • Max Pushdown mode ignores semantic differences between Watson Query and the data source for single-source queries. Therefore, more single-source queries might be fully pushed down to the data source, which improves query performance. Query results are consistent with data source semantics for fully pushed down queries in this mode. Max Pushdown mode does not impact multiple-source queries.
  • Max Consistency mode follows Watson Query semantics to evaluate whether operations can be pushed down to the data source. If the operation that is executed on the data source generates the same result as Watson Query, the operation can be pushed down. Queries in this mode might be fully pushed down if the remote data source has the same semantics as Watson Query.

Quickly find and virtualize tables with the Explore tab

You can now quickly find the tables that you want to virtualize. On the Virtualize page, you can use the Explore tab to browse through databases, schemas, and available tables in a connected data source. The List tab displays all of the available tables in all of your connected data sources. On the Data sources page, you can filter your data sources to quickly load the reduced list of available tables in the List tab.

For more information, see Creating virtual objects in Watson Query.

Improve statistics collection for virtualized tables by using data sampling

Data sampling improves statistics collection by reducing the resources that you need to collect statistics. When you collect statistics by selecting the Remote query collection method in the web client, a default sampling rate of 20% is used. To optimize statistics collection, select Enable table sampling and choose a sampling rate between 1% and 99%.

If you collect statistics by using the DVSYS.COLLECT_STATISTICS procedure, you can use the TABLESAMPLE option with the remote-query statistics collection type to sample data when you collect statistics. For tips, see Usage notes.

You can also use the DVSYS.COLLECT_STATISTICS procedure to collect statistics for virtualized tables over flat files.

For more information, see the COLLECT_STATISTICS stored procedure in Watson Query.
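
As a rough illustration of why sampling reduces the cost of statistics collection, the following sketch compares statistics computed over a full column with statistics computed over a 20% sample. It does not call DVSYS.COLLECT_STATISTICS; the helper names and the row-level (Bernoulli) sampling method are assumptions for illustration.

```python
import random

def collect_stats(values):
    """Compute simple column statistics from the rows that were read."""
    return {
        "rows_read": len(values),
        "min": min(values),
        "max": max(values),
        "avg": sum(values) / len(values),
    }

def sample_rows(values, rate_percent, rng):
    """Keep each row with probability rate_percent / 100 (Bernoulli sampling)."""
    if not 1 <= rate_percent <= 99:
        raise ValueError("sampling rate must be between 1 and 99 percent")
    return [v for v in values if rng.random() * 100 < rate_percent]

rng = random.Random(7)  # fixed seed so the sketch is repeatable
column = [rng.randint(1, 1000) for _ in range(10_000)]  # stand-in for a remote column

full_stats = collect_stats(column)
sampled_stats = collect_stats(sample_rows(column, 20, rng))
print(full_stats)
print(sampled_stats)
```

The sampled run reads roughly one fifth of the rows while the average stays close to the full-scan value; minimum and maximum values are the statistics most likely to be missed by a sample.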

Use your platform credentials to access Watson Query connections

When you use a platform connection to access Watson Query, you are prompted for your credentials. You can optionally select Use my platform login credentials, rather than entering your personal credentials for the connection. The connection uses your current session JSON Web Token (JWT).

Improvements for data sources in object storage

  • You can now create connections and virtualize files for Generic S3 data sources in object storage.
  • You can now create virtualized tables from externally compressed CSV or TSV files that are stored in object storage. For more information, see Creating a virtual table from files in object storage.
  • You can now virtualize flat files in cloud object storage that contain column headers.

For more information, see Creating a virtualized table from files in cloud object storage in Watson Query.

Predicate pushdown improvements and support for predicate pushdown on more data sources

Predicate pushdown is an optimization that reduces query times and memory usage. This release includes the following improvements to predicate pushdown:

  • Queries that include COUNT (DISTINCT) or GROUP BY clauses can now be pushed down with trailing blanks comparison rules for Teradata, Netezza®, Microsoft SQL Server, Db2® for z/OS®, and Db2 Database data sources.
  • Queries that include a string comparison operation, such as a GROUP BY or WHERE predicate against CHAR or VARCHAR data, can now be pushed down for the Teradata data source while handling case sensitivity.
  • SQL statements with LIKE predicates are now pushed down for: Db2®, SAP HANA, Oracle, PostgreSQL, Apache Hive, MySQL, Microsoft SQL Server, Snowflake, Netezza® Performance Server, and Teradata.
  • SQL statements with Fetch clauses are now pushed down for: Db2, Db2 for z/OS, Apache Derby, Oracle, Amazon Redshift, Google BigQuery, and Salesforce.com data sources.
  • SQL statements with a string comparison filter are now pushed down for: Db2, Microsoft SQL Server, Teradata, Netezza Performance Server, and Apache Derby data sources.
  • SQL statements with OLAP functions are now pushed down for Db2 and Netezza Performance Server data sources.
  • The Greenplum data source now supports push down of predicates.
  • The MySQL (MySQL Community Edition and MySQL Enterprise Edition) data source now supports push down of predicates.
  • The Cloudera Impala data source now supports push down of predicates.
  • The Watson Query Manager for z/OS® data source now supports push down of predicates.

For more information, see Supported data sources in Watson Query.
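
To make the benefit of predicate pushdown concrete, here is a minimal sketch that uses an in-memory SQLite database as a stand-in for a remote data source. The table, column names, and helper functions are hypothetical; the point is only that pushing the WHERE predicate to the source reduces the number of rows that must be transferred and held in memory.

```python
import sqlite3

# In-memory SQLite database standing in for a remote data source.
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE orders (id INTEGER, region TEXT, amount REAL)")
source.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(i, "EU" if i % 4 == 0 else "US", float(i)) for i in range(1, 1001)],
)

def without_pushdown(conn):
    """Fetch every row, then filter in the virtualization layer."""
    fetched = conn.execute("SELECT id, region, amount FROM orders").fetchall()
    kept = [row for row in fetched if row[1] == "EU"]
    return len(fetched), kept

def with_pushdown(conn):
    """Send the predicate to the source so only matching rows come back."""
    fetched = conn.execute(
        "SELECT id, region, amount FROM orders WHERE region = ?", ("EU",)
    ).fetchall()
    return len(fetched), fetched

moved_all, rows_a = without_pushdown(source)
moved_filtered, rows_b = with_pushdown(source)
print(moved_all, moved_filtered, sorted(rows_a) == sorted(rows_b))
```

Both approaches return the same rows, but the pushed-down query moves a quarter of the data in this example.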

A Watson Query connection is now available in the Platform connections by default

You can add a Watson Query connection from Platform connections to catalogs and projects without manually populating the connection details.

Manage access for multiple users and roles if you are a Manager

As a Watson Query Manager, you can now grant and revoke access for multiple users and roles at the same time.

For more information, see Managing access to virtual objects in Watson Query.

Watson Query managers can now make virtual objects visible to all users

Managers can now choose to give users a more comprehensive view of the content by making existing virtual objects visible from the Virtualized data page. Data access within those objects continues to adhere to Watson Query authorizations and data protection rules. To enable this feature, managers need to disable the Restrict visibility setting from Service settings.

For more information, see Managing visibility of virtual objects in Watson Query.

Steward role no longer holds DATAACCESS database authority

Instead, the Steward role now gets the more restricted SELECTIN authority on all user-defined schemas.

New caching APIs

Cache entries can be managed through REST APIs that the caching service exposes. These APIs can be invoked by any application. You can use new caching APIs to do the following tasks:

  • Create a cache
  • List a specific cache
  • Delete a cache
  • Enable a cache
  • Disable a cache
  • Refresh a cache
  • Edit a cache

The following caching APIs are deprecated:

  • List caches
  • List a cache
  • Fetch the cache storage

For more information, see Caches in the Watson Query 2.0.0 API docs.

New publishing API

You can publish virtualized data to catalogs by using the following API:

The following API is deprecated:

Week ending 1 December 2023

New plans for Watson OpenScale as part of watsonx.governance

1 Dec 2023

Watson OpenScale is now part of watsonx.governance. Provisioning watsonx.governance from the IBM Cloud Catalog installs Watson OpenScale. On Cloud Pak for Data as a Service, Watson OpenScale continues to provide services for evaluating predictive machine learning models. On watsonx, provisioning watsonx.governance extends the governance capabilities of Watson OpenScale to evaluate foundation model assets as well as machine learning assets. You can define AI use cases to address business problems, then track asset data in factsheets to support compliance and governance goals. Watsonx.governance plans and features are available only in the Dallas region. Watson OpenScale legacy plans are available in the Frankfurt region.

IBM Watson Knowledge Catalog is now IBM Knowledge Catalog

1 Dec 2023

IBM Watson Knowledge Catalog is renamed to IBM Knowledge Catalog. Only the name changed. The service offering plans and product capabilities remain the same.

New data sources for metadata import in IBM Knowledge Catalog

1 Dec 2023

You can import metadata to IBM Knowledge Catalog from the following data sources:

  • IBM Match 360
  • SingleStoreDB

For more information, see Supported data sources for metadata import, metadata enrichment, and data quality rules.

Week ending 17 November 2023

New custom property of type user and user group

17 Nov 2023

You can now create a custom property of type user and user group and assign specific users or user groups to it. For more information, see Creating custom properties.

Multiple sources on either end of a custom relationship type

17 Nov 2023

You can extend your set of custom relationship types by using multiple types on the source and target ends. Use multiple artifact, asset, and column types to define relationships in more detail. For more information, see Creating custom relationships.

New permissions for data quality in IBM Knowledge Catalog

17 Nov 2023

You can now assign the following permissions to your users to have more control over how data quality is established in IBM Knowledge Catalog:

  • Manage data quality assets
  • Execute data quality rules
  • Drill down to issue details

By default, the new permissions are included in the following roles:

  • Administrator
  • CloudPak Data Quality Analyst, which is a new role

Update role assignments and any custom roles you might have for users who need to manage data quality definitions and rules and to run data quality rules.

For more information, see User roles and permissions for IBM Knowledge Catalog and Watson Studio.

Export and import data protection rules

17 Nov 2023

You can now use APIs to export and import data protection rules across multiple instances of Cloud Pak for Data as a Service. The links to glossary artifacts, catalogs, assets, and users are maintained when you export the data protection rules.

For more information, see Migrating data protection rules.

Run DataStage flows in Extract, Load, and Transform (ELT) run mode (Beta)

13 Nov 2023

The ELT process is different from the traditional Extract, Transform, and Load (ETL) process in that it runs the transform part of the process in the target database, which can be more efficient and cost effective. This capability is currently offered in beta and is not supported for production.
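
The contrast between the two run modes can be sketched as follows. This is not DataStage code: it uses an in-memory SQLite database as a stand-in target, and the table names, sample rows, and transformation are hypothetical. In the ETL variant the pipeline transforms rows before loading them; in the ELT variant the raw rows are loaded first and the transformation runs as SQL inside the target database.

```python
import sqlite3

# Hypothetical raw input: untrimmed, mixed-case names with a spend amount.
raw_rows = [(" alice ", 120.0), ("BOB", 80.0), ("Carol", 200.0)]

def run_etl(rows):
    """ETL: transform in the pipeline, then load the finished rows."""
    target = sqlite3.connect(":memory:")
    target.execute("CREATE TABLE customers (name TEXT, spend REAL)")
    transformed = [
        (name.strip().upper(), spend) for name, spend in rows if spend >= 100
    ]
    target.executemany("INSERT INTO customers VALUES (?, ?)", transformed)
    return target.execute(
        "SELECT name, spend FROM customers ORDER BY name"
    ).fetchall()

def run_elt(rows):
    """ELT: load raw rows first, then transform with SQL inside the target."""
    target = sqlite3.connect(":memory:")
    target.execute("CREATE TABLE staging (name TEXT, spend REAL)")
    target.executemany("INSERT INTO staging VALUES (?, ?)", rows)
    target.execute(
        "CREATE TABLE customers AS "
        "SELECT UPPER(TRIM(name)) AS name, spend FROM staging WHERE spend >= 100"
    )
    return target.execute(
        "SELECT name, spend FROM customers ORDER BY name"
    ).fetchall()

etl_result = run_etl(raw_rows)
elt_result = run_elt(raw_rows)
print(etl_result == elt_result, elt_result)
```

Both variants produce the same target table; they differ in where the transformation work is done.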

Removal of some predefined relationship types (13 December 2023)

13 Nov 2023

On 13 December 2023, predefined relationship types for asset-asset and asset-artifact relationships that are infrequently used will be removed.

The following relationship types will be affected:

  • Defines - Is defined by will be replaced by Contains - Is contained in
  • Is owner of - Is owned by will be replaced by Contains - Is contained in
  • Has for parent entity - Is relationship child of will be replaced by Is parent of - Is child of
  • Is supertype of - Is subtype of will be replaced by Is parent of - Is child of

Here's what you need to do now:

  • If you are not using these relationship types, no action is required.
  • If you are using these relationship types and agree with the replacement relationship types, no action is required.
  • If you are using these relationship types and would like to assign different relationship types, remove the current relationship and create new relationships using other predefined or custom relationship types.

If you have any questions or concerns related to the replacement of these relationship types, you can open a support ticket.

Week ending 10 November 2023

Removal of resource key from the details panel for columns

10 Nov 2023

Resource key was displayed in the details panel at a column level although the information was not applicable for columns. Resource key is now removed from the details panel at a column level. The information is still required at an asset level. For example, the asset resource key might be used in the import lineage mapping CSV file.

Deploy DataStage remote runtime engines locally with DataStage-aaS Anywhere

9 Nov 2023

You can now deploy DataStage remote runtime engines to run data integration jobs on-premises or on any data center or cloud.

The DataStage runtime engine is a containerized offering that is deployed in local environments for enhanced performance and security. Design ETL and ELT pipelines in DataStage and run data integration tasks locally on your engine. Administrators can spin up one or more remote runtime engines. For security, after DSaaS Anywhere is enabled for a project, the execution style cannot be reverted to the IBM Cloud serverless runtime for that project. The IBM Cloud serverless runtime remains available for other projects.

For more information, see DataStage environments.

Announcing support for Python 3.10 and R 4.2 frameworks and software specifications on runtime 23.1

9 Nov 2023

You can now use IBM Runtime 23.1, which includes the latest data science frameworks based on Python 3.10 and R 4.2, to run Watson Studio Jupyter notebooks and R scripts, train models, and run Watson Machine Learning deployments. Update your assets and deployments to use IBM Runtime 23.1 frameworks and software specifications.

Use Apache Spark 3.4 to run notebooks and scripts

Spark 3.4 with Python 3.10 and R 4.2 is now supported as a runtime for notebooks and RStudio scripts in projects. For details on available notebook environments, see Compute resource options for the notebook editor in projects and Compute resource options for RStudio in projects.

Week ending 27 October 2023

Access data from complex flat files in DataStage

27 Oct 2023

You can now use the Complex Flat File connector in your DataStage flows.

For the full list of DataStage connectors, see Supported data sources in DataStage.

Connect to more data sources in DataStage

27 Oct 2023

You can now include data from these data sources in your DataStage flows:

  • Apache Derby
  • IBM Cloud Data Engine
  • IBM Cloud Databases for DataStax
  • IBM watsonx.data

For the full list of DataStage connectors, see Supported data sources in DataStage.

Use a Satellite Connector to connect to an on-prem database

26 Oct 2023

Use the new Satellite Connector to connect to a database that is not accessible via the internet (for example, behind a firewall). Satellite Connector uses lightweight Docker-based communication to create secure and auditable communications from your on-prem environment back to IBM Cloud. For instructions, see Connecting to data behind a firewall.

Secure Gateway is deprecated

26 Oct 2023

IBM Cloud announced the deprecation of Secure Gateway. For information, see the Overview and timeline.

If you currently have connections that are set up with Secure Gateway, plan to use an alternative communication method. In Cloud Pak for Data as a Service, you can use the Satellite Connector as a replacement for Secure Gateway. See Connecting to data behind a firewall.

Use NLS collate in DataStage

27 Oct 2023

You can now collate data with National Language Support in your DataStage flows.

Week ending 20 October 2023

Access lakehouse data with the new IBM watsonx.data connection

20 Oct 2023

You can use the IBM watsonx.data connection to connect to a database in a watsonx.data instance that is deployed on Cloud Pak for Data or IBM Cloud. IBM watsonx.data is an open, hybrid and governed data lakehouse that is optimized by a query engine for all data and AI workloads.

For information, see IBM watsonx.data connection.

Week ending 13 October 2023

Custom enumeration property names translated into your preferred language (IBM Knowledge Catalog)

13 Oct 2023

Custom property owners can now allow custom enumeration type property names to be translated into your preferred language.

The owner of the custom enumeration type property for an asset or column must define the definition of the property before you can choose to view custom enumeration property names in your browser's language. For more information, see Creating custom properties.

Intermediate solutions in Decision Optimization

12 Oct 2023

You can now choose to see a sample of intermediate solutions while a Decision Optimization experiment is running. This can be useful for debugging or to see how the solver is progressing. For large models that take longer to solve, with intermediate solutions you can now quickly and easily identify any potential problems with the solve, without having to wait for the solve to complete. You can configure the Intermediate solution delivery parameter in the Run configuration and select a frequency for these solutions. For more information, see Intermediate solutions and Run configuration parameters.

New Decision Optimization saved model dialog

When you save a model for deployment from the Decision Optimization user interface, you can now review the input and output schema, and more easily select the tables that you want to include. You can also add, modify or delete run configuration parameters, review the environment, and the model files used. All these items are displayed in the same Save as model for deployment dialog. For more information, see Deploying a Decision Optimization model by using the user interface.

Deprecation of profiling of unstructured data (IBM Knowledge Catalog)

10 Oct 2023

As of today, data assets that contain unstructured data can no longer be profiled.

View runtime metrics for your DataStage jobs

9 Oct 2023

You can now view runtime metrics for your DataStage jobs on the canvas and on the job run details page. For more information, see Creating and managing DataStage jobs.

Bulk add keys and attributes to new stages

9 Oct 2023

You can now bulk add keys and attributes to the following stages in your DataStage flows: Sort, Merge, Join, Remove duplicate, Difference, Change capture, Change apply, Combine records, Funnel, Compare, Lookup file set, Write range map, and Bloom filter.

Week ending 6 October 2023

Control the placement of a new column in the Concatenate operation (Data Refinery)

6 Oct 2023

You now have two options to specify the position of the new column that results from the Concatenate operation: as the right-most column in the data set or next to the original column.

Previously, the new column was placed at the beginning of the data set.

Important:

Edit the Concatenate operation in any of your existing Data Refinery flows to specify the new column position. Otherwise, the flow might fail.

For information about Data Refinery operations, see GUI operations in Data Refinery.
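
The two placement options for the Concatenate operation can be pictured with a small sketch. This is not Data Refinery code; the function name, the option labels, and the assumption that "next to the original column" means immediately after it are all hypothetical.

```python
def place_new_column(columns, new_column, source_column, position):
    """Return the column order after adding new_column.

    position is either "rightmost" or "next_to_original" (hypothetical labels).
    """
    if position == "rightmost":
        return columns + [new_column]
    if position == "next_to_original":
        # Assumption: the new column goes immediately after the source column.
        idx = columns.index(source_column) + 1
        return columns[:idx] + [new_column] + columns[idx:]
    raise ValueError(f"unknown position: {position}")

columns = ["first_name", "last_name", "city"]
print(place_new_column(columns, "full_name", "first_name", "rightmost"))
print(place_new_column(columns, "full_name", "first_name", "next_to_original"))
```

Downstream steps that refer to columns by position are the ones most affected by this choice.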

Week ending 29 September 2023

Use new functions in the expression builder for the Modify stage in DataStage

25 Sept 2023

You can use conversion functions in the expression builder in the Modify stage in your DataStage flows.

Week ending 22 September 2023

Decision Optimization Java models

20 Sept 2023

Decision Optimization Java models can now be deployed in Watson Machine Learning. By using the Java worker API, you can create optimization models with OPL, CPLEX, and CP Optimizer Java APIs. You can now easily create your models locally, package them and deploy them on Watson Machine Learning by using the boilerplate that is provided in the public Java worker GitHub. For more information, see Deploying Java models for Decision Optimization.

Week ending 8 September 2023

Reminder: Watson Knowledge Catalog profiling of unstructured data will be discontinued

8 Sept 2023

Profiling of unstructured data assets will no longer be supported starting on October 10, 2023.

Week ending 1 September 2023

Deprecation of comments in notebooks

31 Aug 2023

As of today, it is no longer possible to add comments to a notebook from the notebook action bar. Any existing comments were removed.

Use new environment variable in DataStage

28 Aug 2023

You can now add the environment variable APT_SHOW_METRICS to the flow parameters of your DataStage flows.

Week ending 25 August 2023

Quickly find catalogs with name and date sorting

24 Aug 2023

You can now find catalogs by sorting the list of catalogs on the View all Catalogs page by name or date created. Click on the Name header to sort the catalogs alphabetically by name. Click on the Date created header to sort the catalogs by ascending or descending dates.

Data quality at a glance in IBM Knowledge Catalog

22 Aug 2023

Data quality information has a new home. For each data asset in a catalog or a project, a Data quality page is populated with quality information that comes from predefined data quality checks and data quality rules. You can see the applicable data quality dimensions and the results of individual quality checks. You can drill down into the results for each check or even into the results for each column.

For more information, see Data quality.

Similar information is available from metadata enrichment results.

All data quality analysis is now run in the context of metadata enrichment or data quality rules. When you run profiling from the Profile page in a project or a catalog, data quality is not analyzed anymore and no data quality scores are generated.

Additional cache enhancements available for Watson Pipelines

21 August 2023

More options are available for customizing your pipeline flow settings. You can now exercise greater control over when the cache is used for pipeline runs. For details, see Managing default settings.

Week ending 18 August 2023

Plan name updates for Watson Machine Learning service

18 August 2023

Starting immediately, plan names are updated for the IBM Watson Machine Learning service, as follows:

  • The v2 Standard plan is now the Essentials plan. The plan is designed to give your organization the resources required to get started working with foundation models and machine learning assets.

  • The v2 Professional plan is now the Standard plan. This plan provides resources designed to support most organizations through asset creation to productive use.

Changes to the plan names do not change your terms of service. That is, if you are registered to use the v2 Standard plan, it will now be named Essentials, but all of the plan details will remain the same. Similarly, if you are registered to use the v2 Professional plan, there are no changes other than the plan name change to Standard.

For details on what is included with each plan, see Watson Machine Learning plans. For pricing information, find your plan on the Watson Machine Learning plan page in the IBM Cloud catalog.

Connect to more data sources in DataStage

18 Aug 2023

You can now include data from these data sources in your DataStage flows:

  • Cloudera Impala
  • Presto

For the full list of DataStage connectors, see Supported data sources in DataStage.

Connect to Google BigQuery data with ODBC (DataStage)

18 Aug 2023

The ODBC connection now includes the Google BigQuery data source.

For the full list of data sources that are available for the ODBC connection in DataStage, see ODBC connection.

Week ending 11 August 2023

Use new functions in the DataStage Transformer stage

8 August 2023

  • You can now use data masking, encryption, and regex functions in the Transformer stage as part of your DataStage flows.
  • You can now drag and drop columns on the Output tab of the Transformer stage.
  • You can now bulk edit columns in the Transformer stage from the Input tab.

Deprecation of comments in notebooks

7 August 2023

On 31 August 2023, you will no longer be able to add comments to a notebook from the notebook action bar. Any existing comments that were added that way will be removed.

Week ending 4 August 2023

Custom text analytics template (SPSS Modeler)

4 August 2023

For SPSS Modeler, you can now upload a custom text analytics template to a project. This provides you with more flexibility to capture and extract key concepts in a way that is unique to your context.

Week ending 28 July 2023

Enhanced capabilities for evaluating models with Watson OpenScale

25 July 2023

Use these new features to monitor and evaluate model deployments and interpret results.

Configure deployments with a new guided setup

A new setup wizard is available to help you add deployments to the Watson OpenScale Insights dashboard and provide model details. For more information, see Adding deployments for evaluations.

Configure new drift evaluation to provide more insights

You can configure a new version of the drift evaluation in Watson OpenScale to generate the following new metrics:

  • Output drift
  • Feature drift
  • Model quality drift

For more information, see Configuring drift v2 evaluations.

Understand model performance with model health evaluations

Watson OpenScale now provides new model health evaluations by default to help you understand how efficiently your model processes your transactions. For more information, see Model health monitor evaluation metrics.

Add multi-target prediction models in Watson OpenScale

When you add your deployments in Watson OpenScale, you can now specify multiple prediction columns to provide details about your model's output to configure quality evaluations. For more information, see Providing model details.

Run fairness evaluations with unstructured data

You can now enable fairness evaluations on unstructured data types to identify bias. For more information, see Configuring fairness evaluations.

Week ending 14 July 2023

Manage asset column relationships in a catalog

14 July 2023

Admins can now create and manage asset column relationships in a catalog. Column relationships can be created between columns and assets, columns and artifacts, or between columns.

To add a column relationship, click a column row on the Overview page of an asset. In the side pane, click the Related items overflow menu. Select one of the relationship types from the dropdown to add a relationship.

To learn more about creating relationships, see Asset relationships in a catalog.

Deprecation of the profiling support for unstructured data in IBM Knowledge Catalog

12 July 2023

Profiling of data assets that contain unstructured data, such as Microsoft Word, PDF, HTML, and plain text documents, is deprecated. Support will be discontinued on 10 October 2023. Until then, unstructured data assets of the supported types will continue to be profiled automatically when added to a project or a catalog. Starting on 11 October 2023, newly added unstructured data assets will no longer be profiled. Existing profiles will be available while the respective data assets live in the project or catalog.

Microsoft Azure SQL Database connection supports Azure Active Directory authentication (Azure AD)

14 July 2023

You can now select Active Directory for the Microsoft Azure SQL Database connection. Active Directory authentication is an alternative to SQL Server authentication. With this enhancement, administrators can centrally manage user permissions to Azure. For more information, see Microsoft Azure SQL Database connection.

Week ending 7 July 2023

Switch to IBM watsonx.ai

7 July 2023

If you have the Watson Studio and Watson Machine Learning services, you now have access to IBM watsonx.ai. You can switch from Cloud Pak for Data as a Service to watsonx and work with foundation models in the Prompt Lab tool or in notebooks.

See Switching between platforms.

Updates to Watson Machine Learning plans

7 July 2023

All Watson Machine Learning plans now include foundation model inferencing. Foundation model inferencing is available only on watsonx.ai. You can switch to watsonx.ai and use the new Prompt Lab tool or access foundation models with a notebook. You use the same Watson Machine Learning service instance on watsonx.ai as you use on Cloud Pak for Data as a Service.

If you have the Watson Machine Learning Lite plan, you can use up to 25,000 tokens for foundation model inferencing per month.

If you have the Watson Machine Learning v2 Standard or v2 Professional plan, your account will incur charges when your account users perform foundation model inferencing in the Prompt Lab or in notebooks.

For details on how foundation model inferencing is tracked and billed, see Watson Machine Learning plan. For the pricing of foundation model inferencing, find your plan on the Watson Machine Learning plan page in the IBM Cloud catalog.

Enhanced Natural Language Processing capabilities in Runtime 23.1

7 July 2023

Runtime 23.1 contains the Watson Natural Language Processing library 4.1 and a new set of pre-trained models. The NLP library contains the following enhancements and updates:

  • Many included models are now transformer-based. These models were trained on the Slate large language model (LLM), which was created by IBM. The models are available in two versions:
    • Optimized for CPU-only environments
    • For environments with GPUs or CPUs
  • Many included models for different NLP tasks are now workflow-based instead of block-based, so you can apply the models directly on input text without worrying about preprocessing steps.

NLP includes a Slate foundation model that you can use for fine-tuning your NLP tasks. You can use the Slate model or any transformer-based model from Hugging Face as a base to build your own models with Watson NLP.

All models provided by IBM are now exclusively trained on unbiased data with state-of-the-art filtering for hate, bias, and profanity.

These capabilities are currently available in the following environments:

  • NLP Runtime 23.1 on Python 3.10
  • GPU V100 Runtime 23.1 on Python 3.10
  • GPU 2xV100 Runtime 23.1 on Python 3.10

You can use these environments for NLP processing, but not for general model development. The data science libraries used in these environments are not yet supported by Watson Machine Learning.

For more information, see Watson Natural Language Processing.

Week ending 30 June 2023

Enhanced Data Privacy content in Knowledge Accelerators (IBM Knowledge Catalog)

28 June 2023

The Knowledge Accelerator for Cross Industry now has Data Privacy content that includes a set of classified business terms and data classes to accelerate the discovery and governance of personal information. In addition, sample data privacy policies and rules are available to describe the activities that are related to processing personal information.

The business terms and data classes have classifications to guide the identification of personal information (PI) and sensitive personal information (SPI). You can use metadata enrichment in IBM Knowledge Catalog to assign the business terms to imported data assets to identify assets that contain personal data.

See IBM Knowledge Accelerator for Cross Industry.

Reporting now available for custom assets (IBM Knowledge Catalog)

28 June 2023

You can now create queries, reports, and dashboards based on custom-defined properties for any asset in a project or in a catalog. You can define new custom properties for assets to extend any provided or custom asset types and then create reports based on these relationships. For example, you can create a report on your data quality rules and artifact relationships to extrapolate the accuracy of your data. For more information, see Setting up reporting.

Reporting improvements for data quality rules (IBM Knowledge Catalog)

28 June 2023

You can now monitor data quality rules in the following ways:

  • Receive and manage reports on data quality issues for each data asset in a catalog or a project.
  • Monitor ongoing data quality for data assets in projects and catalogs by using reporting for data quality scores and data quality dimensions scores. The data quality score is based on a weighted average from data quality dimension scores. The data quality dimensions scores are based on results from relevant data quality checks.
  • For data quality rules that include multiple rule definitions, see the data quality check statistics (results) by rule definition in the BI reporting schema.

For more information, see Data model.
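
As a worked illustration of the weighted average described above, the following sketch computes an overall data quality score from dimension scores. The dimension names, scores, and weights are hypothetical; the actual weighting that IBM Knowledge Catalog applies is not specified here.

```python
def data_quality_score(dimension_scores, weights):
    """Weighted average of data quality dimension scores."""
    total_weight = sum(weights[name] for name in dimension_scores)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    weighted_sum = sum(
        score * weights[name] for name, score in dimension_scores.items()
    )
    return weighted_sum / total_weight

# Hypothetical dimension scores (0 to 1) and weights.
dimension_scores = {"completeness": 0.90, "validity": 0.80, "uniqueness": 1.00}
weights = {"completeness": 2, "validity": 1, "uniqueness": 1}

score = data_quality_score(dimension_scores, weights)
print(round(score, 2))
```

With these inputs, completeness counts twice as much as each of the other dimensions: (0.90 × 2 + 0.80 + 1.00) / 4 = 0.90.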

Week ending 23 June 2023

Govern models more effectively with enhancements for AI Factsheets

23 June 2023

AI Factsheets now offers more ways for you to track solutions for business problems, govern a wider range of assets, capture more information with factsheet attachments, and generate improved reports.

Track different model use case solutions with approaches

When you track models in a use case, you can now create one or more approaches to track different methods and model versions for addressing a business problem. For example, you might create two different approaches in a use case to compare how different algorithms affect model performance so you can find the best solution. For details, see Managing model versions in a use case.

Enhanced options for governing external models

You can now use AI Factsheets to govern a wider range of external models, including models developed, deployed, and monitored on a platform other than Cloud Pak for Data as a Service. In addition to more comprehensive metadata tracked for external models, the Python client and API commands provide more features for moving models and deployments to different environments to more accurately track the life cycle for these assets. For details, see Adding an external model to the model inventory.

Exercise more control over attachments

Model inventory administrators can create attachment groups and create attachment definitions so that users can view attachments in a more organized fashion and upload attachments in an approved format. For details, see Adding and managing attachments for factsheets.

Add branding to your AI Factsheets reports

Customize the report templates that you use to create reports from factsheets by adding branding information and a logo. For details, see Generating reports for factsheets and model use cases.

Announcing support for Python 3.10 Spark 3.3 runtime for notebooks (Watson Studio)

23 June 2023

Python 3.10 Spark 3.3 is now supported as a runtime for notebooks. Python 3.9 Spark 3.3 is deprecated and will be discontinued on July 20, 2023. Starting on July 6, 2023, you will be restricted from creating notebooks with a Python 3.9 Spark 3.3 environment, but existing notebooks will continue to run until July 30, 2023. Change your notebook environment to use Python 3.10 Spark 3.3 before the deprecated environment is removed. For details on notebook environments, see Compute resource options for the notebook editor in projects.

Week ending 16 June 2023

Coming soon: General availability of time series anomaly prediction in AutoAI experiments

15 June 2023

Create a time series anomaly prediction experiment to train a model that can detect anomalies, or unexpected results, when the model predicts results based on new data. This capability of AutoAI is currently offered in beta, and is not supported for production. Once the feature is generally available and fully supported, training for time series anomaly prediction experiments will consume capacity unit hours (CUH) as part of your Watson Machine Learning plan. For more details, see:

Customize engine parameters for Decision Optimization experiments (Watson Studio)

15 June 2023

You can now add an engine settings file in your Decision Optimization experiment. With this file, you can view and customize the engine parameters that are used to solve your model in a new visual editor. You can also import an engine settings file and search for existing settings.

Engine settings .ops file shown open in Visual Editor view with one customised parameter

See Python model engine settings.

Week ending 2 June 2023

Manage AI lifecycle events with the cpdctl tool

2 June 2023

You can now manage and automate your assets hosted on Cloud Pak for Data as a Service using the Cloud Pak for Data Command Line Interface tool (cpdctl). Use automatic configuration from IBM Cloud to easily connect with the cpdctl API commands. For details and an example, see these resources:

Week ending 19 May 2023

Reminder: End of support approaching for Runtime 22.1 on Python 3.9 and R 3.6

15 May 2023

IBM Runtime 22.1 on Python 3.9 and R 3.6 environments will be removed on June 15, 2023. You can no longer create new notebooks or create custom environments using the 22.1 runtimes or R 3.6, or train new models with Python 3.9 software specifications. Update your assets and deployments to use IBM Runtime 22.2 on Python 3.10 or R 4.2 before June 15, 2023.

Introducing key-value search for advanced users

18 May 2023

Using key:value pairs in the search bar, you can now search within asset and artifact properties, such as the description, tags, custom properties, column names, and many more. See Searching for properties.

Name change for the IBM Cloud Compose for MySQL connection

18 May 2023

The IBM Cloud Compose for MySQL connection was renamed to IBM Cloud Databases for MySQL. Your previous settings for the connection remain the same. Only the connection name has changed.

Discontinued connections

18 May 2023

The following connections are discontinued and have been removed from Cloud Pak for Data as a Service:

  • IBM Db2 Event Store
  • IBM Db2 Hosted

Renaming data assets also renames file attachments in projects

19 May 2023

When you change the name of data assets with file attachments that you uploaded into the project, the file attachments are also renamed. However, changing the name of data assets imported from catalogs does not rename any attachments. You must update any references to the data asset in code-based assets, such as notebooks, to the new data asset name. Otherwise, the code-based asset won't run. For more information, see Managing assets in projects.

Week ending 12 May 2023

New UI capabilities for creating custom assets and managing custom properties for columns

11 May 2023

Catalog collaborators with the Admin or Editor role can now complete the following tasks from the web client:

  • Create custom assets from the catalog. To add a custom asset, select Custom asset from the Add to catalog drop-down menu.
  • Manage custom properties for data asset columns. To manage custom properties, select a column in the Overview of an asset and edit the properties in the side pane.

To learn more about custom properties for data assets, see Custom asset types, properties, and relationships.

Week ending 5 May 2023

Add generated code from the Code snippets pane

4 May 2023

A new Code snippets icon was added to the notebook toolbar. Clicking the icon opens the Code snippets pane, from which you can read data from a file or connection that was added to the project. The existing "Insert to code" function logic for generating code that loads data to a notebook cell has been moved under Read data. The former Find and load data pane can now be used only to upload data to a project. See Loading and accessing data in a notebook.

Week ending 28 April 2023

Orchestration Pipelines now generally available for automating AI lifecycle activities

27 Apr 2023

Orchestration Pipelines provides a graphical interface for orchestrating an end-to-end flow of assets from creation through deployment. Assemble and configure a pipeline that automates the tasks around curating data, then training, deploying, and updating machine learning models. Run a pipeline job in real time or on a schedule. For details on creating pipelines, see Orchestration Pipelines.

New in this update is the ability to create a custom pipeline component to execute a script you write using a Python function. You can use custom components to share reusable scripts between pipelines. You create custom components as project assets and then use them in pipelines you create in that project. For details, see Creating a custom component.

Orchestration Pipelines is offered as a feature of Watson Studio. However, you must have service plans for the assets and processes used in a pipeline. For example, to run a DataStage flow in a pipeline, you must have a DataStage service instance. Orchestration Pipelines consumes resources based on the assets and processes used in the pipeline. If your pipeline trains an AutoAI model, your account is charged for the Watson Machine Learning capacity unit hours (CUH) used for training the model. Likewise, if a pipeline contains a DataStage flow, the execution of that flow within Orchestration Pipelines is charged to your DataStage plan. Running pipeline components and bash scripts consumes Watson Studio CUH resources. For details on provisioning service instances and plans, see Services and integrations.

Access more data with the new Presto connection

27 Apr 2023

You can now work with data from Presto data sources. For information, see Presto connection.

Week ending 21 April 2023

Drill down into the details of profiling results (IBM Knowledge Catalog)

20 Apr 2023

You can now access detailed profiling information from within a metadata enrichment or from an asset’s Profile tab in a project or a catalog. For each column, view statistical information about the column data, information about data classes, data types and formats, and the frequency distribution of values in the column. For the statistical information, you can also choose between several types of visualizations. To populate these views for an existing profile, update the profile.

Statistical information for continuous data

Statistical information for nominal data

For details, see Column-level profile details.

Week ending 14 April 2023

Default Python and CPLEX versions updated (Decision Optimization)

13 Apr 2023

The default Python for Decision Optimization users is now 3.10 and the default CPLEX version is 22.1. These versions are used by default when you create a new experiment. Python 3.9 is deprecated and will soon be removed. To update your environment, see Configuring Environments. To update existing deployed models, see Model deployment.

Enhancements to data quality rules (IBM Knowledge Catalog)

13 Apr 2023

You can now also run data quality rules on data assets from these data sources:

  • Amazon S3 (CSV files only)
  • Apache Cassandra
  • SAP ASE

When you configure a data quality rule with externally managed bindings, you can now select additional content for output links in the associated DataStage flow. For more information, see Creating rules from data quality definitions.

Week ending 7 April 2023

New: Time Series anomaly detection experiment (Beta)

7 Apr 2023

Use AutoAI to train a time series anomaly prediction model that can detect anomalies, or unexpected results, when the model predicts results based on new data. Model candidate pipelines generated by the experiment are ranked according to how well they perform, as measured by the optimizing metric. Save a model as a notebook to review the code, or save and deploy a model to detect potential anomalies in new data. For details, see Creating a time series anomaly prediction model (Beta). This feature is offered as beta and is not yet supported for use in production environments.

Filter your asset activity in a project

6 Apr 2023

In the Assets pane on the Overview tab of a project, you can filter assets by selecting By you or By all using the dropdown. By you lists assets edited by you, ordered by most recent at the top. By all lists assets edited by others and also by you, ordered by most recent at the top.

Upgrade to Spark with R 4.2 in Watson Studio

3 Apr 2023

Spark R 3.6 environments for Watson Studio are upgraded to R 4.2. All Spark R 3.6 environments are now deprecated and will be removed on 15 June 2023. Starting on 11 May 2023, you can no longer create new notebooks or new Data Refinery flows with Spark R 3.6. Additionally, you will not be able to create new Spark R 3.6 custom environments. At that time, you might need to update some package versions and scripts for your notebooks. You must update your assets and deployments to use Spark with R 4.2 before 15 June 2023.

See Changing the environment for a notebook. For details on the libraries and packages for R versions, see the CRAN release notes.

New Spark with R 4.2 environment for running Data Refinery flow jobs

3 Apr 2023

You can now select Default Spark 3.3 & R 4.2 when you select an environment for a Data Refinery flow job. The new environment uses the same capacity unit hours (CUHs) as the other Default environments.

Spark 3.3 & R 3.6 selection

Important:

The Default Spark 3.2 & R 3.6 environment is deprecated and will be discontinued in a future update. Change your Data Refinery flow jobs to use the new Default Spark 3.3 & R 3.6 environment.

For information about environments for Data Refinery, see Compute resource options for Data Refinery in projects.

The environment change affects two GUI operations. If you have existing Data Refinery flows that include these GUI operations, you must update the Data Refinery flow.

  • Split
  • Tokenize

To update a flow, open it and then save it. For details, see Managing Data Refinery flows.

Week ending 31 March 2023

Create custom assets from a catalog

31 Mar 2023

Admins and editors can now create custom assets inside the Catalog UI. To add a new custom asset, select Custom asset from the Add to catalog dropdown menu. To learn more about custom assets, see Custom asset types, properties, and relationships in Adding assets to a catalog (Watson Knowledge Catalog).

Improvements and enhancements in Watson Query

29 Mar 2023

Watson Query has been updated to provide the following capabilities:

  • With asynchronous virtualization, you can view the status details of a virtualization job at any time on the Virtualized data page. If the virtualized tables are large and the job takes a long time, you can work on other tasks, such as virtualizing more tables, while the job finishes.
  • With asynchronous publishing and assignments on the Virtualized data page, you can work on other tasks while the publishing and assignment jobs finish.
  • You can use jobs in the web client to collect statistics on virtualized tables. For more information, see Collecting statistics in the web client in Watson Query.
  • You can view the publishing or assignment history of an object on the Virtualized data page. Click an object row from the list to view its publishing and assignment history in the right side panel of the Virtualized data page.

Week ending 24 March 2023

Federated Learning runs on Mac computers with M-series chips

23 Mar 2023

Run your Federated Learning experiments on M1 Mac and M2 Mac computers in the latest runtime. For requirements, see Set up your system.

Week ending 17 March 2023

Define composite keys in reference data sets (IBM Knowledge Catalog)

17 Mar 2023

You can now specify multiple columns to create a composite key for your reference data sets. Without a composite key, reference data values in a set are identified by a unique string in the code column. A composite key is a combination of the code column and up to 5 custom columns in a reference data set. A composite key is used to uniquely identify each reference data value. With a composite key, the values in the code column no longer need to be unique. Uniqueness is guaranteed only when the values of all the specified columns are combined. For details, see Designing reference data sets.
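The uniqueness rule described above can be illustrated with a minimal Python sketch. It is not the IBM Knowledge Catalog implementation: the sample rows, the `country` custom column, and the `find_duplicate_keys` helper are hypothetical names used only for illustration. Only the limit of 5 custom columns comes from the text.

```python
from collections import Counter

# Hypothetical reference data values. "code" stands for the code column and
# "country" for a custom column that is added to the composite key.
rows = [
    {"code": "01", "country": "US", "value": "Alabama"},
    {"code": "01", "country": "DE", "value": "Schleswig-Holstein"},
    {"code": "02", "country": "US", "value": "Alaska"},
]

MAX_CUSTOM_KEY_COLUMNS = 5  # limit stated in the release note


def find_duplicate_keys(rows, custom_columns=()):
    """Return the key tuples that occur more than once.

    The key is the code column plus the given custom columns.
    """
    if len(custom_columns) > MAX_CUSTOM_KEY_COLUMNS:
        raise ValueError("a composite key allows at most 5 custom columns")
    key_columns = ("code", *custom_columns)
    counts = Counter(tuple(row[c] for c in key_columns) for row in rows)
    return sorted(key for key, n in counts.items() if n > 1)


# With the code column alone, code "01" appears twice.
code_only_duplicates = find_duplicate_keys(rows)
# With "country" in the key, every combination is unique.
composite_duplicates = find_duplicate_keys(rows, custom_columns=("country",))
print(code_only_duplicates)
print(composite_duplicates)
```

In this sketch the code `01` repeats, so the set would not be valid with the code column as the only key, but it becomes valid once the combination of `code` and `country` is used as the key.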

Week ending 10 March 2023

Create queries, reports, or dashboards based on custom relationships (IBM Knowledge Catalog)

9 Mar 2023

When you create custom relationships between assets and governance artifacts, you can sync them to IBM Knowledge Catalog Reporting Data Mart, so that you can create reports. For example, you can use the custom relationships reporting to:

  • Obtain quality analytics at various levels of granularity (by domain, by metadata, by user, by team)
  • Certify the quality of your data
  • Count the number of assets that have a specific privacy property

To learn how to create custom relationships, see Custom properties and relationships for governance artifacts and catalog assets (IBM Knowledge Catalog).

To learn how to create reports, see Setting up reporting for IBM Knowledge Catalog.

Runtime 22.1 on Python 3.9 deprecation for Watson Studio and Watson Machine Learning

9 Mar 2023

IBM Runtime 22.1 on Python 3.9 is now deprecated and will be removed on June 15, 2023. Starting on May 11, 2023, you can no longer create new notebooks or create custom environments using the 22.1 runtimes. You will also be unable to train new models with Python 3.9 software specifications. Update your assets and deployments to use IBM Runtime 22.2 on Python 3.10 before June 15, 2023.

Run data quality rules on additional data sources (IBM Knowledge Catalog)

9 Mar 2023

You can now run data quality rules on data assets from these data sources:

  • IBM Data Virtualization
  • Microsoft Azure Data Lake Storage
  • Snowflake

New option for binding variables in data quality rules (IBM Knowledge Catalog)

9 Mar 2023

You can now also use job parameters to bind rule variables to data columns and manage those parameters centrally in a project. Thus, you don’t need to update the rules when, for example, you want to change the binding to a different column. See Creating rules from data quality definitions.

Week ending 3 March 2023

Enhancements for AI Factsheets (Watson Machine Learning)

3 March 2023

You can now attach files and images to a factsheet. For details, see Customizing details for a factsheet. Factsheets also display additional Watson OpenScale metrics from explainability and custom monitors. For details, see Viewing factsheets.

Create, store, and share machine learning features (Beta) (Watson Studio)

2 March 2023

You can now speed the development of machine learning models by creating and sharing features. You add a feature group to a data asset in a project to identify the features of that data set. You can share the features with your organization by publishing the data asset to a catalog, which acts as a feature store. See Managing feature groups.

Week ending 24 February 2023

Manage custom relationships (IBM Knowledge Catalog)

24 February 2023

Now, you can manage custom relationships between catalog assets and governance artifacts in the Overview page of an asset.

To learn how to create custom relationships, see Custom properties and relationships for governance artifacts and catalog assets (IBM Knowledge Catalog).

Week ending 17 February 2023

Data Refinery Calculate operation works on Date columns

17 Feb 2023

You can now use the Calculate operation on Date data type columns to add or subtract day or month values.

Data Refinery Calculate operation

For information about GUI operations, see GUI operations in Data Refinery.
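As a rough illustration of what adding or subtracting day or month values on a date means, here is a minimal Python sketch. It does not reproduce Data Refinery's behavior: the helper names `add_days` and `add_months` are hypothetical, and clamping to the last day of a shorter month is an assumption of this sketch, not something the release note states.

```python
import calendar
from datetime import date, timedelta


def add_days(d, days):
    """Shift a date by a number of days (negative values subtract)."""
    return d + timedelta(days=days)


def add_months(d, months):
    """Shift a date by a number of months (negative values subtract).

    Assumption of this sketch: if the target month is shorter, the day is
    clamped to the last day of that month.
    """
    index = d.year * 12 + (d.month - 1) + months
    year, month0 = divmod(index, 12)
    month = month0 + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(d.day, last_day))


print(add_days(date(2023, 2, 17), 10))
print(add_months(date(2023, 2, 17), -3))
print(add_months(date(2023, 1, 31), 1))
```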

New library to access project assets in Watson Studio

17 Feb 2023

The ibm-watson-studio-lib library contains a set of functions that help you to interact with Watson Studio projects and project assets. The library can be used in notebooks that are created in the notebook editor and is available for Python and R. It is the successor of the project_lib library. For details, see Using ibm-watson-studio-lib.

"Default Spark 3.2 & R 3.6" environment discontinued (Data Refinery)

17 Feb 2023

The Default Spark 3.2 & R 3.6 environment will no longer be available effective February 17, 2023.

If you have any Data Refinery flow jobs set up with the Default Spark 3.2 & R 3.6 environment or a custom environment that uses Spark 3.0, the jobs will fail. Change the environment to Default Spark 3.3 & R 3.6 or Default Data Refinery XS or a custom environment that does not use Spark 3.0.

For information about environments for Data Refinery, see Compute resource options for Data Refinery in projects.

New features for data quality rules (IBM Knowledge Catalog)

16 Feb 2023

These new capabilities are available:

  • Use more than one data quality definition in a single data quality rule. In addition, you can include an individual definition more than once to apply the same definition to different columns. For details, see Creating rules from data quality definitions.
  • Download rule output as a CSV file. If an output table is defined for the rule, you can now also download the rule output as a CSV file from the rule's run history, for example, for use in a spreadsheet program.
  • Run rules on data from Amazon Redshift and Greenplum data sources. See Supported data sources for metadata import, metadata enrichment, and data quality rules.
  • Export and import data quality assets. When you export project assets to desktop, you can now include data quality assets. See Exporting a project.

Week ending 10 February 2023

Import assets from a project or space into an existing space (Watson Machine Learning)

9 Feb 2023

You can now import a deployment space or a project (in .zip format) into an existing deployment space. Add assets to the space or update assets that already exist in it. For example, you can replace a model with a newer version. For details, see Importing spaces and projects into existing spaces.

Use more macros in DataStage

10 Feb 2023

You can add the DSJobController macro to stage properties or in the transformer functions.

The macro acts as a DataStage function and outputs data without the need for arguments, simplifying the setup of DataStage jobs and flows.

For more information, see Macros.

Week ending 3 February 2023

Use more macros in DataStage

6 Feb 2023

You can add the following macros to stage properties or in the transformer functions:

  • DSProjectId
  • DSJobRunId
  • DSJobId

The macros act as DataStage functions and output data without the need for arguments, simplifying the setup of DataStage jobs and flows.

For more information, see Macros.

Week ending 20 January 2023

Edit input columns in DataStage stages

20 Jan 2023

You can now edit columns through the input tab of a stage in DataStage. Your changes are propagated to the previous stage in the flow.

New options for metadata import (IBM Knowledge Catalog)

19 Jan 2023

To ensure that the target project or catalog of your metadata import doesn't contain stale data, you can now configure the import to clean up data assets that can't be reimported. When the metadata import is rerun, you can choose to delete from the import target any assets that are no longer available in the data source, any assets that were removed from the import scope, or both. See Importing metadata.

Metadata import: new advanced options
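The cleanup options described above amount to two set differences. The following Python sketch shows that logic under stated assumptions: it is not the product's implementation, and the function name `assets_to_delete`, its parameters, and the sample asset names are all hypothetical.

```python
def assets_to_delete(target, source, scope,
                     delete_missing=True, delete_out_of_scope=True):
    """Return the assets that a rerun would remove from the import target.

    target: assets currently in the import target (project or catalog)
    source: assets still available in the data source
    scope:  assets still included in the import scope
    """
    doomed = set()
    if delete_missing:
        # Assets that no longer exist in the data source
        doomed |= target - source
    if delete_out_of_scope:
        # Assets that were removed from the import scope
        doomed |= target - scope
    return doomed


# Hypothetical example: CUSTOMERS was dropped from the data source,
# and the HR schema was removed from the import scope.
target = {"SALES.ORDERS", "SALES.CUSTOMERS", "HR.EMPLOYEES"}
source = {"SALES.ORDERS", "HR.EMPLOYEES"}
scope = {"SALES.ORDERS", "SALES.CUSTOMERS"}

print(sorted(assets_to_delete(target, source, scope)))
```

With both options enabled, the sketch marks `SALES.CUSTOMERS` and `HR.EMPLOYEES` for deletion and leaves `SALES.ORDERS` in place. Disabling one option keeps the corresponding group of assets.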

Export data from Decision Optimization experiments to your project

18 Jan 2023

You can now export tables to your project from either the Prepare data or Explore solution view in your Decision Optimization experiment. This enables you to reuse your data in other models or services. You can also export data using the Decision Optimization Python client.

See Exporting data from Decision Optimization experiments.

Data export to project

Week ending 13 January 2023

Updated Data fabric use cases

12 Jan 2023

The Data fabric use cases are updated to better reflect how you use our products:

  • Data integration: This use case now includes Pipelines.
  • Data governance: This use case now includes Match 360.
  • AI governance: This use case now focuses on monitoring, maintaining, automating, and governing AI models in production.
  • Data Science and MLOps: This new use case explains how to operationalize data analysis and model creation.

See Data fabric use cases.

Customize the web browser to support your brand

12 Jan 2023

As an administrator, you can add custom product names, logos, and other graphics to customize the branding of the web browser for Cloud Pak for Data as a Service.

See Customizing the branding of the web browser.

Week ending 6 January 2023

Connect to more data sources in DataStage

6 Jan 2023

You can now include data from these data sources in your DataStage flows:

  • Dremio
  • SingleStoreDB

For the full list of DataStage connectors, see DataStage connectors.
