The core services for Cloud Pak for Data as a Service provide a range of tools for preparing, analyzing, and modeling data, for users at every level of experience, from beginner to expert. The right tool for you depends on the type of data you have, the tasks you plan to do, and the amount of automation you want.
To see which tools you use in a project and which services those tools require, open the tools and services map.
To pick the right tool, consider these factors.
The type of data you have
Tabular data in delimited files or relational data in remote data sources
Image files
Textual (unstructured) data in documents
The type of tasks you need to do
Prepare data: cleanse, shape, visualize, organize, and validate data.
Analyze data: identify patterns and relationships in data, and display insights.
Build models: build, train, test, and deploy models to make predictions or optimize decisions.
How much automation you want
Code editor tools: Use to write code in Python or R, either of which you can also use with Spark.
Graphical builder tools: Use menus and drag-and-drop functionality on a builder to visually program.
Automated builder tools: Use to configure automated tasks that require limited user input.
To use a tool, you must create an asset specific to that tool or open an existing asset for that tool. To create an asset, click New asset or Import assets, and then choose the asset type that you want. For most tools, the Get started section below shows the asset type to choose.
Data Refinery
Use Data Refinery to prepare and visualize tabular data with a graphical flow editor. You create and then run a Data Refinery flow as a set of ordered operations on data.
Required services
watsonx.ai Studio or IBM Knowledge Catalog
Data format
Tabular: Avro, CSV, JSON, Microsoft Excel (.xls and .xlsx formats; first sheet only, except for connections and connected data assets), Parquet, SAS with the "sas7bdat" extension (read only), TSV (read only), or delimited text data assets
Relational: Tables in relational data sources
Data size
Any
How you can prepare data
Cleanse, shape, and organize data with more than 60 operations.
Save refined data as a new data set or update the original data.
Profile data to validate it.
Use interactive templates to manipulate data with code operations, functions, and logical operators.
Schedule recurring operations on data.
How you can analyze data
Identify patterns, connections, and relationships within the data in multiple visualization charts.
Get started
To create a Data Refinery flow, click New asset > Prepare and visualize data.
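Conceptually, a Data Refinery flow applies its operations in order, and each step works on the output of the step before it. The Python sketch below illustrates that idea on a few in-memory rows. It is not Data Refinery's implementation or API, and the helper names (`drop_missing`, `trim`, `sort_by`, `run_flow`) are hypothetical stand-ins for operations that you would choose in the flow editor.

```python
from typing import Callable

Row = dict[str, str]
Operation = Callable[[list[Row]], list[Row]]


def drop_missing(column: str) -> Operation:
    """Hypothetical cleanse step: remove rows where `column` is empty."""
    return lambda rows: [r for r in rows if r.get(column, "").strip()]


def trim(column: str) -> Operation:
    """Hypothetical shape step: strip surrounding whitespace from `column` (returns new rows)."""
    return lambda rows: [{**r, column: r[column].strip()} for r in rows]


def sort_by(column: str, numeric: bool = False) -> Operation:
    """Hypothetical organize step: sort rows by `column`."""
    key = (lambda r: float(r[column])) if numeric else (lambda r: r[column])
    return lambda rows: sorted(rows, key=key)


def run_flow(rows: list[Row], operations: list[Operation]) -> list[Row]:
    """Apply each operation in order; every step sees the previous step's output."""
    for operation in operations:
        rows = operation(rows)
    return rows


data = [
    {"name": "  Ana ", "age": "34"},
    {"name": "Bo", "age": ""},
    {"name": " Cy", "age": "27"},
]
flow = [drop_missing("age"), trim("name"), sort_by("age", numeric=True)]
result = run_flow(data, flow)
print(result)  # [{'name': 'Cy', 'age': '27'}, {'name': 'Ana', 'age': '34'}]
```

The source rows are left unchanged, much as a flow can save refined data as a new data set. Here, `sort_by("age", numeric=True)` would fail if it ran before `drop_missing("age")`, because it would try to convert the empty age to a number. That is one reason the order of operations matters.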
Data Replication
Use Data Replication to integrate and synchronize data. Data Replication provides near-real-time data delivery with low impact on sources.
Required service
Data Replication
Related service
IBM Knowledge Catalog
Data formats
Data Replication works with connections to and from select types of data sources and formats. For more information, see Supported Data Replication connections.
Credentials
Data Replication uses your IBM Cloud credentials to connect to the service.
Get started
To start data replication in a project, click New asset > Replicate data.
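Data Replication delivers captured source changes to targets with transactional integrity. The Python sketch below shows what that property means for a replica. Captured changes are buffered per transaction and applied together only when the transaction's commit record arrives, so the target never shows a half-applied transaction. This is a conceptual illustration under simplifying assumptions: a single in-memory key-value target, a hypothetical `LogRecord` layout, and a hypothetical `replicate` function. It does not describe how the Data Replication service is implemented or configured.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LogRecord:
    """One entry in a hypothetical source change log."""
    txn_id: int
    op: str                      # "insert", "update", "delete", or "commit"
    key: Optional[str] = None
    value: Optional[str] = None


def replicate(log: list[LogRecord], target: dict[str, str]) -> dict[str, str]:
    """Return a copy of `target` with every committed transaction in `log` applied.

    Changes are buffered per transaction and applied together when that
    transaction's commit record is read.
    """
    replica = dict(target)  # leave the caller's target untouched
    pending: dict[int, list[LogRecord]] = {}
    for record in log:
        if record.op == "commit":
            # Apply the whole transaction in the order its changes were captured.
            for change in pending.pop(record.txn_id, []):
                if change.op == "delete":
                    replica.pop(change.key, None)
                else:  # "insert" or "update"
                    replica[change.key] = change.value
        elif record.op in ("insert", "update", "delete"):
            pending.setdefault(record.txn_id, []).append(record)
        else:
            raise ValueError(f"unknown operation: {record.op!r}")
    # Transactions without a commit record stay in `pending` and are never applied.
    return replica


log = [
    LogRecord(1, "insert", "c1", "Ana"),
    LogRecord(2, "insert", "c2", "Bo"),
    LogRecord(1, "update", "c1", "Ana M."),
    LogRecord(1, "commit"),
    LogRecord(2, "delete", "c2"),  # transaction 2 has no commit record yet
]
snapshot = replicate(log, {"c0": "Zed"})
print(snapshot)  # {'c0': 'Zed', 'c1': 'Ana M.'}
```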
DataStage
Use DataStage to transform and integrate tabular data with a graphical flow editor. You create and then run a DataStage flow that connects data sources and processing stages.
Required service
DataStage
Data format
Tabular: Avro, CSV, JSON, Parquet, TSV (read only), or delimited text files
Relational: Tables in relational data sources
Data size
Any
How you can prepare data
Design a graphical data integration flow that generates Orchestrate code to run on the high-performance DataStage parallel engine.
Perform operations such as Join, Funnel, Checksum, Merge, Modify, Remove Duplicates, and Sort.
Get started
To create a DataStage flow, click New asset > Transform and integrate data. The DataStage tile is in the Graphical builders section.
Decision Optimization model builder
Use Decision Optimization to build and run optimization models in the Decision Optimization modeler or in a Jupyter notebook.
Required services
watsonx.ai Studio
Data formats
Tabular: CSV files
Data size
Any
How you can prepare data
Import relevant data into a scenario and edit it.
How you can build models
Build prescriptive decision optimization models.
Create, import, and edit models in Python DOcplex or OPL, or with natural language expressions.
Create, import, and edit models in notebooks.
How you can solve models
Run and solve decision optimization models using CPLEX engines.
Investigate and compare solutions for multiple scenarios.
Create tables, charts, and notes to visualize data and solutions for one or more scenarios.
Get started
To create a Decision Optimization model, click New asset > Solve optimization problems, or for notebooks click New asset > Work with data and models in Python or R notebooks.
Federated Learning
Use the Federated Learning tool to train a common model by using distributed data. The data is never combined or shared, which preserves data integrity, while all participating parties receive a model that is based on the aggregated training results.
Required services
watsonx.ai Studio
watsonx.ai Runtime
Data format
Any
Data size
Any size
How you can build models
Choose a training framework.
Configure the common model.
Configure a file for training the common model.
Have remote parties train the model on their own data.
Deploy the common model.
Get started
To create an experiment, click New asset > Train models on distributed data.
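In federated learning, each party trains on its own data and shares only model updates, which are fused into the common model. The Python sketch below shows one common fusion approach: a per-parameter average weighted by each party's sample count, as in federated averaging. It is an illustration only. The fusion methods that are available depend on the training framework that you choose, and the `fuse` function and its input format are hypothetical.

```python
def fuse(updates: list[tuple[list[float], int]]) -> list[float]:
    """Average each parameter across parties, weighted by each party's sample count.

    `updates` holds (parameters, num_samples) pairs; raw training data never appears here.
    """
    if not updates:
        raise ValueError("need at least one party update")
    total = sum(n for _, n in updates)
    if total <= 0:
        raise ValueError("total sample count must be positive")
    size = len(updates[0][0])
    if any(len(params) != size for params, _ in updates):
        raise ValueError("all parties must send the same number of parameters")
    return [sum(params[i] * n for params, n in updates) / total for i in range(size)]


party_updates = [
    ([1.0, 2.0], 100),  # party A trained on 100 local records
    ([3.0, 6.0], 300),  # party B trained on 300 local records
]
print(fuse(party_updates))  # [2.5, 5.0]
```

Weighting by sample count gives a party with more training records proportionally more influence on the common model, while the records themselves stay with that party.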
IBM Match 360 with Watson
Use IBM Match 360 with Watson to create master data entities that represent digital twins of your customers. Model and map your data, and then run the matching algorithm to create master data entities. Customize and tune your matching algorithm to meet your organization's requirements.
Required services
IBM Match 360 with Watson
IBM Knowledge Catalog
Data size
Up to 1,000,000 records (for the Beta Lite plan)
How you can prepare data
Model and map data from sources across your organization.
Run the customizable matching algorithm to create master data entities.
View and edit master data entities and their associated records.
Get started
To create an IBM Match 360 configuration asset, click New asset > Consolidate data into 360-degree views.
Masking flows
Use the Masking flow tool to prepare masked copies or masked subsets of data from the catalog. Data is de-identified using advanced masking options with data protection rules.
Required service
IBM Knowledge Catalog
Data format
Relational: Tables in relational data sources
Data size
Any size
How you can prepare data
Import data assets from a governed catalog into a project.
Create masking flow job definitions to specify what data to mask with data protection rules.
Optionally, subset data to reduce the size of the copied data.
Run masking flow jobs to load masked copies to target database connections.
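With masking flows, the masking itself comes from data protection rules, so you do not write code. To make the idea concrete, the Python sketch below produces a masked subset. It keeps only the rows that pass a filter and applies a per-column masking function to each copied row, while the source rows stay unchanged. The rule functions (`redact`, `pseudonymize`), the `mask_copy` helper, and the `SECRET_KEY` value are all hypothetical illustrations. They are not the product's masking methods or API.

```python
import hashlib
import hmac
from typing import Callable

Row = dict[str, str]
Rule = Callable[[str], str]

SECRET_KEY = b"demo-only-key"  # hypothetical; a real key would come from a secrets store


def redact(value: str) -> str:
    """Hypothetical redaction rule: replace every character with X."""
    return "X" * len(value)


def pseudonymize(value: str) -> str:
    """Hypothetical keyed-hash rule: the same input always yields the same token."""
    return hmac.new(SECRET_KEY, value.encode("utf-8"), hashlib.sha256).hexdigest()[:12]


def mask_copy(rows: list[Row], rules: dict[str, Rule],
              keep: Callable[[Row], bool] = lambda row: True) -> list[Row]:
    """Return masked copies of the rows that pass `keep`; `rows` itself is not modified."""
    masked: list[Row] = []
    for row in rows:
        if not keep(row):
            continue  # subsetting: drop rows outside the requested subset
        masked.append({col: rules[col](val) if col in rules else val
                       for col, val in row.items()})
    return masked


source = [
    {"id": "1", "email": "ana@example.com", "ssn": "123-45-6789", "region": "EU"},
    {"id": "2", "email": "bo@example.com", "ssn": "987-65-4321", "region": "US"},
]
masked_copy = mask_copy(source, {"email": pseudonymize, "ssn": redact},
                        keep=lambda row: row["region"] == "EU")
print(masked_copy)
```

A keyed hash is used instead of a plain hash so that anyone without the key cannot re-create the tokens by hashing guessed values. Because equal inputs still map to equal tokens, joins on the masked column keep working.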
Data visualizations
Use data visualizations to discover insights from your data. By exploring data from different perspectives with visualizations, you can identify patterns, connections, and relationships within that data and quickly understand large amounts of information.
Data format
Tabular: Avro, CSV, JSON, Parquet, TSV, SAV, Microsoft Excel .xls and .xlsx files, SAS, delimited text files, and connected data. For more information about supported data sources, see Connectors.
Data size
No limit
Get started
To create a visualization, click Data asset in the list of asset types in your project, and select a data asset. Click the Visualization tab, and choose a chart type.
The tools and services map shows the relationships between your tasks, the tools you need, the services that provide the tools, and where you use the tools.
Some tools perform the same tasks but have different features and levels of automation.
Jupyter notebook editor
Tasks: Prepare data, Visualize data, Build models, Deploy assets
Create a notebook in which you run Python, R, or Scala code to prepare, visualize, and analyze data, or build a model.
AutoAI
Tasks: Build models
Automatically analyze your tabular data and generate candidate model pipelines customized for your predictive modeling problem.
SPSS Modeler
Tasks: Prepare data, Visualize data, Build models
Create a visual flow that uses modeling algorithms to prepare data and build and train a model, using a guided approach to machine learning that doesn’t require coding.
Decision Optimization
Tasks: Build models, Visualize data, Deploy assets
Create and manage scenarios to find the best solution to your optimization problem by comparing different combinations of your model, data, and solutions.
Data Refinery
Tasks: Prepare data, Visualize data
Create a flow of ordered operations to cleanse and shape data. Visualize data to identify problems and discover insights.
Orchestration Pipelines
Tasks: Prepare data, Build models, Deploy assets
Automate the model lifecycle, including preparing data, training models, and creating deployments.
RStudio
Tasks: Prepare data, Build models, Deploy assets
Work with R notebooks and scripts in an integrated development environment.
Federated Learning
Tasks: Build models
Create a federated learning experiment to train a common model on a set of remote data sources. Share training results without sharing data.
Deployments
Tasks: Deploy assets, Monitor models
Deploy and run your data science and AI solutions in a test or production environment.
Catalogs
Tasks: Catalog data, Governance
Find and share your data and other assets.
Metadata import
Tasks: Prepare data, Catalog data, Governance
Import asset metadata from a connection into a project or a catalog.
Metadata enrichment
Tasks: Prepare data, Catalog data, Governance
Enrich imported asset metadata with business context, data profiling, and quality assessment.
Data quality rules
Tasks: Prepare data, Governance
Measure and monitor the quality of your data.
Masking flow
Tasks: Prepare data
Create and run masking flows to prepare copies of data assets that are masked by advanced data protection rules.
Governance
Tasks: Governance
Create your business vocabulary to enrich assets and rules to protect data.
Data lineage
Tasks: Governance
Track data movement and usage to provide transparency and to help determine data accuracy.
AI factsheet
Tasks: Governance, Monitor models
Track AI models from request to production.
DataStage flow
Tasks: Prepare data
Create a flow with a set of connectors and stages to transform and integrate data. Provide enriched and tailored information for your enterprise.
Data virtualization
Tasks: Prepare data
Create a virtual table to segment or combine data from one or more tables.
OpenScale
Tasks: Monitor models
Measure outcomes from your AI models and help ensure the fairness, explainability, and compliance of all your models.
Data Replication
Tasks: Prepare data
Replicate data to target systems with low latency, transactional integrity, and optimized data capture.
Master data
Tasks: Prepare data
Consolidate data from the disparate sources that fuel your business and establish a single, trusted, 360-degree view of your customers.
Services you can use
Services add features and tools to the platform.
watsonx.ai Studio
Develop powerful AI solutions with an integrated collaborative studio and industry-standard APIs and SDKs. Formerly known as Watson Studio.
watsonx.ai Runtime
Quickly build, run, and manage generative AI and machine learning applications with built-in performance and scalability. Formerly known as Watson Machine Learning.
IBM Knowledge Catalog
Discover, profile, catalog, and share trusted data in your organization.
DataStage
Create ETL and data pipeline services for real-time, micro-batch, and batch data orchestration.
Data Virtualization
View, access, manipulate, and analyze your data without moving it.
Watson OpenScale
Monitor your AI models for bias, fairness, and trust with added transparency on how your AI models make decisions.
Data Replication
Provide efficient change data capture and near real-time data delivery with transactional integrity.
IBM Match 360 with Watson
Improve trust in AI pipelines by identifying duplicate records and providing reliable data about your customers, suppliers, or partners.
Manta Data Lineage
Increase data pipeline transparency so you can determine data accuracy throughout your models and systems.
Where you'll work
Collaborative workspaces contain tools for specific tasks.
Project
Where you work with data.
Navigation menu > Projects > View all projects
Catalog
Where you find and share assets.
Navigation menu > Catalogs > View all catalogs
Space
Where you deploy and run assets that are ready for testing or production.
Navigation menu > Deployments
Categories
Where you manage governance artifacts.
Navigation menu > Governance > Categories
Data virtualization
Where you virtualize data.
Navigation menu > Data > Data virtualization
Master data
Where you consolidate data into a 360-degree view.