Virtualizing data

Last updated: Mar 17, 2025

Virtualizing data with Data Virtualization

Use the Data Virtualization service to join data from different sources into one unified view, without manual changes, data movement, or replication.

The Data Virtualization service is a part of the data fabric.

Overview

With Data Virtualization, you can access physical data from multiple sources through a single semantic virtual layer. Through this layer, you can access, manipulate, and analyze the data without knowing its physical format or location, and without moving or copying it.

Prerequisites

If you want to publish your virtual data to a governed catalog, you must install IBM Knowledge Catalog. For more information, see IBM Knowledge Catalog on Cloud Pak for Data.

Required service

Data Virtualization.

Before you can virtualize your data, you must create and deploy a service instance of Data Virtualization in Cloud Pak for Data as a Service. For more information, see Provision a service instance for Data Virtualization.

After you provision your service instance, you can set up your Data Virtualization instance. For more information, see Getting started with Cloud Pak for Data as a Service and Quick start: Virtualize data.

Related service

IBM Knowledge Catalog.

If you want to publish your virtual data to a governed catalog, you must install IBM Knowledge Catalog. For more information, see Data governance.

Data formats

Data Virtualization works with connections to many types of data sources and formats. For more information, see Connecting to data sources in Data Virtualization.

Data size

Each data source defines its own data size limits. For more information, see Supported data sources in Data Virtualization.

Credentials

Data Virtualization uses your IBM Cloud credentials to connect to the service. Some tasks require specific Data Virtualization roles. For more information, see Connecting and authenticating to the Data Virtualization service.

Getting started

To start using Data Virtualization, follow these high-level steps:

1. Open the Data Virtualization service.

   In the Cloud Pak for Data navigation menu, select Data > Data virtualization.

2. Add your data sources to Data Virtualization.

   On the Data sources page, select Add connection to add each connection. Data Virtualization supports dozens of relational and nonrelational data sources.

3. Virtualize the tables from the data source.

   On the Virtualize page, select the tables that you want to virtualize and then select Add to cart > View cart to virtualize them.

4. Join the tables to create a unified view.

   On the Virtualized data page, select the tables that you want to join and then select Join.

5. Query the virtual objects.

   On the Run SQL page, query your virtual objects by using the built-in SQL editor.

6. Consume the data using other Cloud Pak for Data services in the data fabric.

   Consume virtual tables in projects, dashboards, data catalogs, and other applications. For more information, see Dashboard services.
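
The join and query steps above can be sketched in code. The following example is illustrative only and does not use Data Virtualization: an in-memory SQLite database stands in for the virtual layer. The table names (customers, orders), the view name (customer_orders), and the helper function total_by_customer are hypothetical. The example shows the general pattern of defining a joined view once and then querying it as if it were a single table.

```python
import sqlite3

# SQLite is only a stand-in for the virtual layer; this is not Data Virtualization.
conn = sqlite3.connect(":memory:")

# Two hypothetical tables, standing in for tables virtualized from different sources.
conn.executescript("""
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
    INSERT INTO customers VALUES (1, 'Ada'), (2, 'Grace');
    INSERT INTO orders VALUES (10, 1, 25.0), (11, 1, 15.0), (12, 2, 40.0);

    -- The unified view: a join that consumers can query like a single table.
    CREATE VIEW customer_orders AS
        SELECT c.customer_id, c.name, o.order_id, o.amount
        FROM customers AS c
        JOIN orders AS o ON o.customer_id = c.customer_id;
""")


def total_by_customer(connection):
    """Return (name, total amount) rows from the joined view, ordered by name."""
    query = """
        SELECT name, SUM(amount) AS total
        FROM customer_orders
        GROUP BY name
        ORDER BY name
    """
    return connection.execute(query).fetchall()


totals = total_by_customer(conn)
for name, total in totals:
    print(name, total)
```

In this sketch, the query refers only to the view, so the underlying tables can change without affecting the query text. In Data Virtualization itself, you would run the equivalent SELECT statement on the Run SQL page against your own virtual objects.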