Virtualizing data

Last updated: Mar 17, 2025
Virtualizing data with Data Virtualization

Use the Data Virtualization service to easily join data from different sources in one unified view, without manual changes, data movement, or replication.

The Data Virtualization service is a part of the data fabric.

Overview

With Data Virtualization, you can access physical data from multiple sources through a single semantic virtual layer. Through this layer, you can access, manipulate, and analyze the data without knowing its physical format or location, and without moving or copying it.

Prerequisites

If you want to publish your virtual data to a governed catalog, you must install IBM Knowledge Catalog. For more information, see IBM Knowledge Catalog on Cloud Pak for Data.

Required service
Data Virtualization.

Before you can virtualize your data, you must create and deploy a service instance of Data Virtualization in Cloud Pak for Data as a Service. For more information, see Provision a service instance for Data Virtualization.

After you provision your service instance, you can set up your Data Virtualization instance. For more information, see Getting started with Cloud Pak for Data as a Service and Quick start: Virtualize data.

Related service
IBM Knowledge Catalog.

If you want to publish your virtual data to a governed catalog, you must install IBM Knowledge Catalog. For more information, see Data governance.

Data formats
Data Virtualization works with connections to many types of data sources and formats. For more information, see Connecting to data sources in Data Virtualization.

Data size
Each data source defines its own data size limits. For more information, see Supported data sources in Data Virtualization.

Credentials
Data Virtualization uses your IBM Cloud credentials to connect to the service. Some tasks require specific Data Virtualization roles. For more information, see Connecting and authenticating to the Data Virtualization service.

Getting started

To start using Data Virtualization, follow these high-level steps:
  1. Open the Data Virtualization service.
    In the Cloud Pak for Data navigation menu, select Data > Data virtualization.
    Screenshot of the Cloud Pak for Data drop-down menu with Data Virtualization
  2. Add your data sources to Data Virtualization.
    Navigate to the Data sources page and then select Add connection. Data Virtualization supports dozens of relational and nonrelational data sources.
    Screenshot of Data sources page
    Screenshot of New Connections page featuring IBM, third-party, and user-defined connection options
  3. Virtualize the tables from the data source.
    In the Virtualize page, select the tables that you want to virtualize and then select Add to cart > View cart to virtualize the tables.
    Screenshot of virtualize page with objects selected
    Screenshot of Review cart and virtualize tables page
  4. Join the tables to create a unified view.
    In the Virtualized data page, select the tables that you want to join and then select Join.
    Screenshot of Virtualized data page with objects selected to join
    Screenshot of Join virtual objects page with objects being joined
  5. Query the virtual objects.
    Navigate to the Run SQL page to query your virtual objects by using the built-in SQL editor.
    Screenshot of Run SQL page
  6. Consume the data using other Cloud Pak for Data services in the data fabric.
    Consume virtual tables in projects, dashboards, data catalogs, and other applications. For more information, see Dashboard services.
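
The idea behind steps 2 through 5 can be sketched in a few lines of Python. This is a conceptual illustration only, not the Data Virtualization API: SQLite from the Python standard library stands in for the virtual layer, two attached in-memory databases stand in for separate data sources, and the names crm, sales, customers, orders, and customer_totals are hypothetical. The view stores no rows of its own, so a change in a source shows up in the next query without any copy or refresh.

```python
import sqlite3

# Hypothetical stand-in: SQLite plays the role of the virtual layer.
# This is NOT the Data Virtualization API, only an illustration of the concept.
conn = sqlite3.connect(":memory:")

# Step 2 analogue: register two independent "data sources".
conn.execute("ATTACH DATABASE ':memory:' AS crm")
conn.execute("ATTACH DATABASE ':memory:' AS sales")

# Populate the sources (in practice, this data already exists in each source).
conn.execute("CREATE TABLE crm.customers (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE sales.orders (customer_id INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO crm.customers VALUES (?, ?)",
    [(1, "Ada"), (2, "Grace")],
)
conn.executemany(
    "INSERT INTO sales.orders VALUES (?, ?)",
    [(1, 120.0), (1, 30.0), (2, 75.0)],
)

# Steps 3 and 4 analogue: define one joined view over both sources.
# A TEMP view is used because only temporary views in SQLite may
# reference tables in other attached databases.
conn.execute(
    """
    CREATE TEMP VIEW customer_totals AS
    SELECT c.name AS name, SUM(o.amount) AS total
    FROM crm.customers AS c
    JOIN sales.orders AS o ON o.customer_id = c.id
    GROUP BY c.id, c.name
    """
)


def query_totals(connection):
    """Step 5 analogue: query the unified view with plain SQL."""
    return connection.execute(
        "SELECT name, total FROM customer_totals ORDER BY name"
    ).fetchall()


totals = query_totals(conn)
print(totals)

# The view holds no data: a new row in a source is visible immediately.
conn.execute("INSERT INTO sales.orders VALUES (2, 25.0)")
updated_totals = query_totals(conn)
print(updated_totals)
```

In the actual service, the equivalent query would be written on the Run SQL page against the virtual objects that you created. The sketch shows only why a consumer of the view does not need to know which source holds which table.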

Watch the following video for an overview of Data Virtualization.

This video provides a visual method as an alternative to the written documentation.

Learn more

For more information about supported data sources, see Supported data sources in Data Virtualization.

For more information about known issues and limitations, see Limitations and known issues in Data Virtualization.