Virtualizing data
Use the Data Virtualization service to easily join data from different sources in one unified view, without manual changes, data movement, or replication.
Overview
With Data Virtualization, you can access physical data from multiple sources through a single semantic virtual layer. Through this layer, you can access, manipulate, and analyze the data without knowing its physical format or location, and without moving or copying it.
Data Virtualization is part of the data fabric.
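The idea of a virtual layer can be illustrated with a minimal Python sketch. It is not the Data Virtualization API: the `VirtualLayer` class, the source names, and the sample CSV and JSON data are all hypothetical, chosen only to show how callers can join rows by table name without knowing each source's physical format.

```python
import csv
import io
import json
from typing import Callable, Dict, Iterator, List

Row = Dict[str, str]

# Hypothetical "physical" sources that live in two different formats.
CUSTOMERS_CSV = "customer_id,name\n1,Ada\n2,Grace\n"
ORDERS_JSON = (
    '[{"customer_id": "1", "total": "40"},'
    ' {"customer_id": "2", "total": "15"},'
    ' {"customer_id": "1", "total": "5"}]'
)


def read_customers() -> Iterator[Row]:
    """Read the CSV source on demand, one row at a time."""
    yield from csv.DictReader(io.StringIO(CUSTOMERS_CSV))


def read_orders() -> Iterator[Row]:
    """Read the JSON source on demand."""
    yield from json.loads(ORDERS_JSON)


class VirtualLayer:
    """Illustrative stand-in for a virtual layer: it stores readers, not data."""

    def __init__(self) -> None:
        self._readers: Dict[str, Callable[[], Iterator[Row]]] = {}

    def register(self, name: str, reader: Callable[[], Iterator[Row]]) -> None:
        # Only the reader function is kept; no rows are copied at registration.
        self._readers[name] = reader

    def scan(self, name: str) -> Iterator[Row]:
        # Each scan goes back to the source, so callers always see current data.
        return self._readers[name]()

    def join(self, left: str, right: str, key: str) -> Iterator[Row]:
        # Simple hash join: index the left table by key, then stream the right.
        index: Dict[str, List[Row]] = {}
        for row in self.scan(left):
            index.setdefault(row[key], []).append(row)
        for row in self.scan(right):
            for match in index.get(row[key], []):
                yield {**match, **row}


layer = VirtualLayer()
layer.register("customers", read_customers)
layer.register("orders", read_orders)

# The caller refers to tables by name only; CSV versus JSON is hidden.
joined_rows = list(layer.join("customers", "orders", "customer_id"))
for joined in joined_rows:
    print(joined)
```

The sketch keeps only reader functions in the layer, so every query reads from the sources again rather than from a stored copy. A real service also handles query pushdown, security, and caching, none of which is modeled here.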
Prerequisites
If you want to publish your virtual data to a governed catalog, you must install IBM Knowledge Catalog. For more information, see IBM Knowledge Catalog on Cloud Pak for Data.
- Required service
- Data Virtualization.
Before you can virtualize your data, you must create and deploy a service instance of Data Virtualization in Cloud Pak for Data as a Service. For more information, see Provision a service instance for Data Virtualization.
After you provision your service instance, you can set up your Data Virtualization instance. For more information, see Getting started with Cloud Pak for Data as a Service and Quick start: Virtualize data.
- Related service
- IBM Knowledge Catalog.
If you want to publish your virtual data to a governed catalog, you must install IBM Knowledge Catalog. For more information, see Data governance.
- Data formats
- Data Virtualization works with connections to many types of data sources and formats. For more information, see Connecting to data sources in Data Virtualization.
- Data size
- Each data source defines its own data size limits. For more information, see Supported data sources in Data Virtualization.
- Credentials
- Data Virtualization uses your IBM Cloud credentials to connect to the service. Some tasks require specific Data Virtualization roles. For more information, see Connecting and authenticating to the Data Virtualization service.
Getting started
- Open the Data Virtualization service.
- In the Cloud Pak for Data navigation menu, select .
- Add your data sources to Data Virtualization.
- Navigate to the Data sources page and then select Add connection to add connections. Data Virtualization supports dozens of relational and nonrelational data sources.
- Virtualize the tables from the data source.
- In the Virtualize page, select the tables that you want to virtualize and then select to virtualize the tables.
- Join the tables to create a unified view.
- In the Virtualized data page, select the tables that you want to join and then select Join to join the objects.
- Query the virtual objects.
- Navigate to the Run SQL page to query your virtual objects by using the built-in SQL editor.
- Consume the data using other Cloud Pak for Data services in the data fabric.
- Consume virtual tables in projects, dashboards, data catalogs, and other applications. For more information, see Dashboard services.
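The steps above can be sketched end to end with Python's standard `sqlite3` module as a local stand-in. This is an assumption-laden illustration, not how Data Virtualization is implemented or called: the attached databases `crm_db` and `sales_db`, the tables, the view names, and the `totals_by_customer` helper are all hypothetical. Attached databases play the role of data source connections, and temporary views play the role of virtualized tables and the joined view.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Stand-in for adding data sources: attach two separate databases.
conn.execute("ATTACH DATABASE ':memory:' AS crm_db")
conn.execute("ATTACH DATABASE ':memory:' AS sales_db")
conn.execute(
    "CREATE TABLE crm_db.customers (customer_id INTEGER PRIMARY KEY, name TEXT)"
)
conn.execute(
    "CREATE TABLE sales_db.orders "
    "(order_id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)"
)
conn.executemany(
    "INSERT INTO crm_db.customers VALUES (?, ?)", [(1, "Ada"), (2, "Grace")]
)
conn.executemany(
    "INSERT INTO sales_db.orders VALUES (?, ?, ?)",
    [(10, 1, 40.0), (11, 2, 15.0), (12, 1, 5.0)],
)

# Stand-in for virtualizing tables: views that point at the source tables.
# TEMP views are used because they may reference any attached database.
conn.execute(
    "CREATE TEMP VIEW v_customers AS "
    "SELECT customer_id, name FROM crm_db.customers"
)
conn.execute(
    "CREATE TEMP VIEW v_orders AS "
    "SELECT order_id, customer_id, total FROM sales_db.orders"
)

# Stand-in for joining tables: one unified view over both virtual tables.
conn.execute(
    """
    CREATE TEMP VIEW v_customer_orders AS
    SELECT c.name, o.order_id, o.total
    FROM v_customers AS c
    JOIN v_orders AS o ON o.customer_id = c.customer_id
    """
)


def totals_by_customer(connection: sqlite3.Connection) -> list:
    """Stand-in for the query step: aggregate over the unified view."""
    return connection.execute(
        "SELECT name, SUM(total) FROM v_customer_orders "
        "GROUP BY name ORDER BY name"
    ).fetchall()


before = totals_by_customer(conn)

# A change in a source is visible through the view, because no data was copied.
conn.execute("INSERT INTO sales_db.orders VALUES (13, 2, 10.0)")
after = totals_by_customer(conn)

print(before)
print(after)
```

Because the views store only their defining queries, the second call reflects the newly inserted order without any refresh step. In the service itself, the equivalent query would be written in the Run SQL editor against your own virtual objects.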
Learn more
For more information on supported data sources, see Supported data sources in Data Virtualization.
For more information on known issues and limitations, see Limitations and known issues in Data Virtualization.