Creating a virtualized grouped table from multiple data sources in Watson Query

Creating a virtualized table from multiple data sources in Watson Query

You can combine tables from multiple data sources into a single virtual table, which creates a unified definition that contains the columns and data from all participating data sources. These tables are referred to as grouped tables.

Segmentation is vertical (either a subset or superset of columns based on a selection of chosen columns). You can run queries against the resulting virtual table the same way that you would query any of the base tables.

Tables can be grouped into a single virtualized table if the tables' names as resolved by the driver are the same (an exact match, case-sensitive). The table grouping semantics considers columns with the same name (case-insensitive) to match. The data type promotion follows precedence rules.

You must ensure that the grouping of tables is logical, especially concerning the handling of data types of the result grouped table. In some cases, corresponding columns might have different data types. Rules are used to determine the result data types of the grouped table. For more information, see Rules for result data types.

To prevent unexpected results, you must be careful when you group tables into a single virtualized table. The following recommendations can be considered when you create a grouped table, although nothing is restricted.

Ensure that the columns with the same name from all the tables have compatible data types and compatible business usage. The domain of data in all tables must be compatible.
Group tables from the same type of data source to ensure consistent data semantics.
When you group tables from different types of data sources, ensure consistent data semantics.
Ensure that all tables in the group have most columns with the same names. Ensure that matching columns have compatible data types.
Do not include columns that are defined on complex types.

Note: Some data sources, such as Hive, use lowercase default for tables names. These tables cannot be grouped with tables from other data sources that have uppercase as the default unless delimited identifiers are used to ensure an exact match on table names.

About this task

These steps describe how to virtualize data without business terms. If you want to govern your virtual data, see Governing virtual data in Watson Query.

Procedure

On the navigation menu, click Data > Data virtualization.
The service menu opens to the Data sources page by default.
On the service menu, click Virtualization > Virtualize and click the Tables tab.

The list of connections appears in the default Explore view. Click a connection, schema, or table to select it and preview the contents.

The list of available tables in your connections appears in the List view. You can filter the listed tables by adding filters on the Data sources page. Additionally, you can search for tables by name, schema, column, or business term.

Watson Query can automatically suggest matched tables that can be combined into a single virtual table.
To combine tables into a single virtual object, click Settings and select Group tables with identical names in the settings menu.

The display shows the suggested grouping of tables, and you can fine-tune the participating source tables on the Review step.

The list of available tables includes tables in which the read permission is not granted (non-readable tables).
If you create a virtual table that includes a non-readable table, select one of the following options to make your virtual table queryable.
- Option 1 Click Edit grouped tables on the table action menu and remove the non-readable table from the group that you are virtualizing.
- Option 2 Ensure that the user who is assigned to the data source connection can access the non-readable table. This step can happen before or after the table is virtualized.
- Option 3 Create a data source connection in Watson Query to access the same data source with credentials for a separate user that does have read-access to the table. As a result, the list of additional table instances appears when you select the Group tables with identical names option. Before virtualizing tables, you must still manually remove non-readable tables discovered under the restricted user's connection (as described in option 1).
Select the table that you want to virtualize and click Add to cart.
Click View cart to view your selections. From this window, you can also edit the table and schema names or remove a selection from your cart.

Select the appropriate action to assign the virtualized table.

Assign to	When to use this option
Project	Select Project if you created the virtual table to use in a specific project. Then, choose the appropriate project. The table also appears in Virtualized data.
Virtualized data	Select Virtualized data if the table was not created to use in a specific project. This setting is the default if no projects exist.

Select Publish to catalog if you also want to publish to a selected catalog.
A list of available catalogs is shown in the drop-down menu. Each catalog is tagged as Governed or Not governed.
Note: You must have at least one catalog in IBM Knowledge Catalog.
You must have permission to publish to a catalog. An administrator can enable whether all virtual objects are published to a selected governed catalog, which prevents a user from publishing to a specified catalog.
Specify a schema in the Schema field.
You can also create a schema by following these steps.
- If you have the Watson Query Engineer or User role, leave the Schema field as default to create a schema with your user ID.
- If you have the Watson Query Manager role, leave the Schema field as default to create a schema with your user ID or enter the new schema name in the Schema field.
For more information, see Creating schemas for virtual objects.
Click Virtualize to complete the process.
When the status window appears, you can select to view your virtualized data or virtualize more data.
Click View virtualized data to see your newly created tables.

What to do next

You can collect statistics for your virtual object. For more information, see Collecting statistics in Watson Query.
On the Virtualized data page, you can publish your virtual object to the catalog. For more information, see Publishing virtual data to the catalog in Watson Query.
You can also create join multiple virtual tables to create a joined view. See Creating a join view from multiple tables.