Inaccurate query results for Hive data sources in Data Virtualization

Last updated: Mar 17, 2025

Queries against Hive data sources at version 3.0 or later might return zero rows or zero tables.

Symptoms

You add a connection to a Hive data source at version 3.0 or later, and the connection is added successfully. However, when you click Virtualize on the service menu, the Hive connection unexpectedly shows zero tables.

Resolving the problem

To resolve this issue, use one of the following options:
Set the server parameter on the Hive data source
  1. Open the hive-site.xml file on the server that hosts the Hive data source.
  2. In the hive-site.xml file, set the following parameter:
    metastore.storage.schema.reader.impl=org.apache.hadoop.hive.metastore.SerDeStorageSchemaReader
  3. Restart the Hive data source.
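In the Hive site configuration file, a parameter is normally written as a <property> element with <name> and <value> children inside the root <configuration> element. The following Python sketch shows one way to add or update that entry by using only the standard library's xml.etree.ElementTree module. The function name set_hive_property is hypothetical, and the sketch assumes the standard Hadoop-style layout with <configuration> as the root element. ElementTree does not preserve XML comments, so back up the file before you overwrite it with the result.

```python
import xml.etree.ElementTree as ET

# Parameter from step 2 above.
PROPERTY_NAME = "metastore.storage.schema.reader.impl"
PROPERTY_VALUE = "org.apache.hadoop.hive.metastore.SerDeStorageSchemaReader"


def set_hive_property(xml_text: str, name: str, value: str) -> str:
    """Hypothetical helper: return the configuration XML with `name` set to `value`.

    If a <property> with a matching <name> already exists, its <value> is
    updated; otherwise a new <property> element is appended to the root.
    """
    root = ET.fromstring(xml_text)
    for prop in root.findall("property"):
        name_el = prop.find("name")
        if name_el is not None and (name_el.text or "").strip() == name:
            value_el = prop.find("value")
            if value_el is None:
                value_el = ET.SubElement(prop, "value")
            value_el.text = value
            break
    else:
        # No existing entry was found, so add a new <property> element.
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    sample = "<configuration></configuration>"
    print(set_hive_property(sample, PROPERTY_NAME, PROPERTY_VALUE))
```

To apply the sketch to a real server, read the file into a string, pass it to the function, write the result back, and then restart the Hive data source as described in step 3.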
Add the connection parameter for the Hive data source
  1. In the Data Virtualization web client, navigate to the Data sources page, then select your Hive connection.
  2. Select Edit connection from the overflow menu (...), then select Connection details > Additional properties.
  3. Enter the following parameter in the text box:
    CatalogMode=query
Note: If you set CatalogMode=query, remote tables with column names that contain a forward slash (/) are inaccessible because of an issue with the data source. If any of your column names contain this character, use the following setting instead:
CatalogMode=native

CatalogMode=native gives the best performance but less accurate catalog information. CatalogMode=query gives the most accurate catalog information but slower performance.
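The choice between the two CatalogMode settings can be summarized as a small decision rule. The following Python sketch is illustrative only: choose_catalog_mode and its prefer_performance flag are hypothetical and are not part of Data Virtualization. The sketch also assumes that you have already collected the remote column names yourself. It returns the line to enter under Additional properties. It defaults to CatalogMode=query for accurate catalog information and switches to CatalogMode=native when any column name contains a forward slash, or when you choose performance over catalog accuracy.

```python
from typing import Iterable


def choose_catalog_mode(column_names: Iterable[str], prefer_performance: bool = False) -> str:
    """Hypothetical helper: pick the CatalogMode setting for a Hive connection.

    - CatalogMode=native: required when any column name contains "/",
      because such tables are inaccessible with CatalogMode=query; also
      gives the best performance.
    - CatalogMode=query: otherwise, for the most accurate catalog information.
    """
    if prefer_performance or any("/" in name for name in column_names):
        return "CatalogMode=native"
    return "CatalogMode=query"


if __name__ == "__main__":
    print(choose_catalog_mode(["id", "order/date"]))
    print(choose_catalog_mode(["id", "order_date"]))
```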