Setting the character-encoding scheme in Data Virtualization
Last updated: Nov 26, 2024
Setting the character-encoding scheme in Data Virtualization for IBM Cloud Pak for Data

To ensure that remote connectors correctly decode file data, you must set the character encoding scheme manually. By setting the character encoding scheme, you configure the remote connector to apply specific decoding to read data files.

About this task

Cloud Pak for Data automatically detects the encoding scheme of flat data files, such as CSV and TSV files. However, to avoid decoding issues, set the encoding scheme manually for these files.

These instructions use files with data encoded in Shift-JIS (Japanese) as an example. For a full list of data encodings, see Supported encodings.
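The effect of a mismatched encoding scheme can be illustrated with a minimal Python sketch. It is independent of the remote connector and assumes that Python's cp932 codec is an adequate stand-in for the windows-932 code set; the sample text is made up for the illustration.

```python
# Illustration only: Python's "cp932" codec stands in for windows-932 here.
text = "東京,大阪"              # hypothetical CSV row with Japanese text
raw = text.encode("cp932")      # the bytes as a Shift-JIS data file would store them

# Decoding with the matching scheme recovers the original text.
decoded_ok = raw.decode("cp932")

# Decoding the same bytes as UTF-8 fails outright for this input.
try:
    raw.decode("utf-8")
    utf8_failed = False
except UnicodeDecodeError:
    utf8_failed = True

# Decoding as Latin-1 never raises an error, but the result is garbled.
garbled = raw.decode("latin-1")

print(decoded_ok == text, utf8_failed, garbled != text)
```

The silent Latin-1 case is the more troublesome one in practice, because no error signals that the data was read incorrectly.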

Note:
  • You can follow these steps while the remote connector is running. However, to apply new encoding schemes to an existing virtual table, you must delete the virtual table and virtualize it again.
  • The properties files are stored in a dedicated folder in the remote connector installation directory, separate from your data files. This keeps the Data Virtualization remote connector self-contained and minimizes disruption to your environment, consistent with the containerization principles of the Docker installation of remote connectors.

Procedure

To ensure that remote connectors correctly decode data in files, choose one of the following methods:

  • Set the global default encoding scheme for all data files on this host.
    1. Find the Connector_install_directory/sysroot/data/FileImportControls/FileImportDefaults.properties file.
      Replace Connector_install_directory with the directory where you installed the remote connector.
    2. Edit the FileImportDefaults.properties file to add the following property:
      DataCodeset=windows-932

      By setting this property, you configure the remote connector to apply Shift-JIS decoding to read data files.

  • Override encoding settings from the FileImportDefaults.properties file for all data files in a specific hierarchy of folders under a path on this host.
    These instructions use the hierarchy of folders under the /path/to/hierarchy path as an example.
    1. Find the Connector_install_directory/sysroot/data/FileImportControls/FileImportDefaults.properties file.
      Replace Connector_install_directory with the directory where you installed the remote connector.
    2. Copy the FileImportDefaults.properties file to the new location:
      Connector_install_directory/sysroot/data/FileImportControls/path/to/hierarchy/FileImportDefaults.properties
    3. Edit the FileImportDefaults.properties file in the new location to add the following property:
      DataCodeset=windows-932
      By setting this property, you configure the remote connector to apply Shift-JIS decoding to read all files under the hierarchy of folders in the /path/to/hierarchy path.
      Note: In cases where you have several properties files at different depths in the hierarchy of folders under Connector_install_directory/sysroot/data/FileImportControls/path/to/hierarchy, the one having the closest matching subpath to the actual data file path takes precedence.
  • Override encoding settings for all files with a specific name in a specific hierarchy of folders under a path on this host.
    These instructions use the hierarchy of folders under the /path/to/hierarchy path, and the datafile.csv file name as examples.
    1. Find the Connector_install_directory/sysroot/data/FileImportControls/FileImportDefaults.properties file.
      Replace Connector_install_directory with the directory where you installed the remote connector.
    2. Copy the FileImportDefaults.properties file to the new location and rename the copy to datafile.csv.properties:
      Connector_install_directory/sysroot/data/FileImportControls/path/to/hierarchy/datafile.csv.properties
    3. Edit the datafile.csv.properties file in the new location to add the following property:
      DataCodeset=windows-932
      By setting this property, you configure the remote connector to apply Shift-JIS decoding to read all files named datafile.csv under the hierarchy of folders in the /path/to/hierarchy path.
      Note: In cases where you have several properties files at different depths in the hierarchy of folders under Connector_install_directory/sysroot/data/FileImportControls/path/to/hierarchy, the one having the closest matching subpath to the actual data file path takes precedence.
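The three methods differ only in where the properties file is placed, so the steps can be sketched as one small Python helper. This is a sketch under stated assumptions, not a tool that ships with the remote connector: the function names properties_path and apply_codeset are hypothetical, and the script assumes that it can write to the installation directory.

```python
import shutil
from pathlib import Path
from typing import Optional


def properties_path(install_dir: Path,
                    data_path: Optional[str] = None,
                    file_name: Optional[str] = None) -> Path:
    """Build the properties file location for one of the three methods."""
    controls = install_dir / "sysroot" / "data" / "FileImportControls"
    if data_path is None:
        # Method 1: global default for all data files on this host.
        return controls / "FileImportDefaults.properties"
    folder = controls / data_path.strip("/")
    if file_name is None:
        # Method 2: all data files under a hierarchy of folders.
        return folder / "FileImportDefaults.properties"
    # Method 3: all files with a specific name under the hierarchy.
    return folder / f"{file_name}.properties"


def apply_codeset(install_dir: Path,
                  codeset: str = "windows-932",
                  data_path: Optional[str] = None,
                  file_name: Optional[str] = None) -> Path:
    """Copy the defaults file if needed, then set DataCodeset in the target."""
    defaults = properties_path(install_dir)
    target = properties_path(install_dir, data_path, file_name)
    target.parent.mkdir(parents=True, exist_ok=True)
    if target != defaults and not target.exists() and defaults.exists():
        shutil.copyfile(defaults, target)  # start from a copy of the defaults

    # latin-1 maps every byte to one character, so other content survives as-is.
    text = target.read_text(encoding="latin-1") if target.exists() else ""
    lines = [ln for ln in text.split("\n")
             if ln and not ln.startswith("DataCodeset=")]
    lines.append(f"DataCodeset={codeset}")
    target.write_text("\n".join(lines) + "\n", encoding="latin-1")
    return target
```

For example, with a hypothetical installation directory of /opt/dv_connector, calling apply_codeset(Path("/opt/dv_connector"), data_path="/path/to/hierarchy", file_name="datafile.csv") would write the property to the datafile.csv.properties file from the third method. The sketch replaces an existing DataCodeset line rather than appending a second one, and it drops blank lines from the file.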