To ensure that remote connectors decode file data correctly, set the
character encoding scheme manually. Setting the scheme configures the
remote connector to apply a specific decoding when it reads data files.
About this task
Cloud Pak for Data automatically detects the encoding
scheme of flat data files, such as CSV and TSV files. However, to avoid decoding issues, set the
encoding scheme manually for flat data files rather than relying on automatic detection.
These instructions use files with data encoded in Shift-JIS (Japanese) as an example. For a
full list of data encodings, see Supported encodings.
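To illustrate why the declared encoding matters, the following minimal Python sketch encodes a short Japanese string as Windows-932 bytes and then decodes it two ways. It uses Python's built-in `cp932` codec, which is Python's name for the Windows code page 932 variant of Shift-JIS. The sample string and the `try_decode` helper are illustrative assumptions and are not part of the remote connector.

```python
from typing import Optional


def try_decode(raw: bytes, encoding: str) -> Optional[str]:
    """Return the decoded text, or None if the bytes are invalid in `encoding`."""
    try:
        return raw.decode(encoding)
    except UnicodeDecodeError:
        return None


# Illustrative sample: three kanji stored as Windows-932 (Shift-JIS) bytes.
sample_text = "日本語"
raw_bytes = sample_text.encode("cp932")

# Decoding with the encoding the data was written in recovers the text.
decoded_correctly = try_decode(raw_bytes, "cp932")

# Decoding with a wrong assumption (UTF-8) fails for these bytes.
decoded_wrongly = try_decode(raw_bytes, "utf-8")

print(decoded_correctly == sample_text)
print(decoded_wrongly is None)
```

A wrong guess does not always fail this visibly. With other byte sequences or codecs, a mismatched decoder can silently produce garbled characters instead, which is the kind of issue an explicit encoding setting is meant to prevent.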
Note:
- You can follow these steps while the remote connector is running. However, to apply new encoding
schemes to an existing virtual table, you must delete the virtual table and virtualize it again.
- The properties files are located in a dedicated folder in the remote connector installation
directory, separate from your data files. This layout keeps the Watson Query remote connector
self-contained and minimizes disruption to your own environment, in line with the
containerization principles and benefits of the Docker installation of remote connectors.
Procedure
To ensure that remote connectors correctly decode data in files, choose one of the
following methods:
- Set the global default encoding scheme for all data files on this host.
- Find the
Connector_install_directory/sysroot/data/FileImportControls/FileImportDefaults.properties
file.
Replace Connector_install_directory with the directory where you
installed the remote connector.
- Edit the FileImportDefaults.properties file to add the following
property:
DataCodeset=windows-932
By setting this property, you configure the remote connector to apply Shift-JIS decoding when it
reads data files.
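If you prefer to script the edit, the following Python sketch adds or replaces a single property in a properties file. It assumes that the file uses plain `key=value` lines and the traditional ISO-8859-1 encoding of Java-style properties files. The `set_property` helper is hypothetical, and the demonstration writes to a temporary folder rather than to a real connector installation.

```python
import tempfile
from pathlib import Path

# Assumption: Java-style properties files are traditionally ISO-8859-1.
PROPERTIES_ENCODING = "iso-8859-1"


def set_property(properties_file: Path, key: str, value: str) -> None:
    """Set key=value in a simple properties file, replacing any existing entry.

    Comment lines (starting with # or !) and unrelated lines are kept as-is.
    """
    lines = []
    if properties_file.exists():
        lines = properties_file.read_text(encoding=PROPERTIES_ENCODING).splitlines()

    updated = []
    replaced = False
    for line in lines:
        stripped = line.strip()
        is_comment = stripped.startswith(("#", "!"))
        if not is_comment and stripped.split("=", 1)[0].strip() == key:
            if not replaced:
                updated.append(f"{key}={value}")
                replaced = True
            # Any further duplicate entries for the same key are dropped.
        else:
            updated.append(line)

    if not replaced:
        updated.append(f"{key}={value}")

    properties_file.write_text("\n".join(updated) + "\n", encoding=PROPERTIES_ENCODING)


# Demonstration against a throwaway copy, not the real installation directory.
with tempfile.TemporaryDirectory() as tmp:
    demo_file = Path(tmp) / "FileImportDefaults.properties"
    demo_file.write_text("# existing settings\n", encoding=PROPERTIES_ENCODING)
    set_property(demo_file, "DataCodeset", "windows-932")
    print(demo_file.read_text(encoding=PROPERTIES_ENCODING))
```

To use the sketch on a real installation, point `set_property` at the FileImportDefaults.properties file under your own Connector_install_directory, and keep a backup of the original file.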
- Override encoding settings from the FileImportDefaults.properties
file for all data files in a specific hierarchy of folders under a path on this
host.
These instructions use the hierarchy of folders under the
/path/to/hierarchy path as an example.
- Find the
Connector_install_directory/sysroot/data/FileImportControls/FileImportDefaults.properties
file.
Replace Connector_install_directory with the directory where you
installed the remote connector.
- Copy the FileImportDefaults.properties file to the new
location:
Connector_install_directory/sysroot/data/FileImportControls/path/to/hierarchy/FileImportDefaults.properties
- Edit the FileImportDefaults.properties file in the new location
to add the following property:
DataCodeset=windows-932
By setting this property, you configure the remote connector to apply Shift-JIS decoding when it
reads any file under the hierarchy of folders in the
/path/to/hierarchy path.
Note: In cases where you have several properties files at different depths in the
hierarchy of folders under
Connector_install_directory/sysroot/data/FileImportControls/path/to/hierarchy,
the one having the closest matching subpath to the actual data file path takes
precedence.
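The precedence rule in the preceding note can be pictured as a walk from the data file's folder up toward the FileImportControls folder that stops at the first properties file it finds. The following Python sketch is one reading of that rule under stated assumptions, not the connector's actual lookup code. The `closest_defaults` helper is hypothetical, and treating the top-level FileImportDefaults.properties file as the final fallback is inferred from the description of the global default.

```python
import tempfile
from pathlib import Path, PurePosixPath
from typing import Optional

DEFAULTS_NAME = "FileImportDefaults.properties"


def closest_defaults(controls_root: Path, data_file: str) -> Optional[Path]:
    """Return the deepest FileImportDefaults.properties that applies to data_file.

    The data file's folder is mirrored under controls_root, then each folder
    from deepest to shallowest is checked for a defaults file.
    """
    relative_folder = PurePosixPath(data_file.lstrip("/")).parent
    candidate_dir = controls_root.joinpath(*relative_folder.parts)
    while True:
        candidate = candidate_dir / DEFAULTS_NAME
        if candidate.is_file():
            return candidate
        if candidate_dir == controls_root:
            return None  # No defaults file anywhere on the way up.
        candidate_dir = candidate_dir.parent


# Demonstration in a temporary folder that stands in for FileImportControls.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    hierarchy = root / "path" / "to" / "hierarchy"
    hierarchy.mkdir(parents=True)
    (root / DEFAULTS_NAME).write_text("", encoding="ascii")
    (hierarchy / DEFAULTS_NAME).write_text("DataCodeset=windows-932\n", encoding="ascii")

    inside = closest_defaults(root, "/path/to/hierarchy/sub/datafile.csv")
    outside = closest_defaults(root, "/other/place/datafile.csv")
    print(inside == hierarchy / DEFAULTS_NAME)
    print(outside == root / DEFAULTS_NAME)
```

In this reading, a data file in a subfolder of /path/to/hierarchy picks up the override, while a data file elsewhere on the host falls back to the global defaults file.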
- Override encoding settings for all files with a specific name in a specific hierarchy of
folders under a path on this host.
These instructions use the hierarchy of folders under
the /path/to/hierarchy path, and the datafile.csv file
name as examples.
- Find the
Connector_install_directory/sysroot/data/FileImportControls/FileImportDefaults.properties
file.
Replace Connector_install_directory with the directory where you
installed the remote connector.
- Copy the FileImportDefaults.properties file to the new location and
change the file name to datafile.csv.properties:
Connector_install_directory/sysroot/data/FileImportControls/path/to/hierarchy/datafile.csv.properties
- Edit the datafile.csv.properties file in the new location to add
the following property:
DataCodeset=windows-932
By setting this property, you configure the remote connector to apply Shift-JIS decoding when it
reads any file named
datafile.csv under the hierarchy of folders in the
/path/to/hierarchy path.
Note: In cases where you have several properties files at different depths in the
hierarchy of folders under
Connector_install_directory/sysroot/data/FileImportControls/path/to/hierarchy,
the one having the closest matching subpath to the actual data file path takes
precedence.
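The per-file override location follows a simple pattern: the data file's path is mirrored under the FileImportControls folder, and `.properties` is appended to the data file name. The following Python sketch builds that path. The `per_file_properties_path` helper and the `/opt/connector` installation directory are hypothetical and are used for illustration only.

```python
from pathlib import PurePosixPath

# Location of the controls folder relative to the installation directory.
CONTROLS_SUBDIR = "sysroot/data/FileImportControls"


def per_file_properties_path(install_dir: str, data_file: str) -> PurePosixPath:
    """Build the path of the properties file that overrides one data file name."""
    data_path = PurePosixPath(data_file.lstrip("/"))
    return (
        PurePosixPath(install_dir)
        / CONTROLS_SUBDIR
        / data_path.parent
        / (data_path.name + ".properties")
    )


# "/opt/connector" is a made-up installation directory.
target = per_file_properties_path("/opt/connector", "/path/to/hierarchy/datafile.csv")
print(target)
# prints /opt/connector/sysroot/data/FileImportControls/path/to/hierarchy/datafile.csv.properties
```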