Queries on virtualized flat files fail with incorrect results in Watson Query
When you virtualize flat files that are accessed by using a remote agent, some queries have incorrect results in cases where column type conversions are made for columns that have special characters.
Symptoms
The Excel source wrapper in Watson Query was upgraded in Cloud Pak for Data 4.0.2 to allow access to spreadsheets of unlimited size. In previous releases, Watson Query might fail to access spreadsheets that contain numerous data, for example greater than 250,000 cells, or have a large file size, for example greater than 3 MB, which easily happens when a workbook contains macros. If you encounter any issues (for example, unexpected types or values), you can toggle between new and old source wrappers with the following SQL API calls.
- To switch to the previous Excel source wrapper named GExcel, use the following
command:
CALL DVSYS.SETCONFIGPROPERTY('EXCEL_SOURCE_WRAPPER_CLASS', 'com.ibm.db2j.GExcel', '', ?, ?)
- To switch to the current Excel source wrapper named QExcel, use the following command:
CALL DVSYS.SETCONFIGPROPERTY('EXCEL_SOURCE_WRAPPER_CLASS', 'com.ibm.db2j.QExcel', '', ?, ?)
For example, currency values that start with a dollar symbol are inferred as type VARCHAR in the original preview. However, if that column is retyped to a numeric type such as DECIMAL, when you virtualize, queries against the virtual table have erroneous null cells in that column.
db2 => SELECT * FROM DV_TEST."TBL_XLS_TESTFORMULA2" WHERE IDNUM BETWEEN 980 AND 990
IDNUM NAME GROSS DED1 DED2 TDED NET
----------- ------------------------------ ---------- -------- -------- -------- ----------
980 Ahmad Kynett - - - - -
981 Sherlock Spalton - - - - -
982 Ira Gladtbach - - - - -
983 Andrej Dumphy - - - - -
984 Jeremie Fullicks - - - - -
985 Costa Shambrook - - - - -
986 Jacob Darell - - - - -
987 Kevon Heineking - - - - -
988 Sib MacTrusty - - - - -
989 Angelique Wace - - - - -
990 Lothario Kuschel - - - - -
11 record(s) selected.
db2 => SELECT * FROM DV_LOCAL."TBL_XLS_TESTFORMULA2" WHERE IDNUM BETWEEN 980 AND 990
IDNUM NAME GROSS DED1 DED2 TDED NET
----------- ------------------------------ ---------- -------- -------- -------- ----------
980 Ahmad Kynett 4196.77 568.05 349.44 917.50 3279.27
981 Sherlock Spalton 4904.10 681.37 308.92 990.29 3913.81
982 Ira Gladtbach 3189.25 553.35 387.08 940.43 2248.81
983 Andrej Dumphy 4152.29 690.10 499.46 1189.56 2962.73
984 Jeremie Fullicks 2132.98 516.84 319.20 836.05 1296.93
985 Costa Shambrook 3058.66 525.03 363.98 889.01 2169.64
986 Jacob Darell 4247.33 687.88 376.42 1064.30 3183.02
987 Kevon Heineking 3437.73 588.29 464.82 1053.10 2384.62
988 Sib MacTrusty 2231.61 586.28 356.07 942.34 1289.26
989 Angelique Wace 3973.60 683.93 318.11 1002.04 2971.56
990 Lothario Kuschel 2785.71 512.24 314.47 826.71 1959.00
11 record(s) selected.
Resolving the problem
To work around this issue, complete the following steps.
- Run the following SQL query to find the agent NODE_NAME and
PROPERTY that need to be modified, replacing <your file
name> with the name of the file that was
virtualized.
select NODE_NAME, REGEXP_REPLACE( id, '_ARGS$', '_VTI' ) PROPERTY from dvsys.listconfig where id like 'VTF_%_ARGS' and def like '%<your file name>%'
- Run the following SQL query to use the
GExcel
implementation for accessing this spreadsheet so that types can be converted correctly. Replace<NODE_NAME>
and<PROPERTY>
with the values that you found in step 1.CALL DVSYS.SETCONFIGPROPERTY('<PROPERTY>', 'com.ibm.db2j.GExcel', '<NODE_NAME>', ?, ?)
- To revert to the new implementation, rerun the same command from step 2, replacing
GExcel
withQExcel
.