COLLECT_STATISTICS stored procedure in Watson Query

Gathers statistics for virtualized tables in Watson Query. The Watson Query optimizer uses these statistics to determine the optimal access plans for processing your queries efficiently. The schema is DVSYS.

With more table statistics, the optimizer can make better decisions to provide the best possible access plans. When you run COLLECT_STATISTICS on a table, subsequent queries against that table usually run significantly faster. For more information, see Collecting statistics in Watson Query.

Important:
  • Use the ANALYZE command (instead of the COLLECT_STATISTICS procedure) to collect statistics on virtualized tables over object storage. For more information, see ANALYZE command. To determine which virtualized tables are created over object storage, you can filter virtualized objects by type. Select Object Storage, Table, or View in the Filter menu in Data > Data virtualization > Virtualized data.
  • Use the SYSPROC.NNSTAT procedure (instead of the COLLECT_STATISTICS procedure) to collect statistics on virtualized tables over flat files that are not in object storage. For more information, see SYSPROC.NNSTAT stored procedure.

Input parameters

virtschema
The type of this required parameter is VARCHAR(128). Specifies the schema name of the virtualized table.
virtname
The type of this required parameter is VARCHAR(128). Specifies the name of the virtualized table.
virtcolumns
The type of this optional parameter is VARCHAR(32672). Specifies a comma-separated list of names of the columns that statistics are to be collected for. A null value specifies that statistics are to be collected for all columns. An empty string specifies that no column statistics are to be collected. In this case, only table cardinality is collected. If a column name includes any special characters, the name must be enclosed by double quotation marks.
collection_type
The type of this required parameter is SMALLINT. Specifies the statistics collection method. Valid values are 1 (remote-catalog method) and 2 (remote-query method):
remote-catalog
This type of statistics collection is only supported for virtualized tables in remote data sources that support a local method of statistics collection. Statistics that are stored in the catalog tables at the remote data source are retrieved and then stored in the Watson Query statistics catalog. It is critical to ensure that accurate statistics are available in the remote data source. The remote-catalog type of statistics collection is not supported for grouped tables.
remote-query
This type of statistics collection uses SQL queries against the virtualized table to compute the statistics. This type can be resource-intensive and take a long time to complete if the virtualized table has many rows or statistics are gathered for many columns.
options
The type of this optional parameter is VARCHAR(32672). Specifies a comma-delimited list of extra parameters. This parameter is reserved for future use.

Output parameters

diagnostics
The type of this parameter is VARCHAR(32672). Represents diagnostic output if a failure occurs and a summary of the collected statistics with abbreviated results.

Usage notes

  • Run the COLLECT_STATISTICS procedure either directly or through the Watson Query web client whenever you make significant changes to the data in the remote data source.
  • If a column name includes any special characters, the name must be enclosed by double quotation marks.
  • If the remote data source supports tools for gathering local statistics, ensure that local statistics are gathered and that the remote-catalog method is used for collecting statistics on the virtualized table.
  • If the remote data source does not support tools for gathering local statistics, the remote-query method is the only option available.
  • Only the NUMNULLS statistic is collected for LOB-type columns.

Examples

The following examples use virtualized tables over tables that are part of the SAMPLE database in IBM® Db2®. For more information, see The SAMPLE database.

  • Use the remote-catalog method to collect statistics for all the columns in a table.
    call dvsys.collect_statistics('SAMPLE', 'DEPARTMENT', null, 1, null, ?)
    Value of output parameters
    --------------------------
    Parameter Name  : DIAGS
    Parameter Value : Collected statistics for table "SAMPLE  "."DEPARTMENT":
    Table Cardinality = 14
    Column "LOCATION" [CHAR(16)]:  colcard=1, numnulls=14, highkey="", lowkey=""
    Column "ADMRDEPT" [CHAR(3)]:  colcard=3, numnulls=0, highkey="E01", lowkey="A00"
    Column "DEPTNAME" [VARCHAR(36)]:  colcard=14, numnulls=0, highkey="SPIFFY COMPUTER SERVICE DIV.", lowkey="BRANCH OFFICE F2"
    Column "MGRNO" [CHAR(6)]:  colcard=9, numnulls=6, highkey="000100", lowkey="000020"
    Column "DEPTNO" [CHAR(3)]:  colcard=14, numnulls=0, highkey="I22", lowkey="B01"
    
    Return Status = 0
  • Use the remote-query method to collect statistics for all the columns in a table.
    call dvsys.collect_statistics('SAMPLE', 'DEPARTMENT', null, 2, null, ?)
    Value of output parameters
    --------------------------
    Parameter Name  : DIAGS
    Parameter Value : Collected statistics for table "SAMPLE  "."DEPARTMENT":
    Table Cardinality = 14
    Column "LOCATION" [CHAR(16)]:  colcard=1, numnulls=14, highkey="", lowkey=""
    Column "ADMRDEPT" [CHAR(3)]:  colcard=3, numnulls=0, highkey="E01", lowkey="A00"
    Column "DEPTNAME" [VARCHAR(36)]:  colcard=14, numnulls=0, highkey="SUPPORT SERVICES", lowkey="ADMINISTRATION SYSTEMS"
    Column "MGRNO" [CHAR(6)]:  colcard=9, numnulls=6, highkey="000100", lowkey="000010"
    Column "DEPTNO" [CHAR(3)]:  colcard=14, numnulls=0, highkey="J22", lowkey="A00"
    
    Return Status = 0
  • Use the remote-query method to collect statistics for some of the columns in a table.
    call dvsys.collect_statistics('SAMPLE', 'DEPARTMENT', 'DEPTNO,DEPTNAME,LOCATION', 2, null, ?)
    Value of output parameters
    --------------------------
    Parameter Name  : DIAGS
    Parameter Value : Collected statistics for table "SAMPLE  "."DEPARTMENT":
    Table Cardinality = 14
    Column "LOCATION" [CHAR(16)]:  colcard=1, numnulls=14, highkey="", lowkey=""
    Column "DEPTNAME" [VARCHAR(36)]:  colcard=14, numnulls=0, highkey="SUPPORT SERVICES", lowkey="ADMINISTRATION SYSTEMS"
    Column "DEPTNO" [CHAR(3)]:  colcard=14, numnulls=0, highkey="J22", lowkey="A00"
    
    Return Status = 0
  • Use the remote-query method to collect only table statistics.
    call dvsys.collect_statistics('SAMPLE', 'DEPARTMENT', '', 2, null, ?)
    Value of output parameters
    --------------------------
    Parameter Name  : DIAGS
    Parameter Value : Collected statistics for table "SAMPLE  "."DEPARTMENT":
    Table Cardinality = 14
    
    Return Status = 0
  • Use the remote-query method to try to collect statistics for a column that is not defined on the table.
    call dvsys.collect_statistics('SAMPLE', 'DEPARTMENT', 'DEPTNO,FIRSTNME', 2, null, ?)
    Value of output parameters
    --------------------------
    Parameter Name  : DIAGS
    Parameter Value : ERROR: VALIDATE COLUMN FILTER LIST -- Invalid column "FIRSTNME" in virtColumns
    
    Return Status = 0
  • Use the remote-catalog method to collect statistics when the remote catalog has no statistics for the local table.
    call dvsys.collect_statistics('SAMPLE', 'DEPARTMENT', null, 1, null, ?)
    Value of output parameters
    --------------------------
    Parameter Name  : DIAGS
    Parameter Value : WARNING: No statistics found in remote catalog for table "SAMPLE  "."DEPARTMENT"
    
    Return Status = 0
  • Use the remote-catalog method to collect statistics when columns with special characters are specified.
    call dvsys.collect_statistics('SAMPLE', 'SpecialChars', '"Col,1","Col""2"', 2, null, ?)
    Value of output parameters
    --------------------------
    Parameter Name  : DIAGS
    Parameter Value : Collected statistics for table "SAMPLE  "."SpecialChars":
    Table Cardinality = 4
    Column "Col,1" [INTEGER]:  colcard=3, numnulls=0, highkey="3", lowkey="1"
    Column "Col"2" [INTEGER]:  colcard=3, numnulls=1, highkey="2", lowkey="1"
    
    Return Status = 0