To get the best performance boost from SQL optimization, pay attention to the items in
this section.
Flow order. SQL generation may be halted when the function of the node has
no semantic equivalent in SQL because SPSS Modeler’s data-mining
functionality is richer than the traditional data-processing operations supported by standard SQL.
When this happens, SQL generation is also suppressed for any downstream nodes. Therefore, you may be
able to significantly improve performance by reordering nodes to put operations that halt SQL as far
downstream as possible. The SQL optimizer can do a certain amount of reordering
automatically, but further improvements may be
possible. A good candidate for this is the Select node, which can often be brought forward. See
Nodes supporting SQL pushback for more information.
CLEM expressions. If a flow can't be reordered, you may be able to change
node options or CLEM expressions or otherwise recast the way the operation is performed, so that it
no longer inhibits SQL generation. Derive, Select, and similar nodes can commonly be rendered into
SQL, provided that all of the CLEM expression operators have SQL equivalents. Most operators can be
rendered, but there are a number of operators that inhibit SQL generation (in particular, the
sequence functions [“@ functions”]). Sometimes generation is halted because the generated query has
become too complex for the database to handle. See CLEM expressions and operators supporting SQL pushback for more
information.
Multiple input nodes. Where a flow has multiple data import nodes, SQL
generation is applied to each import branch independently. If generation is halted on one branch, it
can continue on another. Where two branches merge (and both branches can be expressed in SQL up to
the merge), the merge itself can often be replaced with a database join, and generation can be
continued downstream.
Scoring models. In-database scoring is supported for some models by
rendering the generated model into SQL. However, some models generate extremely complex SQL
expressions that aren't always evaluated effectively within the database. For this reason, SQL
generation must be enabled separately for each generated model nugget. If you find that a model
nugget is inhibiting SQL generation, open the model nugget's settings and select Generate
SQL for this model (with some models, you may have additional options controlling
generation). Run tests to confirm that the option is beneficial for your application. See Nodes supporting SQL pushback for more information.
When testing modeling nodes to see if SQL generation for models works effectively, we recommend
first saving all flows from SPSS Modeler. Note that some database
systems may hang while trying to process the (potentially complex) generated SQL.
Database caching. If you are using a node cache to save data at critical
points in the flow (for example, following a Merge or Aggregate node), make sure that database
caching is enabled along with SQL optimization. This will allow data to be cached to a temporary
table in the database (rather than the file system) in most cases.
Vendor-specific SQL. Most of the generated SQL is standards-conforming
(SQL-92), but some nonstandard, vendor-specific features are exploited where practical. The degree
of SQL optimization can vary, depending on the database source.
About cookies on this siteOur websites require some cookies to function properly (required). In addition, other cookies may be used with your consent to analyze site usage, improve the user experience and for advertising.For more information, please review your cookie preferences options. By visiting our website, you agree to our processing of information as described in IBM’sprivacy statement. To provide a smooth navigation, your cookie preferences will be shared across the IBM web domains listed here.