You can push many data preparation and mining operations down to your database to
improve performance.
One of the most powerful capabilities of SPSS Modeler is the ability
to perform many data preparation and mining operations directly in the database. By generating SQL
code that can be pushed back to the database for execution, many operations, such as sampling,
sorting, deriving new fields, and certain types of graphing, can be performed in the database rather
than on the client or server computer. When you're working with large datasets, these
pushbacks can dramatically enhance performance in several ways:
By reducing the size of the result set to be transferred from the DBMS to watsonx.ai. Reading large result sets through an ODBC driver can be slow because of network I/O or
driver inefficiencies. For this reason, the operations that benefit most from SQL
optimization are row and column selection and aggregation (Select, Sample, and Aggregate nodes), which
typically reduce the size of the dataset to be transferred. Data can also be cached to a temporary
table in the database at critical points in the flow (after a Merge or Select node, for example) to
further improve performance.
By making use of the performance and scalability of the database. Efficiency is increased
because a DBMS can often take advantage of parallel processing, more powerful hardware, more
sophisticated management of disk storage, and the presence of indexes.
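The first advantage, a smaller result set, can be illustrated with a minimal Python sketch. It assumes an in-memory SQLite database as a stand-in for the DBMS and a hypothetical `sales` table; it does not show how SPSS Modeler generates or sends SQL. Both approaches produce the same totals, but only the pushed-back query limits transfer to one row per group.

```python
import sqlite3

# In-memory SQLite stands in for the DBMS (an assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("north", 10.0), ("south", 5.0), ("north", 2.5), ("east", 7.0), ("south", 1.5)],
)

# Without pushback: every row is transferred, then aggregated on the client.
fetched = conn.execute("SELECT region, amount FROM sales").fetchall()
client_totals = {}
for region, amount in fetched:
    client_totals[region] = client_totals.get(region, 0.0) + amount

# With pushback: the database aggregates and returns one row per region.
pushed = conn.execute("SELECT region, SUM(amount) FROM sales GROUP BY region").fetchall()
pushed_totals = dict(pushed)

print(len(fetched), "rows transferred without pushback")  # 5
print(len(pushed), "rows transferred with pushback")      # 3
```

With five rows the difference is trivial, but the same ratio applies when a table holds millions of rows and the aggregate holds only a few.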
Given these advantages, watsonx.ai is designed to maximize the amount of SQL
generated by each SPSS Modeler flow so that only those operations that
can't be compiled to SQL are executed by watsonx.ai. Because of limitations in what
can be expressed in standard SQL (SQL-92), however, certain operations may not be supported.
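As a rough illustration of compiling a flow to SQL, the Python sketch below wraps each node around the previous query as a nested `SELECT` and hands back any node it can't translate. The node kinds, the `node_to_sql` and `compile_flow` helpers, and the `customers` table are all hypothetical; this is not the SQL that SPSS Modeler actually generates.

```python
import sqlite3

# Hypothetical node kinds: ("select", condition), ("derive", name, expression),
# ("sort", order_by). Any other kind is treated as not compilable to SQL.
def node_to_sql(node, inner_sql):
    kind = node[0]
    if kind == "select":   # row selection becomes a WHERE clause
        return f"SELECT * FROM ({inner_sql}) WHERE {node[1]}"
    if kind == "derive":   # a new field becomes an extra output column
        return f"SELECT *, {node[2]} AS {node[1]} FROM ({inner_sql})"
    if kind == "sort":     # sorting becomes ORDER BY
        return f"SELECT * FROM ({inner_sql}) ORDER BY {node[1]}"
    return None            # no SQL translation in this sketch

def compile_flow(table, nodes):
    """Wrap each compilable node around the previous query. Stop at the first
    node with no translation and return it, plus everything after it, as the
    part of the flow that would run outside the database."""
    sql = f"SELECT * FROM {table}"
    for i, node in enumerate(nodes):
        translated = node_to_sql(node, sql)
        if translated is None:
            return sql, nodes[i:]
        sql = translated
    return sql, []

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (name TEXT, age INTEGER)")
conn.executemany("INSERT INTO customers VALUES (?, ?)",
                 [("Ana", 34), ("Bo", 19), ("Cy", 52)])

flow = [("select", "age >= 21"),
        ("derive", "age_months", "age * 12"),
        ("sort", "age DESC"),
        ("custom_script",)]
sql, run_locally = compile_flow("customers", flow)
print(sql)
print(conn.execute(sql).fetchall())  # [('Cy', 52, 624), ('Ana', 34, 408)]
print(run_locally)                   # [('custom_script',)]
```

Nesting one subquery per node keeps each translation independent of the others; a real SQL generator would typically merge these into a flatter query.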
When you run a flow, nodes that push back to your database are marked with a small
SQL icon beside the node. If you edit the flow after running it, the icons are removed
until the next time you run the flow.
Figure 1. SQL pushback indicator
If you want to see which nodes will push back before you run a flow, click SQL
preview. You can then modify the flow before running it to improve performance, for example
by moving operations that can't be pushed back as far downstream as possible.
If a node can't be pushed back, none of the nodes after it in the flow are pushed back either
(pushback stops at that node). Keep this in mind when you decide the order of the nodes in your
flow.
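The effect of node order can be modeled with a small Python sketch. It treats a flow as a hypothetical list of `(name, can_push_back)` pairs, where `Custom` stands for any node with no SQL translation. Moving that node downstream is only valid when doing so doesn't change the flow's output (for example, when `Derive` and `Sort` don't depend on fields that `Custom` creates).

```python
# Hypothetical model of a linear flow: each node is (name, can_push_back).
def pushed_back_nodes(flow):
    """Return the names of nodes that push back. Pushback stops at the first
    node that can't be compiled to SQL, so later nodes are excluded even if
    they could be compiled on their own."""
    pushed = []
    for name, can_push_back in flow:
        if not can_push_back:
            break
        pushed.append(name)
    return pushed

# "Custom" is a hypothetical node with no SQL translation.
original = [("Select", True), ("Custom", False), ("Derive", True), ("Sort", True)]
reordered = [("Select", True), ("Derive", True), ("Sort", True), ("Custom", False)]

print(pushed_back_nodes(original))   # ['Select']
print(pushed_back_nodes(reordered))  # ['Select', 'Derive', 'Sort']
```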
Notes: Keep the following information in mind regarding SQL:
Because of minor differences in SQL implementation, flows that run in a database may return
slightly different results than the same flows executed in watsonx.ai. These differences may also vary
depending on the database vendor, for similar reasons. For example, depending on how the database is
configured for case sensitivity in string comparison and string collation, SPSS Modeler flows that run using SQL pushback may produce different results from
those that run without SQL pushback. Contact your database administrator for advice on configuring
your database. To maximize compatibility with watsonx.ai, database string
comparisons should be case sensitive.
When you use watsonx.ai to generate SQL, results obtained with SQL pushback may not be consistent
on some platforms (Linux, for example), because floating-point arithmetic is handled differently on
different platforms.
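Both notes can be illustrated with a short Python sketch that uses an in-memory SQLite database as a stand-in for the DBMS. SQLite's `NOCASE` collation is used here only as an assumption to mimic a database configured for case-insensitive comparison, and the floating-point part shows one general mechanism (the order in which values are added, which can change when a database aggregates in parallel) rather than the specific cause on any platform.

```python
import math
import sqlite3

# SQLite stands in for the DBMS; NOCASE mimics a case-insensitive configuration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cities (city TEXT)")
conn.executemany("INSERT INTO cities VALUES (?)", [("Paris",), ("paris",), ("PARIS",)])

# 1. String comparison: the same filter matches different rows depending on collation.
case_sensitive = conn.execute(
    "SELECT COUNT(*) FROM cities WHERE city = 'Paris'").fetchone()[0]
case_insensitive = conn.execute(
    "SELECT COUNT(*) FROM cities WHERE city = 'Paris' COLLATE NOCASE").fetchone()[0]
# Python's str equality is case-sensitive, like SQLite's default BINARY collation.
local_count = sum(1 for (c,) in conn.execute("SELECT city FROM cities") if c == "Paris")

# 2. Floating point: addition is not associative, so evaluation order matters.
left_to_right = (0.1 + 0.2) + 0.3
right_to_left = (0.3 + 0.2) + 0.1

print(case_sensitive, case_insensitive, local_count)  # 1 3 1
print(left_to_right, right_to_left)                    # 0.6000000000000001 0.6
print(math.isclose(left_to_right, right_to_left))      # True
```

When comparing results of runs with and without pushback, checking numeric fields with a tolerance (as `math.isclose` does) rather than exact equality avoids flagging differences like the one above.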