The initial fragments of a flow leading from the data import nodes are the main targets
for SQL generation. When a node is encountered that can't be compiled to SQL, the data is extracted
from the database and subsequent processing is performed.
During flow preparation and prior to running, the SQL generation process happens as follows:
The software reorders flows to move downstream nodes into the “SQL zone” where it can be proven
safe to do so.
Working from the import nodes toward the terminal nodes, SQL expressions are constructed
incrementally. This phase stops when a node is encountered that can't be converted to SQL or when
the terminal node (for example, a Table node or a Graph node) is converted to SQL. At the end of
this phase, each node is labeled with an SQL statement if the node and its predecessors have an SQL
equivalent.
Working from the nodes with the most complicated SQL equivalents back toward the import nodes,
the SQL is checked for validity. The SQL that was successfully validated is chosen for execution.
Nodes for which all operations have generated SQL are highlighted with a
SQL icon next to the node on the flow canvas. Based on the results, you may
want to further reorganize your flow where appropriate to take full advantage of database execution.
Where do improvements occur?
Copy link to section
SQL pushback improves performance in a number of data operations:
Joins (merge by key). Join operations can increase optimization within
databases.
Aggregation. The Aggregate, Distribution, and Web nodes all use
aggregation to produce their results. Summarized data uses considerably less bandwidth than the
original data.
Selection. Choosing records based on certain criteria reduces the
quantity of records.
Sorting. Sorting records is a resource-intensive activity that is
performed more efficiently in a database.
Field derivation. New fields are generated more efficiently in a
database.
Field projection. The software extracts only fields that are required for
subsequent processing from the database, which minimizes bandwidth and memory requirements. The same
is also true for superfluous fields in flat files: although the software must read the superfluous
fields, it doesn't allocate any storage for them.
Scoring. SQL can be generated from decision trees, rulesets, linear
regression, and factor-generated models.
About cookies on this siteOur websites require some cookies to function properly (required). In addition, other cookies may be used with your consent to analyze site usage, improve the user experience and for advertising.For more information, please review your cookie preferences options. By visiting our website, you agree to our processing of information as described in IBM’sprivacy statement. To provide a smooth navigation, your cookie preferences will be shared across the IBM web domains listed here.