0 / 0
SQL optimization (SPSS Modeler)

Nodes supporting SQL pushback

The tables in this section show nodes representing data-mining operations that support SQL pushback. If a node doesn't appear in these tables, it doesn't support SQL pushback.

Table 1. Record Operations nodes
Nodes supporting SQL generation Notes
Select Supports generation only if SQL generation for the select expression itself is supported. If any fields have nulls, SQL generation does not give the same results for discard as are given in native SPSS Modeler.
Sample Simple sampling supports SQL generation to varying degrees depending on the database.
Aggregate SQL generation support for aggregation depends on the data storage type.
RFM Aggregate Supports generation except if saving the date of the second or third most recent transactions, or if only including recent transactions. However, including recent transactions does work if the datetime_date(YEAR,MONTH,DAY) function is pushed back.
Sort  
Merge No SQL generated for merge by order.

Merge by key with full or partial outer join is only supported if the database/driver supports it. Non-matching input fields can be renamed by means of a Filter node, or the Filter settings of an import node.

Supports SQL generation for merge by condition.

For all types of merge, SQL_SP_EXISTS is not supported if inputs originate in different databases.

Append Supports generation if inputs are unsorted. SQL optimization is only possible when your inputs have the same number of columns.
Distinct A Distinct node with the (default) mode Create a composite record for each group selected doesn't support SQL optimization.
Table 2. SQL generation support in the Sample node for simple sampling
Mode Sample Max size Seed Db2 for z/OS Db2 for OS/400 Db2 for Win/UNIX Oracle SQL Server Teradata
Include First n/a   Y Y Y Y Y Y
  1-in-n off   Y Y Y Y   Y
    max   Y Y Y Y   Y
  Random % off off Y   Y Y   Y
      on Y   Y Y    
    max off Y   Y Y   Y
      on Y   Y Y    
Discard First off         Y    
    max         Y    
  1-in-n off   Y Y Y Y   Y
    max   Y Y Y Y   Y
  Random % off off Y   Y Y   Y
      on Y   Y Y    
    max off Y   Y Y   Y
      on Y   Y Y    
Table 3. SQL generation support in the Aggregate node
Storage Sum Mean Min Max SDev Median Count Variance Percentile
Integer Y Y Y Y Y Y* Y Y Y*
Real Y Y Y Y Y Y* Y Y Y*
Date     Y Y   Y* Y   Y*
Time     Y Y   Y* Y   Y*
Timestamp     Y Y   Y* Y   Y*
String     Y Y   Y* Y   Y*

* Median and Percentile are supported on Oracle.

Table 4. Field Operations nodes
Nodes supporting SQL generation Notes
Type Supports SQL generation if the Type node is instantiated and no ABORT or WARN type checking is specified.
Filter  
Derive Supports SQL generation if SQL generated for the derive expression is supported (see expressions later on this page).
Ensemble Supports SQL generation for Continuous targets. For other targets, supports generation only if the Highest confidence wins ensemble method is used.
Filler Supports SQL generation if the SQL generated for the derive expression is supported.
Anonymize Supports SQL generation for Continuous targets, and partial SQL generation for Nominal and Flag targets.
Reclassify  
Binning Supports SQL generation if the Tiles (equal count) binning method is used and the Read from Bin Values tab if available option is selected. Due to differences in the way that bin boundaries are calculated (this is caused by the nature of the distribution of data in bin fields), you might see differences in the binning output when comparing normal flow execution results and SQL pushback results. To avoid this, use the Record count tiling method, and either Add to next or Keep in current tiles to obtain the closest match between the two methods of flow execution.
RFM Analysis Supports SQL generation if the Read from Bin Values tab if available option is selected, but downstream nodes will not support it.
Partition Supports SQL generation to assign records to partitions.
Set To Flag  
Restructure  
Table 5. Graphs nodes
Nodes supporting SQL generation Notes
Distribution  
Web  
Evaluation  

For some models, SQL for the model nugget can be generated, pushing back the model scoring stage to the database. The main use of this feature is not to improve performance, but to allow flows containing these nuggets to have their full SQL pushed back. See Generating SQL from model nuggets for more information.

Table 6. Model nuggets
Model nuggets supporting SQL generation Notes
C&R Tree Supports SQL generation for the single tree option, but not for the boosting, bagging, or large dataset options.
QUEST  
CHAID  
C5.0  
Decision List  
Linear Supports SQL generation for the standard model option, but not for the boosting, bagging, or large dataset options.
Neural Net Supports SQL generation for the standard model option (Multilayer Perceptron only), but not for the boosting, bagging, or large dataset options.
PCA/Factor  
Logistic Supports SQL generation for Multinomial procedure but not Binomial. For Multinomial, generation isn't supported when confidences are selected, unless the target type is Flag.
Generated Rulesets  
Auto Classifier If a User Defined Function (UDF) scoring adapter is enabled, these nuggets support SQL pushback. Also, if either SQL generation for Continuous targets, or the Highest confidence wins ensemble method are used, these nuggets support further pushback downstream.
Auto Numeric
Table 7. Outputs nodes
Nodes supporting SQL generation Notes
Table Supports generation if SQL generation is supported for highlight expression.
Matrix Supports generation except if All numerics is selected for the Fields option.
Analysis Supports generation, depending on the options selected.
Transform  
Statistics Supports generation if the Correlate option isn't used.
Report  
Set Globals  
Generative AI search and answer
These answers are generated by a large language model in watsonx.ai based on content from the product documentation. Learn more