Setting properties for flows

You can specify a number of properties to apply to the current flow.

To set flow properties, click the Flow Properties icon.

The following properties are available.

Options

General
  • Limit members for nominal fields. Select this option and specify the maximum number of members for nominal (set) fields. When a field has more members than this limit, its measurement level becomes Typeless. This option is useful when working with large nominal fields. Note, however, that when the measurement level of a field is set to Typeless, its role is automatically set to None, which means that the field isn't available for modeling.
  • Refresh source nodes on execution. Select this option to automatically refresh all source (import) nodes when running the current flow. This action is analogous to clicking the Refresh button in an import node's properties, except that this option automatically refreshes all import nodes (except User Input nodes) for the current flow.
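As a rough illustration of the Limit members for nominal fields option, the following Python sketch counts the distinct values of a field and falls back to Typeless with role None when the count exceeds the limit. The function name, the returned dictionary, and the "Input" role for the unaffected case are hypothetical and chosen only for illustration. The sketch also assumes that the limit is exceeded only when the member count is strictly greater than the maximum; it does not show how SPSS Modeler performs the check internally.

```python
def classify_nominal(values, max_members):
    """Return an illustrative measurement level and role for a field."""
    members = set(values)  # distinct members of the nominal field
    if len(members) > max_members:
        # Assumption: exceeding the limit makes the field Typeless,
        # and a Typeless field gets the role None.
        return {"measure": "Typeless", "role": "None"}
    return {"measure": "Nominal", "role": "Input"}


small = classify_nominal(["red", "green", "red"], max_members=5)
large = classify_nominal(["id%d" % i for i in range(1000)], max_members=250)
print(small)
print(large)
```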
Date/Time
  • Import date/time as. Select whether to use date/time storage for date/time fields or whether to import them as string variables.
  • Date format. Select a date format to use for date storage fields or when strings are interpreted as dates by CLEM date functions.
  • Time format. Select a time format to use for time storage fields or when strings are interpreted as times by CLEM time functions.
  • Rollover days/mins. For time formats, select whether negative time differences are interpreted as referring to the previous day or hour.
  • Date baseline (1st Jan). Select the baseline year (always 1 January) to be used by CLEM date functions that work with a single date.
  • 2-digit dates start from. Specify the cutoff year used to add century digits to years that are written with only 2 digits. For example, specifying 1930 as the cutoff year means that 05/11/02 is assumed to be in the year 2002. With the same setting, 2-digit years after 30 fall in the 20th century; thus 05/11/73 is assumed to be in 1973.
  • Time zone. Select how the time zone is chosen for use with the datetime_now CLEM expression.
    • If you select Server, the time zone of the machine where the SPSS Modeler runtime is running is used (in some cases this may be the same as the Client option). If your flow uses data from a database and the supported database uses SQL pushback, the datetime_now expression uses the time of the database instead.
    • If you select Client, the time zone of the machine where SPSS Modeler is installed is used.
    • Alternatively, you can select any of the Coordinated Universal Time values for the time zone.
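The 2-digit dates start from setting can be read as a 100-year window that begins at the cutoff year. The following Python sketch reproduces the two examples given for a cutoff of 1930. The function name is hypothetical, and the handling of the boundary year itself (30 mapping to 1930) is an assumption rather than documented behavior.

```python
def expand_two_digit_year(yy, cutoff_year):
    """Expand a 2-digit year into the 100-year window starting at cutoff_year."""
    if not 0 <= yy <= 99:
        raise ValueError("expected a 2-digit year between 0 and 99")
    century = cutoff_year - cutoff_year % 100  # for example, 1900 for 1930
    year = century + yy
    if year < cutoff_year:
        # Years before the cutoff roll into the next century
        year += 100
    return year


print(expand_two_digit_year(2, 1930))   # 05/11/02 -> 2002
print(expand_two_digit_year(73, 1930))  # 05/11/73 -> 1973
```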
Number Formats
For standard, scientific, and currency display formats, specify the number of decimal places to use when displaying real numbers.
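To show what a decimal places setting means for each display format, here is a small Python sketch. The function name, the dollar sign, and the thousands separator are assumptions made for illustration; the actual rendering in SPSS Modeler may differ.

```python
def format_real(value, decimals, style="standard"):
    """Format a real number with a fixed number of decimal places."""
    if style == "standard":
        return "{:.{d}f}".format(value, d=decimals)
    if style == "scientific":
        return "{:.{d}e}".format(value, d=decimals)
    if style == "currency":
        # Assumed currency layout: dollar sign and comma grouping
        return "${:,.{d}f}".format(value, d=decimals)
    raise ValueError("unknown style: %s" % style)


print(format_real(1234.5678, 3))                # 1234.568
print(format_real(1234.5678, 3, "scientific"))  # 1.235e+03
print(format_real(1234.5678, 2, "currency"))    # $1,234.57
```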
Optimization
You can use these settings to optimize flow performance.
  • Enable flow rewriting. Select this option to enable flow rewriting. Flow rewriting reorders the nodes in a flow behind the scenes for more efficient operation, without altering flow semantics.
  • Optimize CLEM expressions. This option enables the optimizer to search for CLEM expressions that can be preprocessed before the flow runs, to increase the processing speed. As a simple example, if you have an expression such as log(salary), the optimizer will calculate the actual salary value and pass that on for processing. This can be used to improve both SQL pushback and SPSS Modeler performance.
  • Optimize syntax execution. This method of flow rewriting increases the efficiency of operations that incorporate more than one node containing SPSS Statistics syntax. Optimization is achieved by combining the syntax commands into a single operation, instead of running each as a separate operation.
  • Optimize other execution. This method of flow rewriting increases the efficiency of operations that can't be delegated to the database. Optimization is achieved by reducing the amount of data in the flow as early as possible. While maintaining data integrity, the flow is rewritten to push operations closer to the data source, thus reducing data downstream for costly operations, such as joins.
  • Enable parallel processing. When running on a computer with multiple processors, this option allows the system to balance the load across those processors, which may result in faster performance. Use of multiple nodes or use of the following individual nodes may benefit from parallel processing: C5.0, Merge (by key), Sort, Bin (rank and tile methods), and Aggregate (using one or more key fields).
  • Database caching (SQL only). For flows that generate SQL to be run in the database, data can be cached mid flow to a temporary table in the database rather than to the file system. When combined with SQL optimization, this may result in significant gains in performance. For example, the output from a flow that merges multiple tables to create a data mining view may be cached and reused as needed. With database caching enabled, simply right-click any nonterminal node to cache data at that point, and the cache is automatically created directly in the database the next time the flow runs. This allows SQL to be generated for downstream nodes, further improving performance. Alternatively, this option can be disabled if needed, such as when policies or permissions preclude data being written to the database. If database caching or SQL optimization is not enabled, the cache will be written to the file system instead.
  • Use relaxed conversion (SQL only). This option enables the conversion of data from either strings to numbers, or numbers to strings, if stored in a suitable format. For example, if the data is kept in the database as a string, but actually contains a meaningful number, the data can be converted for use when the pushback occurs.
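The idea behind Optimize other execution, reducing data as early as possible before costly operations such as joins, can be sketched in plain Python. The data, field names, and helper function below are hypothetical. The sketch only demonstrates that filtering before a join gives the same rows as filtering after it while the join processes fewer records; it is not the rewriting logic that SPSS Modeler uses.

```python
def join_on_key(left, right, key):
    """Inner-join two lists of dict records on a shared key."""
    index = {}
    for row in right:
        index.setdefault(row[key], []).append(row)
    joined = []
    for left_row in left:
        for right_row in index.get(left_row[key], []):
            merged = dict(left_row)
            merged.update(right_row)
            joined.append(merged)
    return joined


customers = [{"id": i, "region": "EU" if i % 2 else "US"} for i in range(6)]
orders = [{"id": i % 6, "amount": 10 * i} for i in range(12)]

# Flow as drawn: join everything, then select the EU records
late = [r for r in join_on_key(customers, orders, "id") if r["region"] == "EU"]

# Rewritten flow: select first, so the join sees fewer customer records
eu_customers = [c for c in customers if c["region"] == "EU"]
early = join_on_key(eu_customers, orders, "id")

print(len(customers), len(eu_customers))
print(early == late)
```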
Logging
  • Display SQL in the messages log at run time. Specifies whether SQL generated while running the flow is passed to the messages log.
  • Display SQL generation in the message log during preparation. During flow preview, specifies whether a preview of the SQL that would be generated is passed to the messages log.
  • SQL format. Specifies whether any SQL that's displayed in the log should contain native SQL functions or standard ODBC functions of the form {fn FUNC(…)}, as generated by SPSS Modeler. The former relies on ODBC driver functionality that may not be implemented.
  • Reformat SQL for improved readability. Specifies whether SQL displayed in the log should be formatted for readability.
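To make the two SQL format choices concrete, the following Python sketch builds a statement that uses the ODBC escape form and prints it next to a native form. The helper name, the table and column names, and the choice of UPPER as the native counterpart of the ODBC UCASE function are assumptions; the native function depends on the database.

```python
def odbc_scalar(func, *args):
    """Wrap a scalar function call in the ODBC escape form {fn FUNC(args)}."""
    return "{fn %s(%s)}" % (func.upper(), ", ".join(args))


odbc_sql = "SELECT %s FROM customers" % odbc_scalar("ucase", "name")
native_sql = "SELECT UPPER(name) FROM customers"  # assumed native equivalent

print(odbc_sql)
print(native_sql)
```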

Scripting

Flow scripts are stored as a flow property here and are therefore saved and loaded with a specific flow. For example, you can write a flow script that automates the process of training and applying a model nugget. You can also specify that whenever a particular flow runs, the script should run instead of the flow's canvas content.

Script language
Select whether to use Python scripting or the legacy scripting language that was specific to older versions of SPSS Modeler.
Script
Enter a script to customize operations within a flow. The script is saved with the flow. You can use a script to specify a particular execution order for the terminal nodes within a flow.
When the flow is run
Specify whether to run all terminal nodes or to run the script you provided whenever the flow runs.
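As a minimal sketch of a flow script that fixes the execution order of terminal nodes, the following Python function looks up each node by its label and runs the nodes one after another. It assumes that the flow object offers findByType(type, label) and that nodes offer run(results), as in the SPSS Modeler Python scripting API. The function name and the node labels in the commented call are hypothetical.

```python
def run_terminal_nodes_in_order(stream, labels):
    """Run the terminal nodes with the given labels, one after another."""
    results = []
    for label in labels:
        # Assumption: passing None as the type matches on the label alone
        node = stream.findByType(None, label)
        if node is None:
            raise ValueError("no node labelled %r in this flow" % label)
        node.run(results)
    return results


# Inside a flow script, the call might look like this (labels are hypothetical):
# import modeler.api
# run_terminal_nodes_in_order(modeler.script.stream(), ["Train model", "Score table"])
```

Keeping the order in a plain list makes it easy to change the sequence without touching the canvas.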

Parameters

You can define parameters for use in CLEM expressions and in scripting. They are, in effect, user-defined variables that are saved with the current flow, session, or SuperNode, and can be accessed from the user interface as well as through scripting. If you save a flow, for example, any parameters set for that flow are also saved. (This distinguishes them from local script variables, which can be used only in the script in which they are declared.) Parameters are often used in scripting to control the behavior of the script, by providing information about fields and values so that these don't need to be hard-coded in the script.

If you set a parameter here in the flow properties, it's available to all nodes in the flow. Click Add Value and enter the following information.

Name
Parameter names are listed here. The name is how the parameter is referenced in expressions. For example, to create a parameter for the minimum temperature, you could type minvalue. Do not include the $P- prefix that denotes a parameter in CLEM expressions.
Label
Lists a descriptive name for each parameter created.
Storage
Select a storage type from the list. Storage indicates how the data values are stored in the parameter. For example, when working with values containing leading zeros that you want to preserve (such as 008), select String as the storage type. Otherwise, the zeros are stripped from the value. Available storage types are string, integer, real, time, date, and timestamp. For date parameters, values must be specified in ISO standard notation (YYYY-MM-DD).
Value
Lists the current value for each parameter. Adjust the parameter as required. Note that for date parameters, values must be specified in ISO standard notation (that is, YYYY-MM-DD). Dates specified in other formats aren't accepted.
Measure
Select the measurement level, which is used to describe characteristics of the parameter.
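The effect of the storage type on a parameter value, including the leading-zero case and the ISO date requirement, can be sketched as follows. The function name is hypothetical and the sketch covers only some of the storage types; it illustrates the rules described above and is not SPSS Modeler's own validation.

```python
import re
from datetime import datetime


def parse_parameter(value, storage):
    """Convert a parameter value entered as text according to its storage type."""
    if storage == "string":
        return value  # leading zeros are preserved
    if storage == "integer":
        return int(value)  # "008" becomes 8
    if storage == "real":
        return float(value)
    if storage == "date":
        # Accept only the ISO layout YYYY-MM-DD, then check it is a real date
        if not re.fullmatch(r"\d{4}-\d{2}-\d{2}", value):
            raise ValueError("date must use ISO notation YYYY-MM-DD")
        return datetime.strptime(value, "%Y-%m-%d").date()
    raise ValueError("storage type not covered by this sketch: %s" % storage)


print(parse_parameter("008", "string"))
print(parse_parameter("008", "integer"))
print(parse_parameter("2024-03-15", "date"))
```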

Messages

In the Messages tab of the flow properties, you can view messages about flow operations, such as running, optimization, and the time elapsed for model building and evaluation. Error messages are also reported in this table.