Setting properties for SPSS Modeler flows | IBM Data Product Exchange

Last updated: Oct 09, 2024

Setting properties for SPSS Modeler flows

You can specify properties to apply to the current flow.

To set flow properties, click the Flow Properties icon.

The following properties are available.

Options

General

Maximum number of rows to show in Data Preview. Specify the number of rows to be shown when a preview of the data is requested for a node.
Limit members for nominal fields. Select this option and specify a maximum number of members for nominal (set) fields. When a field exceeds this number, its measurement level becomes Typeless. This option is useful when working with large nominal fields. Note, however, that when the measurement level of a field is set to Typeless, its role is automatically set to None, so the field isn't available for modeling.
Refresh source nodes on execution. Select this option to automatically refresh all source (import) nodes when running the current flow. This action is analogous to clicking the Refresh button in an import node's properties, except that this option automatically refreshes all import nodes (except User Input nodes) for the current flow.
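
The rule behind Limit members for nominal fields can be pictured with a short Python sketch. This is only an illustration of the behavior described above, not SPSS Modeler code. The function name measurement_after_limit, its arguments, and the default role "Input" are hypothetical.

```python
def measurement_after_limit(values, max_members, role="Input"):
    """Return (measurement, role) for a nominal field under a member limit.

    Hypothetical helper. It mirrors the documented rule: a nominal field
    with more distinct members than max_members becomes Typeless, and a
    Typeless field has its role set to None.
    """
    distinct_members = set(values)
    if len(distinct_members) > max_members:
        # Typeless fields are not available for modeling.
        return ("Typeless", "None")
    return ("Nominal", role)


# Two distinct members with a limit of 2: the field stays nominal.
print(measurement_after_limit(["red", "blue", "red"], 2))
# Three distinct members with a limit of 2: the field becomes typeless.
print(measurement_after_limit(["red", "blue", "green"], 2))
```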

Date/Time

Import date/time as. Select whether to use date/time storage for date/time fields or whether to import them as string variables.
Date format. Select a date format to use for date storage fields or when strings are interpreted as dates by CLEM date functions.
Time format. Select a time format to use for time storage fields or when strings are interpreted as times by CLEM time functions.
Rollover days/mins. For time formats, select whether negative time differences are interpreted as referring to the previous day or hour.
Date baseline (1st Jan). Select the baseline year (always 1 January) to be used by CLEM date functions that work with a single date.
2-digit dates start from. Specify the cutoff year to add century digits for years that are denoted with only 2 digits. For example, specifying 1930 as the cutoff year assumes that 05/11/02 is in the year 2002. The same setting uses the 20th century for 2-digit years after 30; thus 05/11/73 is assumed to be in 1973.
Time zone. Select how the time zone is chosen for use with the datetime_now CLEM expression.
- If you select Server, the time zone of the machine where the SPSS Modeler runtime is running is used (in some cases this may be the same as the Client option). Or, if your flow uses data from a database and the supported database uses SQL pushback, the datetime_now expression uses the time of the database.
- If you select Client, the time zone of the machine where SPSS Modeler is installed is used.
- Alternatively, you can select any of the Coordinated Universal Time values for the time zone.
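
The cutoff-year rule for 2-digit dates can be sketched in Python as follows. This is a minimal illustration, not the SPSS Modeler implementation. The function name expand_two_digit_year is hypothetical, and the sketch assumes that the cutoff year starts a 100-year window, so a 2-digit year equal to the cutoff's own last two digits maps to the cutoff year itself.

```python
def expand_two_digit_year(yy, cutoff_year=1930):
    """Expand a 2-digit year to 4 digits using a cutoff year.

    Assumption: the result is the one year in the window
    [cutoff_year, cutoff_year + 99] whose last two digits equal yy.
    """
    if not 0 <= yy <= 99:
        raise ValueError("yy must be between 0 and 99")
    century = cutoff_year - cutoff_year % 100  # 1900 for a cutoff of 1930
    year = century + yy
    if year < cutoff_year:
        # Falls before the cutoff, so it belongs to the next century.
        year += 100
    return year


print(expand_two_digit_year(2))   # 05/11/02 with cutoff 1930
print(expand_two_digit_year(73))  # 05/11/73 with cutoff 1930
```

With a cutoff of 1930, the two calls return 2002 and 1973, which matches the examples given for this option.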

Number Formats: For standard, scientific, and currency display formats, specify the number of decimal places to use when displaying real numbers.

Optimization

You can use these settings to optimize flow performance.

Enable flow rewriting. Select this option to enable flow rewriting. Flow rewriting reorders the nodes in a flow behind the scenes for more efficient operation, without altering flow semantics.
Optimize CLEM expressions. This option enables the optimizer to search for CLEM expressions that can be preprocessed before the flow runs, to increase the processing speed. As a simple example, if you have an expression such as log(salary), the optimizer will calculate the actual salary value and pass that on for processing. This can be used to improve both SQL pushback and SPSS Modeler performance.
Optimize syntax execution. This method of flow rewriting increases the efficiency of operations that incorporate more than one node containing SPSS Statistics syntax. Optimization is achieved by combining the syntax commands into a single operation, instead of running each as a separate operation.
Optimize other execution. This method of flow rewriting increases the efficiency of operations that can't be delegated to the database. Optimization is achieved by reducing the amount of data in the flow as early as possible. While maintaining data integrity, the flow is rewritten to push operations closer to the data source, thus reducing data downstream for costly operations, such as joins.
Enable parallel processing. When running on a computer with multiple processors, this option allows the system to balance the load across those processors, which may result in faster performance. Use of multiple nodes or use of the following individual nodes may benefit from parallel processing: C5.0, Merge (by key), Sort, Bin (rank and tile methods), and Aggregate (using one or more key fields).
Generate SQL. This option pushes SQL processing back to the database. Note that turning this option on or off affects only the new flows that you create. You cannot switch the setting for an existing flow. For more information about using this option with flows, see SQL optimization.
- Database caching (SQL only). For flows that generate SQL to be run in the database, data can be cached mid flow to a temporary table in the database rather than to the file system. When combined with SQL optimization, this may result in significant gains in performance. For example, the output from a flow that merges multiple tables to create a data mining view may be cached and reused as needed. With database caching enabled, simply right-click any nonterminal node to cache data at that point, and the cache is automatically created directly in the database the next time the flow runs. This allows SQL to be generated for downstream nodes, further improving performance. Alternatively, this option can be disabled if needed, such as when policies or permissions preclude data being written to the database. If database caching or SQL optimization is not enabled, the cache will be written to the file system instead.
- Use relaxed conversion (SQL only). This option enables the conversion of data from either strings to numbers, or numbers to strings, if stored in a suitable format. For example, if the data is kept in the database as a string, but actually contains a meaningful number, the data can be converted for use when the pushback occurs.
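
The idea behind Optimize other execution, reducing data as early as possible without changing the result, can be illustrated outside SPSS Modeler with a small Python sketch. It does not show how the optimizer is implemented. The function names and sample records are hypothetical; the sketch only shows that filtering before a join returns the same rows as filtering after it while passing fewer rows into the join.

```python
def join_then_filter(customers, orders, min_amount):
    """Join every order to its customer, then drop small orders."""
    joined = [
        (c["id"], c["name"], o["amount"])
        for c in customers
        for o in orders
        if c["id"] == o["customer_id"]
    ]
    return [row for row in joined if row[2] >= min_amount]


def filter_then_join(customers, orders, min_amount):
    """Drop small orders first, so the join sees fewer rows."""
    large_orders = [o for o in orders if o["amount"] >= min_amount]
    return [
        (c["id"], c["name"], o["amount"])
        for c in customers
        for o in large_orders
        if c["id"] == o["customer_id"]
    ]


# Hypothetical sample data.
customers = [{"id": 1, "name": "Ann"}, {"id": 2, "name": "Bo"}]
orders = [
    {"customer_id": 1, "amount": 50},
    {"customer_id": 2, "amount": 500},
    {"customer_id": 1, "amount": 700},
]

# Both orderings of the operations give the same rows.
print(join_then_filter(customers, orders, 100))
print(filter_then_join(customers, orders, 100))
```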

Logging

Display SQL in the messages log at run time. Specifies whether SQL generated while running the flow is passed to the messages log.
Display SQL generation in the message log during preparation. During flow preview, specifies whether a preview of the SQL that would be generated is passed to the messages log.
SQL format. Specifies whether any SQL that's displayed in the log should contain native SQL functions or standard ODBC functions of the form {fn FUNC(…)}, as generated by SPSS Modeler. The former relies on ODBC driver functionality that may not be implemented.
Reformat SQL for improved readability. Specifies whether SQL displayed in the log should be formatted for readability.
Show status for records. Specifies when records should be reported as they arrive at terminal nodes. Specify a number to use for updating the status every N records.

Parameters

You can define parameters for use in CLEM expressions and in scripting. They function as user-defined variables that are saved with the current flow, session, or SuperNode, and can be accessed from the user interface or through scripting. If you save a flow, for example, any parameters set for that flow are also saved. (This distinguishes them from local script variables, which can be used only in the script in which they are declared.) Parameters are often used in scripting to control the behavior of the script, by providing information about fields and values that don't need to be hard-coded in the script.

If you set a parameter here in the flow properties, it's available to all nodes in the flow. Click Add value and enter the following information.

Name: Parameter names are listed here. For example, to create a parameter for the minimum temperature, you could type minvalue. Do not include the $P- prefix that denotes a parameter in CLEM expressions. This name is how the parameter is referenced in expressions.

Label: Lists a descriptive name for each parameter created.

Storage: Select a storage type from the list. Storage indicates how the data values are stored in the parameter. For example, when working with values containing leading zeros that you want to preserve (such as 008), you should select String as the storage type. Otherwise, the zeros will be stripped from the value. Available storage types are string, integer, real, time, date, and timestamp.

Value: Lists the current value for each parameter. Adjust the parameter as required. Values for date parameters must be specified in ISO standard notation (YYYY-MM-DD). Dates specified in other formats aren't accepted.

Measure: Select the measurement level, which is used to describe characteristics of the parameter.

Prompt?: Select this option if you want the user to be prompted at runtime to enter a value for this parameter.
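
The following Python sketch illustrates three points from the fields above: the name is entered without the $P- prefix, String storage preserves leading zeros, and date values must use ISO notation (YYYY-MM-DD). It is a standalone illustration, not SPSS Modeler scripting. The helper names clem_reference and parse_value are hypothetical, only some storage types are covered, and the single-quoted '$P-name' form reflects how parameters are usually written in CLEM expressions.

```python
from datetime import date, datetime


def clem_reference(name):
    """Build the CLEM reference for a parameter entered without its prefix."""
    if name.startswith("$P-"):
        raise ValueError("enter the parameter name without the $P- prefix")
    return "'$P-" + name + "'"


def parse_value(raw, storage):
    """Convert the text typed for a parameter value to the chosen storage."""
    if storage == "string":
        return raw                      # "008" stays "008"
    if storage == "integer":
        return int(raw)                 # "008" becomes 8
    if storage == "real":
        return float(raw)
    if storage == "date":
        # Only ISO notation is accepted; other formats raise ValueError.
        return datetime.strptime(raw, "%Y-%m-%d").date()
    raise ValueError("storage type not covered by this sketch: " + storage)


print(clem_reference("minvalue"))
print(parse_value("008", "string"), parse_value("008", "integer"))
print(parse_value("2024-10-09", "date"))
```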

Globals

In the Globals tab of the flow properties, you can view the global values set for the current flow. Global values are created using a Set Globals node to determine statistics such as mean, sum, or standard deviation for selected fields.

After a Set Globals node runs, these values are then available for a variety of uses in flow operations.

You can't edit global values in the table here in the flow properties, but you can clear all global values for a flow using the button to the right of the table.
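
As a rough picture of what a globals table holds, the Python sketch below computes the mean, sum, and standard deviation for selected fields of a small data set. It is an illustration only and does not use SPSS Modeler. The function name compute_globals and the sample records are hypothetical, and the use of the sample (rather than population) standard deviation is an assumption.

```python
import statistics


def compute_globals(records, fields):
    """Return summary statistics per field, keyed by field name."""
    table = {}
    for field in fields:
        values = [record[field] for record in records]
        table[field] = {
            "mean": statistics.mean(values),
            "sum": sum(values),
            # Assumption: sample standard deviation (n - 1 denominator).
            "sdev": statistics.stdev(values),
        }
    return table


# Hypothetical sample data.
records = [{"age": 20}, {"age": 30}, {"age": 40}]
print(compute_globals(records, ["age"]))
```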

Messages

In the Messages tab of the flow properties, you can view messages regarding flow operations, such as running, optimization, and time elapsed for model building and evaluation. Error messages are also reported in this table.

Annotations

If you need to describe a flow to others in your organization, you can attach explanatory comments to flows, nodes, and model nuggets. Others can then view these comments on-screen, or you might even print out an image of the flow that includes the comments.

Use the Annotations tab of the flow properties to add text annotations to your flow. These notes are visible only when the Annotations tab is open, except that flow annotations can also be shown as on-screen comments.