Setting properties for SPSS Modeler flows | IBM Cloud Pak for Data as a Service

Last updated: Dec 20, 2024

You can specify properties to apply to the current flow.

To set flow properties, click the Flow Properties icon.

You can configure the following properties.

Options

General

Maximum number of rows to show in Data Preview: Specifies the maximum number of rows to show when you preview the data for a node.
Limit members for nominal fields: When the number of members in a nominal (set) field exceeds the value that you set in Maximum members, the measurement level of the field becomes Typeless. This option is useful when you are working with large nominal fields. When the measurement level of a field is set to Typeless, its role is automatically set to None. Fields that are set to None aren't available for modeling.
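
As a rough illustration of the Limit members for nominal fields rule, the following Python sketch models the check. It isn't SPSS Modeler code: the function name, the dictionary keys, and the default role of Input are hypothetical and chosen only for this example.

```python
def apply_member_limit(values, max_members):
    """Return a hypothetical measurement level and role for a nominal field."""
    distinct_members = set(values)
    if len(distinct_members) > max_members:
        # Too many members: the field becomes Typeless and its role None,
        # so it is no longer available for modeling.
        return {"measure": "Typeless", "role": "None"}
    # Assumption for this sketch: a field within the limit stays Nominal
    # with an Input role.
    return {"measure": "Nominal", "role": "Input"}


print(apply_member_limit(["red", "green", "blue"], max_members=2))
```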

Date/Time

Import date/time/timestamp as

Select whether to use a date and time format for storing data in date and time fields or whether to import them as string variables.

Use microseconds in timestamp fields

If you have timestamp data that is measured in microseconds, you can enable this option to use the more precise data in your flows. To enable the option, select this checkbox and set Import date/time/timestamp as to String.

Note: This option works only for connectors that support SQL pushback.

Date format

Select a date format to use for date storage fields or when strings are interpreted as dates by CLEM date functions.

Time format

Select a time format to use for time storage fields or when strings are interpreted as times by CLEM time functions.

Rollover days/mins

For time formats, select whether negative time differences are interpreted as referring to the previous day or hour.
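
One way to picture the rollover setting is the following Python sketch. It assumes that "referring to the previous day" means that 24 hours are added to a negative difference; the function names are hypothetical, and this isn't how SPSS Modeler itself is implemented.

```python
from datetime import time

SECONDS_PER_DAY = 24 * 60 * 60


def seconds_since_midnight(t):
    """Convert a time of day to seconds since midnight."""
    return t.hour * 3600 + t.minute * 60 + t.second


def time_difference_seconds(start, end, rollover=False):
    """Return end minus start in seconds, optionally rolling over a day."""
    diff = seconds_since_midnight(end) - seconds_since_midnight(start)
    if rollover and diff < 0:
        # Assumption: the start time belongs to the previous day.
        diff += SECONDS_PER_DAY
    return diff


print(time_difference_seconds(time(23, 0), time(1, 0)))                 # -79200
print(time_difference_seconds(time(23, 0), time(1, 0), rollover=True))  # 7200
```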

Date baseline (1st Jan)

Select the baseline year (always 1 January) to be used by CLEM date functions that work with a single date.

2-digit dates start from

Specify the cutoff year to add century digits for years that are denoted with only 2 digits. For example, specifying 1930 as the cutoff year assumes that 05/11/02 is in the year 2002. The same setting uses the 20th century for years after 30; thus 05/11/73 is assumed to be in 1973.
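
The cutoff rule can be sketched in Python as follows. This is a minimal illustration, not SPSS Modeler code: the function name is hypothetical, and the sketch assumes that a two-digit year equal to the cutoff's last two digits (30 in this example) maps to the cutoff year itself.

```python
def expand_two_digit_year(two_digit_year, cutoff_year=1930):
    """Expand a two-digit year into the 100-year window starting at cutoff_year."""
    if not 0 <= two_digit_year <= 99:
        raise ValueError("two_digit_year must be between 0 and 99")
    century = cutoff_year - cutoff_year % 100
    year = century + two_digit_year
    if year < cutoff_year:
        # Years that fall before the cutoff belong to the next century.
        year += 100
    return year


print(expand_two_digit_year(2))   # 2002
print(expand_two_digit_year(73))  # 1973
```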

Time zone

Select how the time zone is chosen for use with the datetime_now CLEM expression.

If you select Server, the time zone of the machine where the SPSS Modeler runtime is running is used (this can be the same as the Client option). However, if your flow uses data from a database and the supported database uses SQL pushback, the datetime_now expression uses the time of the database.
If you select Client, the time zone of the machine where SPSS Modeler is installed is used.
Alternatively, you can select any of the Coordinated Universal Time values for the time zone.
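
To show what choosing a fixed Coordinated Universal Time offset means for a "current time" value, here is a small Python sketch. It only mirrors the idea behind datetime_now; the function name is hypothetical and whole-hour offsets are assumed.

```python
from datetime import datetime, timedelta, timezone


def now_at_utc_offset(offset_hours):
    """Return the current time expressed at a fixed whole-hour UTC offset."""
    fixed_zone = timezone(timedelta(hours=offset_hours))
    return datetime.now(fixed_zone)


# The same instant, shown at UTC and at UTC+02:00.
print(now_at_utc_offset(0).isoformat())
print(now_at_utc_offset(2).isoformat())
```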

Number Formats

You can specify the number of decimal places to use when SPSS Modeler displays real numbers in standard, scientific, or currency display formats.
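
The effect of a decimal places setting on each display format can be approximated in Python. This is only a sketch: the function name and style names are hypothetical, and the thousands separator in the currency branch is an assumption rather than documented SPSS Modeler behavior.

```python
def format_real(value, decimals, style="standard"):
    """Format a real number with a fixed number of decimal places."""
    if style == "standard":
        return f"{value:.{decimals}f}"
    if style == "scientific":
        return f"{value:.{decimals}e}"
    if style == "currency":
        # Assumption: currency display groups thousands with a comma.
        return f"{value:,.{decimals}f}"
    raise ValueError(f"unknown style: {style}")


print(format_real(1234.5678, 3))                # 1234.568
print(format_real(1234.5678, 3, "scientific"))  # 1.235e+03
print(format_real(1234.5678, 2, "currency"))    # 1,234.57
```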

Optimization

You can use these settings to optimize flow performance.

Enable flow rewriting

Flow rewriting reorders the nodes in a flow behind the scenes for more efficient operation, without altering flow semantics.

Optimize CLEM expressions

This option enables the optimizer to search for CLEM expressions that can be preprocessed before the flow runs to increase the processing speed. For example, if you have an expression such as log(salary), the optimizer calculates the actual salary value and passes that on for processing. This option can be used to improve both SQL pushback and SPSS Modeler performance.

Optimize syntax execution

This method of flow rewriting increases the efficiency of operations that have more than one node that contains SPSS Statistics syntax. Optimization is achieved by combining the syntax commands into a single operation, instead of running each as a separate operation.

Optimize other execution

This method of flow rewriting increases the efficiency of operations that can't be delegated to the database. Optimization is achieved by reducing the amount of data in the flow as early as possible. The flow is rewritten to push operations closer to the data source while maintaining data integrity. This change reduces data downstream for costly operations, such as joins.
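
The idea of reducing data before a costly operation can be illustrated with a small Python sketch. The record layout (customer_id, amount) and both function names are hypothetical. The sketch shows only that filtering before a join returns the same rows as filtering after it while giving the join fewer rows to process; it is not the optimizer's actual algorithm.

```python
def join_then_filter(customers, orders, min_amount):
    """Join every order to its customer, then keep only large orders."""
    joined = [
        {**c, **o}
        for c in customers
        for o in orders
        if c["customer_id"] == o["customer_id"]
    ]
    return [row for row in joined if row["amount"] >= min_amount]


def filter_then_join(customers, orders, min_amount):
    """Keep only large orders first, so the join processes fewer rows."""
    large_orders = [o for o in orders if o["amount"] >= min_amount]
    return [
        {**c, **o}
        for c in customers
        for o in large_orders
        if c["customer_id"] == o["customer_id"]
    ]


customers = [{"customer_id": 1, "name": "Ana"}, {"customer_id": 2, "name": "Ben"}]
orders = [
    {"customer_id": 1, "amount": 50},
    {"customer_id": 1, "amount": 500},
    {"customer_id": 2, "amount": 20},
]
print(filter_then_join(customers, orders, 100) == join_then_filter(customers, orders, 100))
```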

Enable parallel processing

When running on a computer with multiple processors, this option allows the system to balance the load across those processors, which can result in faster performance. Flows that use multiple nodes, or any of the following individual nodes, can benefit from parallel processing: C5.0, Merge (by key), Sort, Bin (rank and tile methods), and Aggregate (using one or more key fields).

Generate SQL

This option pushes SQL processing back to the database. Turning this option on or off affects only the new flows that you create. You cannot switch the setting for an existing flow. For more information about using this option with flows, see SQL optimization.
- Database caching (SQL only). For flows that generate SQL to be run in the database, data can be cached mid flow to a temporary table in the database rather than to the file system. When combined with SQL optimization, this option can result in significant gains in performance. For example, the output from a flow that merges multiple tables to create a data mining view may be cached and reused as needed. With database caching enabled, hover over any nonterminal node in your flow, then click the overflow menu and select Cache > Enable. Data is now cached at this node, and the cache is automatically created directly in the database the next time the flow runs. This allows SQL to be generated for downstream nodes, further improving performance. Alternatively, this option can be disabled if needed, such as when policies or permissions preclude data being written to the database. If database caching or SQL optimization is not enabled, the cache is written to the file system instead.
- Use relaxed conversion (SQL only). This option enables the conversion of data from either strings to numbers, or numbers to strings, if stored in a suitable format. For example, if the data is kept in the database as a string, but actually contains a meaningful number, the data can be converted for use when the pushback occurs.
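
The string-to-number case of relaxed conversion can be pictured with the following Python sketch. The function name is hypothetical, and "suitable format" is approximated here by whatever Python's float() accepts, which might differ from what a given database accepts.

```python
def relaxed_to_number(text):
    """Convert a string to a number if it holds one, otherwise return None."""
    try:
        return float(text.strip())
    except ValueError:
        # The string is not in a suitable numeric format.
        return None


print(relaxed_to_number(" 42.5 "))  # 42.5
print(relaxed_to_number("abc"))     # None
```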

Logging

Display SQL in the messages log at run time: Specifies whether SQL generated while running the flow is passed to the messages log.
Display SQL generation in the message log during preparation: During flow preview, specifies whether a preview of the SQL that would be generated is passed to the messages log.
SQL format: Specifies whether any SQL that's displayed in the log should contain native SQL functions or standard ODBC functions of the form {fn FUNC(…)}, as generated by SPSS Modeler. Displaying native SQL functions relies on ODBC driver functionality that might not be implemented.
Reformat SQL for improved readability: Specifies whether SQL displayed in the log should be formatted for readability.
Show status for records: Specifies when records should be reported as they arrive at terminal nodes. Specify a number to use for updating the status every N records.
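
As a simple picture of the Show status for records setting, the following Python sketch lists the record counts at which a status update would be reported for a given N. The function name is hypothetical and the sketch isn't taken from SPSS Modeler.

```python
def status_points(record_count, every_n):
    """Return the record counts at which a status update is reported."""
    if every_n <= 0:
        raise ValueError("every_n must be a positive integer")
    return [i for i in range(1, record_count + 1) if i % every_n == 0]


print(status_points(10, 4))  # [4, 8]
```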

Parameters

Parameters are user-defined variables that are saved and persisted with the current flow or SuperNode. Parameters are often used in scripting to control the behavior of the script, and they can be accessed from the user interface as well.

You can define parameters for use in CLEM expressions and in scripting. Parameters that are defined in the flow properties are available to all nodes in the flow. Parameters set for a SuperNode are not available outside of the SuperNode. If you save a flow, any parameters set for that flow are also saved.

For more information about parameters, see Flow and SuperNode parameters.

Click Add value and enter the following information for the new parameter:

Name: This name is how the parameter is referenced in expressions. For example, to create a parameter for a minimum temperature, you could enter minvalue.
When parameters are used in CLEM expressions, they are prefixed with $P- and placed within single quotation marks, for example, '$P-minvalue'. Do not enter the $P- prefix in the Name field; the prefix only denotes a parameter in CLEM expressions.
Label: Lists a descriptive name for each parameter created.
Storage: Storage indicates how the data values are stored in the parameter. For example, if values have leading zeros that you want to preserve (such as 008), select String as the storage type. Otherwise, the zeros are stripped from the value.
Value: Lists the current value for each parameter, which you can change as needed. Values for date parameters must be specified in ISO standard notation (YYYY-MM-DD).
Measure: Select the measurement level, which is used to describe characteristics of the parameter. You can change this value to reflect the way that you intend to use the parameter. For example, Typeless indicates that the parameter can have any value compatible with its storage.
Prompt?: Select this option if you want users to be prompted to enter a value for this parameter when they start the runtime. You can use this option where you might need to enter different values for the same parameter on different occasions.
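
The effect of the Storage setting on a parameter value can be sketched in Python. The function name is hypothetical and the Integer and Date storage names are assumptions for this example; only String is named in the text above.

```python
from datetime import date


def store_parameter(raw_value, storage):
    """Convert the text entered for a parameter according to its storage."""
    if storage == "String":
        # Strings are kept as entered, so leading zeros survive.
        return raw_value
    if storage == "Integer":
        # Numeric storage drops leading zeros.
        return int(raw_value)
    if storage == "Date":
        # Dates must be in ISO standard notation (YYYY-MM-DD).
        return date.fromisoformat(raw_value)
    raise ValueError(f"unsupported storage: {storage}")


print(store_parameter("008", "String"))   # 008
print(store_parameter("008", "Integer"))  # 8
print(store_parameter("2024-12-20", "Date"))
```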

Globals

In the Globals tab of the flow properties, you can view the global values set for the current flow. Global values are created using a Set Globals node to determine statistics such as mean, sum, or standard deviation for selected fields.

After a Set Globals node runs, these values become available for various uses in flow operations.
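
The kinds of statistics that a Set Globals node determines can be illustrated with Python's statistics module. This is only a sketch: the function name and dictionary keys are hypothetical, and the use of the sample standard deviation is an assumption, not a statement about how the node calculates it.

```python
import statistics


def compute_globals(values):
    """Compute mean, sum, and standard deviation for one field's values."""
    return {
        "mean": statistics.mean(values),
        "sum": sum(values),
        # Assumption: sample standard deviation (n - 1 in the denominator).
        "sdev": statistics.stdev(values),
    }


print(compute_globals([2, 4, 4, 4, 5, 5, 7, 9]))
```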

You can't edit global values in the table here in the flow properties, but you can clear all global values for a flow.

Annotations

If you need to describe a flow to others in your organization, you can attach explanatory comments to flows, nodes, and model nuggets. Others can then view these comments on-screen or even print an image of the flow that includes your comments.

Use the Annotations tab of the flow properties to add text annotations to your flow. These notes are visible only when the Annotations tab is open, except that flow annotations can also be shown as on-screen comments.