You can specify properties to apply to the current flow.
To set flow properties, click the Flow Properties icon.
You can configure the following properties.
Options
General
Maximum number of rows to show in Data Preview
When you preview the data for a node, you can specify the number of rows to show.
Limit members for nominal fields
The measurement level of a nominal (set) field becomes Typeless when the
number of members exceeds the value that you set in Maximum
members. This option is useful when you are working with large nominal fields. When the
measurement level of a field is set to Typeless, its role is automatically
set to None. Fields with a role of None aren't
available for modeling.
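The rule above can be sketched in a few lines of Python. This is only an illustration of the described behavior, not SPSS Modeler code; the function name `apply_member_limit` and the default role `"Input"` are hypothetical.

```python
def apply_member_limit(values, max_members, role="Input"):
    """Return (measurement, role) for a nominal field under a member limit.

    Hypothetical helper: mirrors the documented rule, not a Modeler API.
    """
    distinct_members = set(values)
    if len(distinct_members) > max_members:
        # Too many members: the field becomes Typeless and its role None,
        # so it is no longer available for modeling.
        return ("Typeless", "None")
    # Within the limit: measurement and role are left unchanged.
    return ("Nominal", role)


print(apply_member_limit(["red", "green", "blue"], max_members=2))
print(apply_member_limit(["red", "green", "red"], max_members=2))
```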
Date/Time
Import date/time/timestamp as
Select whether to use a date and time format for storing data in date and time fields or whether
to import them as string variables.
Use microseconds in timestamp fields
If you have timestamp data that is measured in microseconds, you can enable this option to use
the more precise data in your flows. To enable the option, select this checkbox and
set Import date/time/timestamp as to
String.
Note: This option works only for connectors that support SQL pushback.
Date format
Select a date format to use for date storage fields or when strings are interpreted as dates by
CLEM date functions.
Time format
Select a time format to use for time storage fields or when strings are interpreted as times by
CLEM time functions.
Rollover days/mins
For time formats, select whether negative time differences are interpreted as referring to the
previous day or hour.
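As a rough illustration of rollover for a day boundary, the following Python sketch shows one plausible reading of the setting: a negative difference between two clock times is treated as crossing midnight. The function `minutes_between` is hypothetical, and the exact behavior of the CLEM time functions may differ.

```python
def minutes_between(start, end, rollover=False):
    """Difference in minutes between two HH:MM clock times.

    Hypothetical helper. With rollover=True, a negative difference is
    assumed to mean that `start` belongs to the previous day.
    """
    start_h, start_m = map(int, start.split(":"))
    end_h, end_m = map(int, end.split(":"))
    diff = (end_h * 60 + end_m) - (start_h * 60 + start_m)
    if rollover and diff < 0:
        diff += 24 * 60  # wrap around midnight
    return diff


print(minutes_between("23:00", "01:00"))                 # no rollover
print(minutes_between("23:00", "01:00", rollover=True))  # with rollover
```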
Date baseline (1st Jan)
Select the baseline year (always 1 January) to be used by CLEM date functions that work with a
single date.
2-digit dates start from
Specify the cutoff year to add century digits for years that are denoted with only 2 digits. For
example, specifying 1930 as the cutoff year assumes that 05/11/02 is in the year 2002. The same
setting places two-digit years of 30 or later in the 20th century; thus 05/11/73 is assumed to be in 1973.
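The cutoff rule can be expressed as a small calculation. The Python sketch below is a minimal illustration, assuming the cutoff year itself is included in the earlier century; `expand_two_digit_year` is a hypothetical name, not a CLEM function.

```python
def expand_two_digit_year(two_digit_year, cutoff=1930):
    """Expand a 2-digit year to 4 digits using a cutoff year.

    Hypothetical helper illustrating the documented rule.
    """
    century = cutoff - (cutoff % 100)   # 1900 for a cutoff of 1930
    year = century + two_digit_year
    if year < cutoff:
        # Falls before the cutoff, so it belongs to the next century.
        year += 100
    return year


print(expand_two_digit_year(2))    # the 05/11/02 example
print(expand_two_digit_year(73))   # the 05/11/73 example
```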
Time zone
Select how the time zone is chosen for use with the datetime_now CLEM
expression.
If you select Server, the time zone of the machine where the SPSS Modeler
runtime is running is used
(this time zone is sometimes the same as for the Client option). However, if your flow uses
data from a database and the supported database uses SQL pushback, the datetime_now
expression uses the time of the database.
If you select Client, the time zone of the machine where SPSS
Modeler is installed is used.
Alternatively, you can select any of the Coordinated Universal Time values for the time
zone.
Number Formats
You can specify the number of decimal places to use when SPSS Modeler displays real numbers in standard, scientific, or currency display formats.
Optimization
You can use these settings to optimize flow performance.
Enable flow rewriting
Flow rewriting reorders the nodes in a flow behind the scenes for more efficient operation,
without altering flow semantics.
Optimize CLEM expressions
This option enables the optimizer to search for CLEM expressions that can be preprocessed before
the flow runs to increase the processing speed. For example, if you have an expression such as
log(salary), the optimizer calculates the actual salary value and passes that on
for processing. This option can be used to improve both SQL pushback and SPSS Modeler performance.
Optimize syntax execution
This method of flow rewriting increases the efficiency of operations that have more than one
node that contains SPSS Statistics syntax. Optimization is achieved by combining the syntax commands
into a single operation, instead of running each as a separate operation.
Optimize other execution
This method of flow rewriting increases the efficiency of operations that can't be delegated to
the database. Optimization is achieved by reducing the amount of data in the flow as early as
possible. The flow is rewritten to push operations closer to the data source while maintaining data
integrity. This change reduces data downstream for costly operations, such as joins.
Enable parallel processing
When running on a computer with multiple processors, this option allows the system to balance
the load across those processors, which can result in faster performance. Use of multiple nodes or
use of the following individual nodes can benefit from parallel processing: C5.0, Merge (by key),
Sort, Bin (rank and tile methods), and Aggregate (using one or more key fields).
Generate SQL
This option pushes SQL processing back to the database. Turning this option on or off affects
only the new flows that you create. You cannot switch the setting for an existing flow. For more
information about using this option with flows, see SQL optimization.
Database caching (SQL only). For flows that generate SQL to be run in the
database, data can be cached mid-flow to a temporary table in the database rather than to the file
system. When combined with SQL optimization, this option can result in significant gains in
performance. For example, the output from a flow that merges multiple tables to create a data mining
view may be cached and reused as needed. With database caching enabled, hover over any nonterminal
node in your flow, then click the overflow menu and
select Cache > Enable. Data is
now cached at this node, and the cache is automatically created directly in the database the next
time the flow runs. This allows SQL to be generated for downstream nodes, further improving
performance. Alternatively, this option can be disabled if needed, such as when policies or
permissions preclude data being written to the database. If database caching or SQL optimization is
not enabled, the cache is written to the file system instead.
Use relaxed conversion (SQL only). This option enables the conversion of
data from either strings to numbers, or numbers to strings, if stored in a suitable format. For
example, if the data is kept in the database as a string, but actually contains a meaningful number,
the data can be converted for use when the pushback occurs.
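The idea of relaxed conversion can be illustrated in Python: a string is converted to a number only when its content is in a suitable format. This sketch does not show how SPSS Modeler or the database performs the conversion; `relaxed_to_number` is a hypothetical helper.

```python
def relaxed_to_number(value):
    """Convert a string to a number if it holds one; otherwise return None.

    Hypothetical helper: illustrates the concept only.
    """
    if isinstance(value, (int, float)):
        return value  # already numeric
    text = value.strip()
    try:
        return int(text)
    except ValueError:
        pass
    try:
        return float(text)
    except ValueError:
        return None  # not in a suitable format


print(relaxed_to_number("42"))
print(relaxed_to_number(" 3.5 "))
print(relaxed_to_number("n/a"))
```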
Logging
Display SQL in the messages log at run time
Specifies whether SQL generated while running the flow is passed to the messages log.
Display SQL generation in the message log during preparation
During flow preview, specifies whether a preview of the SQL that would be generated is passed to
the messages log.
SQL format
Specifies whether any SQL that's displayed in the log should contain native SQL functions or
standard ODBC functions of the form {fn FUNC(…)}, as generated by SPSS Modeler. The former relies on ODBC driver functionality that may not be
implemented.
Reformat SQL for improved readability
Specifies whether SQL displayed in the log should be formatted for readability.
Show status for records
Specifies when records should be reported as they arrive at terminal nodes. Specify a number to
use for updating the status every N records.
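For example, with a value of N, the status would be refreshed at records N, 2N, 3N, and so on. The following Python sketch lists those update points; `status_update_points` is a hypothetical name used only for illustration.

```python
def status_update_points(total_records, every_n):
    """Return the record counts at which a status update would be reported.

    Hypothetical helper: assumes one update after every `every_n` records.
    """
    if every_n <= 0:
        raise ValueError("every_n must be a positive integer")
    return list(range(every_n, total_records + 1, every_n))


print(status_update_points(total_records=10, every_n=4))
```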
Parameters
Parameters are user-defined variables that are saved
with the current flow or SuperNode. Parameters are often used in scripting to control
the behavior of the script, and they can also be accessed from the user interface.
You can define parameters for use in CLEM expressions and in
scripting. Parameters that are defined in the flow properties are available to all nodes in the
flow. Parameters set for a SuperNode are
not available outside of the SuperNode. If you save a flow, any parameters set for that flow are also
saved.
Click Add value and enter the following information for the new parameter:
Name
This name is how the parameter is referenced in expressions. For example, to create a parameter
for a minimum temperature, you could enter minvalue.
When parameters are
used in CLEM expressions, they are placed within single quotation marks, for example,
'$P-minvalue'. Do not enter the $P- prefix in the Name field; the prefix only denotes a parameter
in CLEM expressions.
Label
Lists a descriptive name for each parameter created.
Storage
Storage indicates how the data values are stored in the parameter. For example, if values have
leading zeros that you want to preserve (such as 008), select
String as the storage type. Otherwise, the zeros are stripped from the
value.
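The effect of the storage type on leading zeros can be seen in a short Python analogy. This is an illustration only; `store_parameter` and the storage names passed to it are hypothetical stand-ins for the Storage setting.

```python
def store_parameter(raw_value, storage):
    """Store a raw text value using the given storage type.

    Hypothetical helper: "Integer" and "String" stand in for storage types.
    """
    if storage == "Integer":
        return int(raw_value)   # numeric storage drops leading zeros
    if storage == "String":
        return str(raw_value)   # string storage keeps the text as entered
    raise ValueError(f"unsupported storage: {storage}")


print(store_parameter("008", "Integer"))
print(store_parameter("008", "String"))
```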
Value
Lists the current value for each parameter, which you can change as needed. Values for date
parameters must be specified in ISO standard notation (YYYY-MM-DD).
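If you prepare date values outside SPSS Modeler, you can check that they follow the YYYY-MM-DD notation before you enter them. The Python sketch below uses the standard library for that check; `is_iso_date` is a hypothetical helper and is not part of SPSS Modeler.

```python
from datetime import date


def is_iso_date(text):
    """Return True if `text` is a valid date in YYYY-MM-DD notation."""
    # Require the 10-character hyphenated form before parsing.
    if len(text) != 10 or text[4] != "-" or text[7] != "-":
        return False
    try:
        date.fromisoformat(text)
    except ValueError:
        return False
    return True


print(is_iso_date("2024-03-15"))
print(is_iso_date("15/03/2024"))
```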
Measure
Select the measurement level, which is used to describe characteristics of the parameter. You
can change this value to reflect the way that you intend to use the parameter. For example,
Typeless indicates that the parameter can have any value compatible with its
storage.
Prompt?
Select this option if you want users to be prompted to enter a value for this parameter when
they start the runtime. You can use this option where you might need to enter different values for
the same parameter on different occasions.
Globals
In the Globals tab of the flow properties, you can view the global values
set for the current flow. Global values are created using a Set Globals node
to determine statistics such as mean, sum, or standard deviation for selected fields.
After a Set Globals node runs, these values become available for various
uses in flow operations.
You can't edit global values in the table here in the flow properties, but you can clear all
global values for a flow.
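To make the kind of statistics concrete, the following Python sketch computes a mean, sum, and standard deviation for one field. It only illustrates the concept. `compute_globals` is a hypothetical name, and the use of the sample standard deviation is an assumption, not a statement about how the Set Globals node calculates it.

```python
import statistics


def compute_globals(values):
    """Compute summary statistics for the values of a single field.

    Hypothetical helper. Assumes the sample standard deviation.
    """
    return {
        "mean": statistics.mean(values),
        "sum": sum(values),
        "sdev": statistics.stdev(values),  # sample standard deviation
    }


print(compute_globals([2, 4, 6]))
```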
Annotations
If you need to describe a flow to others in your organization, you can attach explanatory
comments to flows, nodes, and model nuggets. Others can then view these comments on-screen or even
print an image of the flow that includes your comments.
Use the Annotations tab of the flow properties to add text annotations to
your flow. These notes are visible only when the Annotations tab is open,
except that flow annotations can also be shown as on-screen comments.