Extension Transform node
With the Extension Transform node, you can take data from an SPSS Modeler flow and apply transformations to the data by using scripts written in R, Python, or Python for Spark.
When the data has been modified, it's returned to the flow for further processing, model building, and model scoring. The Extension Transform node makes it possible to transform data by using algorithms that are written in one of these languages, and you can use the node to develop data transformation methods that are tailored to a particular problem.
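As a rough illustration of the kind of tailored transformation such a script might perform, the following Python sketch standardizes a numeric field. It is not the node's API: how the node passes data into and out of the script is not covered here, so the data is modeled as a plain list of dictionaries, and the names `standardize`, `rows`, and `income` are hypothetical.

```python
from statistics import mean, pstdev


def standardize(records, field):
    """Return copies of the records with an added z-score column for `field`."""
    values = [r[field] for r in records]
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation
    out = []
    for r in records:
        row = dict(r)  # copy so the input records are left untouched
        # A constant field has zero spread; use 0.0 instead of dividing by zero.
        row[field + "_z"] = 0.0 if sigma == 0 else (r[field] - mu) / sigma
        out.append(row)
    return out


# Stand-in for data arriving from the flow (an assumption for this sketch).
rows = [
    {"id": 1, "income": 30000.0},
    {"id": 2, "income": 50000.0},
    {"id": 3, "income": 70000.0},
]
standardized = standardize(rows, "income")
```

Each returned record keeps its original fields and gains an `income_z` field, so downstream nodes could still use the unscaled values.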
After adding the node to your canvas, double-click the node to open its properties.
Syntax tab
- Convert flag fields. Specifies how flag fields are treated. There are two options: "Strings to factor, Integers and Reals to double" and "Logical values (True, False)". If you select Logical values (True, False), the original values of the flag fields are lost. For example, if a field has values Male and Female, these are changed to True and False.
- Convert missing values to the R 'not available' value (NA). When selected, any missing values are converted to the R NA value. The value NA is used by R to identify missing values. Some R functions that you use might have an argument that controls how the function behaves when the data contains NA. For example, the function might allow you to choose to automatically exclude records that contain NA. If this option isn't selected, any missing values are passed to R unchanged, and might cause errors when your R script runs.
- Convert date/time fields to R classes with special control for time zones. When selected, variables with date or datetime formats are converted to R date/time objects. You must select one of the following options:
  - R POSIXct. Variables with date or datetime formats are converted to R POSIXct objects.
  - R POSIXlt (list). Variables with date or datetime formats are converted to R POSIXlt objects.
  Note: The POSIX formats are advanced options. Use these options only if your R script specifies that datetime fields are treated in ways that require these formats. The POSIX formats don't apply to variables with time formats.
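The conversion options above are specific to R, but their effect can be illustrated with a small Python analogy. This is only a sketch and not how the node implements the conversions. It assumes that Male is the value treated as True (the source does not state how the mapping is chosen), uses `None` as a stand-in for R's NA, and uses a made-up timestamp. For the date/time options, R's POSIXct stores a date/time as a single count of seconds since the epoch, while POSIXlt stores it as a list of broken-down components; `timestamp()` and `timetuple()` give roughly comparable views in Python.

```python
from datetime import datetime, timezone


def flag_to_logical(value, true_value):
    """Map a flag value to True/False; a missing value (None) stays missing."""
    if value is None:
        return None  # analogous to passing NA through
    return value == true_value


# Hypothetical flag field with one missing value.
genders = ["Male", "Female", None, "Male"]
logical = [flag_to_logical(g, true_value="Male") for g in genders]
# The labels "Male" and "Female" cannot be recovered from `logical` alone.

# Hypothetical timestamp, fixed to UTC so the result does not depend on locale.
stamp = datetime(2024, 3, 15, 9, 30, 0, tzinfo=timezone.utc)
as_seconds = stamp.timestamp()  # one number, comparable to POSIXct
as_fields = stamp.timetuple()   # broken-down fields, comparable to POSIXlt
```

The single-number form is convenient for arithmetic and sorting, while the broken-down form gives direct access to parts such as the year or hour.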
Console Output tab
The Console Output tab contains any output that's received when the R script or Python script runs. For example, for an R script, it shows the output received from the R console when the script in the R Syntax field on the Syntax tab is executed. This output might include R or Python error messages or warnings that are produced when the script runs, and it's mainly useful for debugging the script. The Console Output tab also contains the script from the R Syntax or Python Syntax field.
Every time the Extension Transform script runs, the content of the Console Output tab is overwritten with the output received from the R or Python console. You can't edit the output.
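Because the tab shows whatever the script writes while it runs, a script can emit its own diagnostics to help with debugging. The following Python sketch prints a short summary of a field and raises a warning when values are missing. The function `check_field` and the sample values are hypothetical, and the exact way the tab displays printed text and warnings is not specified here.

```python
import warnings


def check_field(values, name):
    """Print a one-line summary of a field and warn if any values are missing."""
    missing = sum(1 for v in values if v is None)
    print(f"{name}: {len(values)} values, {missing} missing")
    if missing:
        warnings.warn(f"{name} contains {missing} missing value(s)")
    return missing


# Hypothetical field values; None marks a missing value.
n_missing = check_field([34, None, 51, 28], "age")
```

Since the tab's content is overwritten on every run, diagnostics like these reflect only the most recent run of the script.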