Extension Transform node

With the Extension Transform node, you can take data from an SPSS Modeler flow and apply transformations to the data using R scripting or Python for Spark scripting.

When the data has been modified, it's returned to the flow for further processing, model building, and model scoring. The Extension Transform node makes it possible to transform data using algorithms that are written in R or Python for Spark, and enables you to develop data transformation methods that are tailored to a particular problem.

After adding the node to your canvas, double-click the node to open its properties.

Syntax tab

Select your type of syntax – R or Python for Spark. Then enter or paste your custom script for transforming data. When your syntax is ready, you can run the node. The following options are available for R syntax:
  • Convert flag fields. Specifies how flag fields are treated. There are two options: Strings to factor, Integers and Reals to double, and Logical values (True, False). If you select Logical values (True, False) the original values of the flag fields are lost. For example, if a field has values Male and Female, these are changed to True and False.
  • Convert missing values to the R 'not available' value (NA). When selected, any missing values are converted to the R NA value. The value NA is used by R to identify missing values. Some R functions that you use might have an argument that can control how the function behaves when the data contains NA. For example, the function might allow you to choose to automatically exclude records that contain NA. If this option isn't selected, any missing values are passed to R unchanged, and might cause errors when your R script runs.
  • Convert date/time fields to R classes with special control for time zones When selected, variables with date or datetime formats are converted to R date/time objects. You must select one of the following options:
    • R POSIXct. Variables with date or datetime formats are converted to R POSIXct objects.
    • R POSIXlt (list). Variables with date or datetime formats are converted to R POSIXlt objects.
    Note: The POSIX formats are advanced options. Use these options only if your R script specifies that datetime fields are treated in ways that require these formats. The POSIX formats don't apply to variables with time formats.

Console Output tab

The Console Output tab contains any output that's received when the R script or Python for Spark script runs (for example, if using an R script, it shows output received from the R console when the R script in the R Syntax field on the Syntax tab is executed). This output might include R or Python error messages or warnings that are produced when the R script or Python script is executed. The output can be used, primarily, to debug the script. The Console Output tab also contains the script from the R Syntax or Python Syntax field.

Every time the Extension Transform script runs, the content of the Console Output tab is overwritten with the output received from the R console or Python for Spark. You can't edit the output.