The Extension model nugget is generated and placed on your flow canvas after you run the
Extension Model node. That node contains the R script or Python for Spark script that defines
model building and model scoring.
By default, the Extension model nugget contains the script that's used for model scoring, options
for reading the data, and any output from the R console or Python for Spark. Optionally, the
Extension model nugget can also contain various other forms of model output, such as graphs and text
output. After the Extension model nugget is generated and added to your flow canvas, an output node
can be connected to it. The output node is then used in the usual way within your flow to obtain
information about the data and models, and for exporting data in various formats.
Syntax tab
R model scoring syntax. If using R, the R script that's used for model
scoring is displayed in this field. By default, this field is enabled but not editable. To edit the
R model scoring script, click Edit.
Python model scoring syntax. If using Python for Spark, the Python script
that's used for model scoring is displayed in this field. By default, this field is enabled but not
editable. To edit the Python model scoring script, click Edit.
If you click Edit to make the scoring syntax field editable, you can then
edit your model scoring script by typing in the field. For example, you might want to
edit the script if you find an error in it after you have
run the Extension Model node to generate an Extension model nugget. Any changes you make to the
model scoring script in the Extension model nugget are lost if you regenerate the model by
running the Extension Model node again.
Model Options tab
Read Data Options. These options only apply to R, not Python for Spark.
With these options, you can specify how missing values, flag fields, and variables with date or
datetime formats are handled.
Read data in batches. If you're processing a large amount of data (that's
too big to fit into the R engine's memory, for example), use this option to break the data down into
batches that can be sent and processed individually. Specify the maximum number of data records to
include in each batch.
For both the Extension Transform node and the Extension model nugget, data
passes through the R script (in batch). For this reason, scripts for model scoring and process nodes
that run in either a Hadoop or database environment shouldn't include operations that span or
combine rows in the data, such as sorting or aggregation. This limitation is imposed to ensure that
data can be split up in a Hadoop environment, and during in-database mining. Extension Output and
Extension Model nodes don't have this limitation.
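The effect of batching on row-combining operations can be sketched in plain Python. This is only an illustration of the principle, not the SPSS Modeler or Spark API: the helper names batches, score_row, and center_on_mean are hypothetical, and the batch size of 2 is arbitrary.

```python
# Illustrative only: plain Python, not the SPSS Modeler or Spark API.
from typing import Iterator, List


def batches(records: List[float], size: int) -> Iterator[List[float]]:
    """Yield consecutive slices of at most `size` records."""
    for start in range(0, len(records), size):
        yield records[start:start + size]


def score_row(x: float) -> float:
    """A row-wise operation: the result depends only on the row itself."""
    return 2.0 * x + 1.0


def center_on_mean(batch: List[float]) -> List[float]:
    """An operation that combines rows: it uses the mean of every row it sees."""
    mean = sum(batch) / len(batch)
    return [x - mean for x in batch]


data = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]

# Row-wise scoring gives the same result whether or not the data is batched.
scored_whole = [score_row(x) for x in data]
scored_batched = [score_row(x) for b in batches(data, 2) for x in b]

# Centering depends on which rows share a batch, so the results differ.
centered_whole = center_on_mean(data)
centered_batched = [y for b in batches(data, 2) for y in center_on_mean(b)]

print(scored_whole == scored_batched)      # row-wise: identical
print(centered_whole == centered_batched)  # row-combining: not identical
```

In this sketch the row-wise scores match exactly, while the centered values change with the batch boundaries. That is why a scoring script that sorts or aggregates can give inconsistent results when the data is split.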
Convert flag fields. Specifies how flag fields are treated.
There are two options: Strings to factor, Integers and Reals to double; and
Logical values (True, False). If you select Logical values (True,
False), the original values of the flag fields are lost. For example, if a field has
values Male and Female, these are changed to True
and False.
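As a rough illustration of why the original values are lost, the following plain-Python sketch mimics a conversion to logical values. It is not the product's own conversion code: the function flag_to_logical is hypothetical, and treating Male as the True value is an assumption made only for this example.

```python
# Illustrative only: plain Python, not SPSS Modeler's own conversion code.
from typing import List


def flag_to_logical(values: List[str], true_value: str) -> List[bool]:
    """Map each flag value to True if it equals `true_value`, else False."""
    return [value == true_value for value in values]


gender = ["Male", "Female", "Female", "Male"]

# Assumption for this sketch: "Male" is the value treated as True.
as_logical = flag_to_logical(gender, true_value="Male")
print(as_logical)
```

After the conversion, the field holds only True and False. The labels Male and Female can't be recovered from the converted values alone.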
Convert missing values to the R 'not available' value (NA).
When selected, any missing values are converted to the R NA value. The value NA is used by R to
identify missing values. Some R functions that you use might have an argument that can control how
the function behaves when the data contains NA. For
example, the function might allow you to choose to automatically exclude records that contain
NA. If this option isn't selected, any missing values
are passed to R unchanged, and might cause errors when your R script runs.
Convert date/time fields to R classes with special control for time
zones. When selected, variables with date or datetime formats are converted to R
date/time objects. You must select one of the following options:
R POSIXct. Variables with date or datetime formats are converted to
R POSIXct objects.
R POSIXlt (list). Variables with date or datetime formats
are converted to R POSIXlt objects.
Note: The POSIX formats are advanced options. Use these options only if your
R script specifies that datetime fields are treated in ways that require these formats. The POSIX
formats don't apply to variables with time formats.
The options you select for the Convert flag fields, Convert
missing values to the R 'not available' value (NA), and Convert date/time
fields to R classes with special control for time zones controls aren't recognized when
the Extension model nugget runs against a database. When the node runs against a database, the
default values for these controls are used instead:
Convert flag fields is set to Strings to factor, Integers and
Reals to double
Convert missing values to the R 'not available' value (NA) is
selected
Convert date/time fields to R classes with special control for time zones
is not selected
Console Output tab
The Console Output tab contains any output that's received when the R
script or Python for Spark script on the Syntax tab runs. For example, if
you use an R script, the tab shows the output received from the R console when the script in the R
model scoring syntax field on the Syntax tab of the Extension
model nugget runs. This output includes any R or Python error messages or warnings that are
produced when the R or Python script runs, and any text output from the R console. The output is
used primarily to debug the script.
Every time the model scoring script runs, the content of the Console
Output tab is overwritten with the output received from the R console or Python for
Spark. You can't edit the console output.