With the Extension Import node, you can run scripts that are written in R, Python, or Python for Spark to import data.
After adding the node to your canvas, double-click the node to open its properties.
Syntax tab
Select your type of syntax – R, Python, or Python for Spark. Then enter or paste your custom script for importing data. When your syntax is ready, you can run the node.
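As an illustration only, the data-preparation part of a Python import script might look like the following sketch. It uses only the Python standard library. The sample data and the names `SAMPLE_CSV` and `load_records` are hypothetical, and the step that hands the records to the flow is omitted because it depends on the syntax type you select.

```python
import csv
import io

# Hypothetical sample data standing in for an external source
# (a file, a web service, a database export, and so on).
SAMPLE_CSV = """Age,Sex,K,Drug
23,F,0.031,drugY
47,M,0.056,drugC
"""


def load_records(text):
    """Parse CSV text into a list of dicts with typed values."""
    reader = csv.DictReader(io.StringIO(text))
    records = []
    for row in reader:
        records.append({
            "Age": int(row["Age"]),    # convert numeric fields explicitly
            "Sex": row["Sex"],
            "K": float(row["K"]),
            "Drug": row["Drug"],
        })
    return records


records = load_records(SAMPLE_CSV)
print(f"Imported {len(records)} records")
```

Converting each field to an explicit type in the script keeps the imported values predictable, whatever the source format is.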
Console Output tab
The Console Output tab contains any output that's received when the script runs. For example, for an R script, it shows the output received from the R console when the script in the R Syntax field on the Syntax tab is executed. This output might include R or Python error messages or warnings that are produced during execution, so it is mainly useful for debugging the script. The Console Output tab also contains the script from the R Syntax or Python Syntax field.
Every time the Extension Import script runs, the content of the Console Output tab is overwritten with the output received from the R or Python console. You can't edit the output.
Filtering or renaming fields
You can rename or exclude fields at any point in a flow. For example, as a medical researcher, you may not be concerned about the potassium level (field-level data) of patients (record-level data); therefore, you can filter out the K (potassium) field.
- Using a Filter node, you can rename or filter fields at any point in the flow
- You can use a Filter node to map fields from one import node to another
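Conceptually, filtering and renaming fields amounts to the following. This is a plain Python sketch, not how the Filter node is implemented, and the function name `filter_and_rename` and the sample patient records are hypothetical.

```python
def filter_and_rename(records, drop=(), rename=None):
    """Return new records without the dropped fields, with fields renamed."""
    rename = rename or {}
    result = []
    for record in records:
        result.append({
            rename.get(name, name): value   # use the new name if one is mapped
            for name, value in record.items()
            if name not in drop             # exclude filtered fields
        })
    return result


# Hypothetical patient records; K is the potassium level.
patients = [
    {"Age": 23, "Na": 0.79, "K": 0.031},
    {"Age": 47, "Na": 0.74, "K": 0.056},
]

# Filter out the K field and rename Na to Sodium.
filtered = filter_and_rename(patients, drop={"K"}, rename={"Na": "Sodium"})
```

The records themselves (rows) are untouched; only the set of fields (columns) and their names change.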
Viewing and setting information about types
From the Type node, you can specify field metadata and properties that are invaluable to modeling and other work.
- Specifying a usage type, such as range, set, ordered set, or flag, for each field in your data
- Setting options for handling missing values and system nulls
- Setting the role of a field for modeling purposes
- Specifying values for a field and options used to automatically read values from your data
- Specifying value labels