Last updated: Jan 17, 2024
The Data Audit node provides a comprehensive first look at the data, including summary statistics, histograms, and distributions for each field, as well as information on outliers, missing values, and extremes. Results are displayed in an easy-to-read matrix that can be sorted and used to generate full-size graphs and data preparation nodes.
Example
stream = modeler.script.stream()
sourcenode = stream.findByID("id46WRP1285C")
node = stream.createAt("dataaudit", "My node", 196, 100)
stream.link(sourcenode, node)
node.setPropertyValue("custom_fields", True)
node.setPropertyValue("fields", ["Age", "Na", "K"])
node.setPropertyValue("display_graphs", True)
node.setPropertyValue("basic_stats", True)
node.setPropertyValue("advanced_stats", True)
node.setPropertyValue("median_stats", False)
node.setPropertyValue("calculate", ["Count", "Breakdown"])
node.setPropertyValue("outlier_detection_method", "std")
node.setPropertyValue("outlier_detection_std_outlier", 1.0)
node.setPropertyValue("outlier_detection_std_extreme", 3.0)
node.setPropertyValue("output_mode", "Screen")
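The example above uses the `std` detection method. As a rough sketch of switching to the `iqr` method, the settings can be kept in a list and applied in a loop. `apply_properties` and `RecordingNode` are hypothetical helpers introduced here for illustration, not part of the Modeler scripting API; `RecordingNode` only stands in for a real node so the snippet can run outside Modeler. The multipliers 1.5 and 3.0 are illustrative values, not documented defaults.

```python
class RecordingNode(object):
    """Hypothetical stand-in for a Modeler node; it only records assignments."""

    def __init__(self):
        self.properties = {}

    def setPropertyValue(self, name, value):
        self.properties[name] = value


def apply_properties(node, properties):
    """Call node.setPropertyValue for each (name, value) pair, in order."""
    for name, value in properties:
        node.setPropertyValue(name, value)
    return node


# Illustrative settings: use the interquartile-range detection method.
IQR_SETTINGS = [
    ("outlier_detection_method", "iqr"),
    ("outlier_detection_iqr_outlier", 1.5),  # illustrative multiplier (assumption)
    ("outlier_detection_iqr_extreme", 3.0),  # illustrative multiplier (assumption)
]

# Inside Modeler, pass the node returned by stream.createAt(...) instead.
node = apply_properties(RecordingNode(), IQR_SETTINGS)
print(node.properties["outlier_detection_method"])
```

Keeping the settings in a list preserves the order in which the properties are set, so the detection method is assigned before its thresholds.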
| `dataauditnode` properties | Data type | Property description |
|---|---|---|
| `custom_fields` | flag | |
| `fields` | [field1 … fieldN] | |
| `overlay` | field | |
| `display_graphs` | flag | Used to turn the display of graphs in the output matrix on or off. |
| `basic_stats` | flag | |
| `advanced_stats` | flag | |
| `median_stats` | flag | |
| `calculate` | Count<br>Breakdown | Used to calculate missing values. Select either, both, or neither calculation method. |
| `outlier_detection_method` | std<br>iqr | Used to specify the detection method for outliers and extreme values. |
| `outlier_detection_std_outlier` | number | If `outlier_detection_method` is `std`, specifies the number to use to define outliers. |
| `outlier_detection_std_extreme` | number | If `outlier_detection_method` is `std`, specifies the number to use to define extreme values. |
| `outlier_detection_iqr_outlier` | number | If `outlier_detection_method` is `iqr`, specifies the number to use to define outliers. |
| `outlier_detection_iqr_extreme` | number | If `outlier_detection_method` is `iqr`, specifies the number to use to define extreme values. |
| `use_output_name` | flag | Specifies whether a custom output name is used. |
| `output_name` | string | If `use_output_name` is true, specifies the name to use. |
| `output_mode` | Screen<br>File | Used to specify target location for output generated from the output node. |
| `output_format` | Formatted (.tab)<br>Delimited (.csv)<br>HTML (.html)<br>Output (.cou) | Used to specify the type of output. |
| `paginate_output` | flag | When the `output_format` is `HTML`, causes the output to be separated into pages. |
| `lines_per_page` | number | When used with `paginate_output`, specifies the lines per page of output. |
| `full_filename` | string | |
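Several properties in the table accept only a fixed set of values, and the threshold properties apply only to one detection method. The following is a minimal sketch of checking a settings dictionary against those documented constraints before applying it to a node. `check_settings` is a hypothetical helper, not a Modeler function, and it covers only the properties whose allowed values are stated unambiguously above.

```python
# Allowed values taken from the property table above.
ALLOWED_VALUES = {
    "outlier_detection_method": set(["std", "iqr"]),
    "output_mode": set(["Screen", "File"]),
}
ALLOWED_CALCULATE = set(["Count", "Breakdown"])


def check_settings(settings):
    """Return a list of problems found in a dict of dataauditnode settings."""
    problems = []
    # Single-valued properties must use one of the documented values.
    for name, allowed in sorted(ALLOWED_VALUES.items()):
        if name in settings and settings[name] not in allowed:
            problems.append("%s: unexpected value %r" % (name, settings[name]))
    # calculate is a list that may hold either, both, or neither method.
    extra = set(settings.get("calculate", [])) - ALLOWED_CALCULATE
    if extra:
        problems.append("calculate: unexpected entries %s" % sorted(extra))
    # Threshold properties are only used with their matching detection method.
    method = settings.get("outlier_detection_method")
    for name in sorted(settings):
        for other in ("std", "iqr"):
            prefix = "outlier_detection_%s_" % other
            if name.startswith(prefix) and method is not None and method != other:
                problems.append(
                    "%s: only used when outlier_detection_method is %s" % (name, other)
                )
    return problems


valid = {
    "outlier_detection_method": "std",
    "outlier_detection_std_outlier": 1.0,
    "outlier_detection_std_extreme": 3.0,
    "calculate": ["Count", "Breakdown"],
    "output_mode": "Screen",
}
for problem in check_settings(valid):
    print(problem)
```

A settings dictionary that passes the check can then be applied with one `node.setPropertyValue(name, value)` call per entry, as in the example at the top of this page.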