Last updated: Jan 18, 2024
The Auto Data Prep (ADP) node can analyze your data and identify fixes, screen out fields that are problematic or unlikely to be useful, derive new attributes when appropriate, and improve performance through intelligent screening and sampling techniques. You can run the node in a fully automated fashion, letting it choose and apply fixes, or you can preview the changes before they are made and accept, reject, or amend them as needed.
Example
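# Get the current stream; the modeler object is predefined inside Modeler scripting
stream = modeler.script.stream()
node = stream.create("autodataprep", "My node")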
node.setPropertyValue("objective", "Balanced")
node.setPropertyValue("excluded_fields", "Filter")
node.setPropertyValue("prepare_dates_and_times", True)
node.setPropertyValue("compute_time_until_date", True)
node.setPropertyValue("reference_date", "Today")
node.setPropertyValue("units_for_date_durations", "Automatic")
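The same pattern extends to any other property in the table below. As a minimal sketch, the settings can be kept in a list and applied in a loop. The names `FakeNode` and `apply_properties` are hypothetical helpers introduced here for illustration, not part of the Modeler API; `FakeNode` only exists so the sketch runs outside Modeler. The sketch also assumes that `fixed_date_units` takes effect when `units_for_date_durations` is set to `Fixed`, as the property names suggest.

```python
class FakeNode(object):
    """Hypothetical stand-in for a Modeler node, so the sketch runs outside Modeler."""

    def __init__(self):
        self.properties = {}

    def setPropertyValue(self, name, value):
        # Record the value the same way a script would set it on a real node.
        self.properties[name] = value


def apply_properties(node, settings):
    """Call node.setPropertyValue once per (name, value) pair, in order."""
    for name, value in settings:
        node.setPropertyValue(name, value)
    return node


# Date handling with fixed duration units instead of "Automatic".
date_settings = [
    ("prepare_dates_and_times", True),
    ("compute_time_until_date", True),
    ("reference_date", "Today"),
    ("units_for_date_durations", "Fixed"),
    ("fixed_date_units", "Months"),
]

# Inside Modeler, pass the node returned by
# stream.create("autodataprep", "My node") instead of FakeNode().
node = apply_properties(FakeNode(), date_settings)
```

Keeping the settings in a list rather than a dictionary preserves the order in which the properties are set, which mirrors the order of calls in the example above.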
autodataprepnode properties | Data type | Property description |
---|---|---|
objective | Balanced, Speed, Accuracy, Custom | |
custom_fields | flag | If true, allows you to specify target, input, and other fields for the current node. If false, the current settings from an upstream Type node are used. |
target | field | Specifies a single target field. |
inputs | [field1 ... fieldN] | Input or predictor fields used by the model. |
use_frequency | flag | |
frequency_field | field | |
use_weight | flag | |
weight_field | field | |
excluded_fields | Filter, None | |
if_fields_do_not_match | StopExecution, ClearAnalysis | |
prepare_dates_and_times | flag | Control access to all the date and time fields |
compute_time_until_date | flag | |
reference_date | Today, Fixed | |
fixed_date | date | |
units_for_date_durations | Automatic, Fixed | |
fixed_date_units | Years, Months, Days | |
compute_time_until_time | flag | |
reference_time | CurrentTime, Fixed | |
fixed_time | time | |
units_for_time_durations | Automatic, Fixed | |
fixed_time_units | Hours, Minutes, Seconds | |
extract_year_from_date | flag | |
extract_month_from_date | flag | |
extract_day_from_date | flag | |
extract_hour_from_time | flag | |
extract_minute_from_time | flag | |
extract_second_from_time | flag | |
exclude_low_quality_inputs | flag | |
exclude_too_many_missing | flag | |
maximum_percentage_missing | number | |
exclude_too_many_categories | flag | |
maximum_number_categories | number | |
exclude_if_large_category | flag | |
maximum_percentage_category | number | |
prepare_inputs_and_target | flag | |
adjust_type_inputs | flag | |
adjust_type_target | flag | |
reorder_nominal_inputs | flag | |
reorder_nominal_target | flag | |
replace_outliers_inputs | flag | |
replace_outliers_target | flag | |
replace_missing_continuous_inputs | flag | |
replace_missing_continuous_target | flag | |
replace_missing_nominal_inputs | flag | |
replace_missing_nominal_target | flag | |
replace_missing_ordinal_inputs | flag | |
replace_missing_ordinal_target | flag | |
maximum_values_for_ordinal | number | |
minimum_values_for_continuous | number | |
outlier_cutoff_value | number | |
outlier_method | Replace, Delete | |
rescale_continuous_inputs | flag | |
rescaling_method | MinMax, ZScore | |
min_max_minimum | number | |
min_max_maximum | number | |
z_score_final_mean | number | |
z_score_final_sd | number | |
rescale_continuous_target | flag | |
target_final_mean | number | |
target_final_sd | number | |
transform_select_input_fields | flag | |
maximize_association_with_target | flag | |
p_value_for_merging | number | |
merge_ordinal_features | flag | |
merge_nominal_features | flag | |
minimum_cases_in_category | number | |
bin_continuous_fields | flag | |
p_value_for_binning | number | |
perform_feature_selection | flag | |
p_value_for_selection | number | |
perform_feature_construction | flag | |
transformed_target_name_extension | string | |
transformed_inputs_name_extension | string | |
constructed_features_root_name | string | |
years_duration_name_extension | string | |
months_duration_name_extension | string | |
days_duration_name_extension | string | |
hours_duration_name_extension | string | |
minutes_duration_name_extension | string | |
seconds_duration_name_extension | string | |
year_cyclical_name_extension | string | |
month_cyclical_name_extension | string | |
day_cyclical_name_extension | string | |
hour_cyclical_name_extension | string | |
minute_cyclical_name_extension | string | |
second_cyclical_name_extension | string | |
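
Several properties in the table accept only a fixed set of values. As a rough sketch, a script could check a value against that set before calling `setPropertyValue`, so that a typo is reported early. The names `ALLOWED_VALUES` and `check_enumerated` are hypothetical and not part of the Modeler API. The value lists are copied from the table, and the sketch assumes each value is passed as a string, as in the example at the top of this page.

```python
# Allowed values copied from the enumerated rows of the table above.
# Assumption: every value, including "None", is passed as a string.
ALLOWED_VALUES = {
    "objective": ["Balanced", "Speed", "Accuracy", "Custom"],
    "excluded_fields": ["Filter", "None"],
    "if_fields_do_not_match": ["StopExecution", "ClearAnalysis"],
    "reference_date": ["Today", "Fixed"],
    "units_for_date_durations": ["Automatic", "Fixed"],
    "fixed_date_units": ["Years", "Months", "Days"],
    "reference_time": ["CurrentTime", "Fixed"],
    "units_for_time_durations": ["Automatic", "Fixed"],
    "fixed_time_units": ["Hours", "Minutes", "Seconds"],
    "outlier_method": ["Replace", "Delete"],
    "rescaling_method": ["MinMax", "ZScore"],
}


def check_enumerated(name, value):
    """Raise ValueError if value is not listed for an enumerated property.

    Properties that are not in ALLOWED_VALUES (flags, numbers, strings,
    fields) are returned unchecked.
    """
    allowed = ALLOWED_VALUES.get(name)
    if allowed is not None and value not in allowed:
        raise ValueError(
            "%s must be one of %s, got %r" % (name, ", ".join(allowed), value)
        )
    return value


check_enumerated("rescaling_method", "ZScore")      # listed value, accepted
check_enumerated("maximum_percentage_missing", 50)  # not enumerated, passed through

try:
    check_enumerated("outlier_method", "Remove")    # not a listed value
except ValueError as err:
    message = str(err)
```

In a Modeler script, the checked value would then be passed on, for example `node.setPropertyValue("outlier_method", check_enumerated("outlier_method", "Replace"))`.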