This topic describes the experiment settings used for data imputation in time series experiments.
Data imputation methods
Apply one of these data imputation methods in experiment settings to supply missing values in a data set.
Imputation method | Description |
---|---|
FlattenIterative | Time series data is first flattened, then missing values are imputed with the Scikit-learn iterative imputer. |
Linear | Missing values are imputed by linear interpolation. |
Cubic | Missing values are imputed by cubic interpolation. |
Previous | Each missing value is imputed with the previous value. |
Next | Each missing value is imputed with the next value. |
Fill | Each missing value is imputed with a user-specified value, the sample mean, or the sample median. |
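As a rough illustration of what the simpler methods in this table do, the following pure-Python sketch fills gaps (represented as `None`) in a single series. The function names are hypothetical and are not part of AutoAI. Cubic and FlattenIterative are omitted because they depend on an interpolation or modeling library.

```python
from statistics import mean, median

def impute_previous(values):
    """Replace each None with the closest earlier observed value.

    Gaps at the very start have no earlier value and stay None.
    """
    out, last = [], None
    for v in values:
        if v is not None:
            last = v
        out.append(last)
    return out

def impute_next(values):
    """Replace each None with the closest later observed value."""
    return impute_previous(values[::-1])[::-1]

def impute_linear(values):
    """Fill interior gaps on a straight line between neighboring observations."""
    out = list(values)
    known = [i for i, v in enumerate(values) if v is not None]
    for left, right in zip(known, known[1:]):
        step = (values[right] - values[left]) / (right - left)
        for i in range(left + 1, right):
            out[i] = values[left] + step * (i - left)
    return out

def impute_fill(values, fill_type="value", fill_value=0):
    """Fill every gap with one constant: a given value, the mean, or the median."""
    observed = [v for v in values if v is not None]
    if fill_type == "mean":
        fill_value = mean(observed)
    elif fill_type == "median":
        fill_value = median(observed)
    elif fill_type != "value":
        raise ValueError(f"unknown fill type: {fill_type}")
    return [fill_value if v is None else v for v in values]

series = [1.0, None, None, 4.0, None, 6.0]
for impute in (impute_previous, impute_next, impute_linear, impute_fill):
    print(impute.__name__, impute(series))
```

The sketch treats one column at a time. The imputers that AutoAI actually searches might handle edge cases, such as gaps at the start or end of a series, differently.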
Input Settings
These parameters support data imputation for time series experiments in a notebook.
Name | Description | Value | Default value |
---|---|---|---|
use_imputation | Flag for switching imputation on or off. | True or False | True |
imputer_list | List of imputer names (strings) to search. If a list is not specified, all the default imputers are searched. If an empty list is passed, all imputers are searched. | "FlattenIterative", "Linear", "Cubic", "Previous", "Fill", "Next" | "FlattenIterative", "Linear", "Cubic", "Previous" |
imputer_fill_type | Type of value that the "Fill" imputer uses. | "mean"/"median"/"value" | "value" |
imputer_fill_value | A single numeric value to be filled for all missing values. Only applies when "imputer_fill_type" is specified as "value". Ignored if "mean" or "median" is specified for "imputer_fill_type". | (Negative Infinity, Positive Infinity) | 0 |
imputation_threshold | Threshold for imputation. The missing value ratio in any one column must not be greater than the threshold; otherwise, an error results. | (0,1) | 0.25 |
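To make the threshold rule concrete, here is a minimal sketch that collects the defaults from the table in a dictionary and checks each column's missing value ratio against imputation_threshold. The helper names `missing_ratio` and `check_threshold` are hypothetical, and the error type is an assumption; the source only states that exceeding the threshold results in an error.

```python
# Defaults copied from the settings table above.
DEFAULT_SETTINGS = {
    "use_imputation": True,
    "imputer_list": ["FlattenIterative", "Linear", "Cubic", "Previous"],
    "imputer_fill_type": "value",
    "imputer_fill_value": 0,
    "imputation_threshold": 0.25,
}

def missing_ratio(column):
    """Fraction of entries in a column that are missing (None)."""
    if not column:
        return 0.0
    return sum(v is None for v in column) / len(column)

def check_threshold(columns, threshold=DEFAULT_SETTINGS["imputation_threshold"]):
    """Raise ValueError if any column's missing ratio is greater than the threshold."""
    for name, column in columns.items():
        ratio = missing_ratio(column)
        if ratio > threshold:
            raise ValueError(
                f"column {name!r}: missing ratio {ratio:.2f} exceeds threshold {threshold}"
            )

data = {
    "temperature": [20.1, None, 19.8, 20.4],  # 1 of 4 missing: ratio equals the threshold
    "load": [5, None, None, 7],               # 2 of 4 missing: ratio exceeds the threshold
}

try:
    check_threshold(data)
except ValueError as err:
    print("rejected:", err)
```

A ratio equal to the threshold passes in this sketch, because the rule is that the ratio must not be greater than the threshold.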
Notes for use_imputation usage
- If use_imputation is set to True and the input data has missing values:
  - imputation_threshold takes effect.
  - Imputer candidates in imputer_list are used to search for the best imputer.
  - If the best imputer is Fill, imputer_fill_type and imputer_fill_value are applied; otherwise, they are ignored.
- If use_imputation is set to True and the input data has no missing values:
  - imputation_threshold is ignored.
  - Imputer candidates in imputer_list are used to search for the best imputer.
  - If the best imputer is Fill, imputer_fill_type and imputer_fill_value are applied; otherwise, they are ignored.
- If use_imputation is set to False but the input data has missing values, use_imputation is turned on with a warning, and the behavior follows the first scenario.
- If use_imputation is set to False and the input data has no missing values, no further processing is required.
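The four scenarios above can be sketched as a small decision function. This is only an illustration of the documented behavior, not AutoAI code; the function names and the keys of the returned dictionary are hypothetical.

```python
import warnings

def resolve_imputation(use_imputation, has_missing):
    """Return which imputation behavior applies for the four documented scenarios."""
    if not use_imputation and has_missing:
        # Scenario 3: imputation is switched on with a warning, then scenario 1 applies.
        warnings.warn("Input data has missing values; turning use_imputation on.")
        use_imputation = True
    if not use_imputation:
        # Scenario 4: nothing to do.
        return {"search_imputers": False, "threshold_applies": False}
    # Scenarios 1 and 2: imputers are searched; the threshold matters only
    # when there are missing values to count.
    return {"search_imputers": True, "threshold_applies": has_missing}

def fill_settings_apply(best_imputer):
    """imputer_fill_type and imputer_fill_value are used only when Fill is selected."""
    return best_imputer == "Fill"

print(resolve_imputation(True, False))
print(fill_settings_apply("Linear"))
```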
For example:
"pipelines": [
  {
    "id": "automl",
    "runtime_ref": "hybrid",
    "nodes": [
      {
        "id": "automl-ts",
        "type": "execution_node",
        "op": "kube",
        "runtime_ref": "automl",
        "parameters": {
          "del_on_close": true,
          "optimization": {
            "target_columns": [2,3,4],
            "timestamp_column": 1,
            "use_imputation": true
          }
        }
      }
    ]
  }
]