Data imputation implementation details for time series experiments
This topic describes the experiment settings used for data imputation in time series experiments.
Data imputation methods
These are the data imputation methods that you can apply in experiment settings to supply missing values in a data set.
Imputation method | Description |
---|---|
FlattenIterative | Time series data is first flattened, then missing values are imputed using the scikit-learn iterative imputer. |
Linear | Missing values are imputed using linear interpolation. |
Cubic | Missing values are imputed using cubic interpolation. |
Previous | Each missing value is imputed using the previous value. |
Next | Each missing value is imputed using the next value. |
Fill | Missing values are imputed using a user-specified value, the sample mean, or the sample median. |
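The following is a minimal sketch of what the simpler methods in the table do to a single series. It is not the AutoAI implementation: the function name impute, the NaN encoding of missing values, and the handling of gaps at the start or end of the series (the nearest observed value is used) are assumptions made for illustration. FlattenIterative and Cubic are omitted because they depend on scikit-learn and a spline library.

```python
import math
from statistics import mean, median

# Methods covered by this sketch (a subset of the table above).
METHODS = {"Linear", "Previous", "Next", "Fill"}


def impute(values, method, fill_type="value", fill_value=0.0):
    """Return a copy of `values` with NaN entries replaced (hypothetical helper)."""
    if method not in METHODS:
        raise ValueError(f"unsupported method: {method}")
    observed = [i for i, v in enumerate(values) if not math.isnan(v)]
    if not observed:
        raise ValueError("series has no observed values")
    known = [values[i] for i in observed]
    out = list(values)
    for i, v in enumerate(values):
        if not math.isnan(v):
            continue
        before = [k for k in observed if k < i]  # observed indices left of the gap
        after = [k for k in observed if k > i]   # observed indices right of the gap
        if method == "Fill":
            if fill_type == "mean":
                out[i] = mean(known)
            elif fill_type == "median":
                out[i] = median(known)
            else:
                out[i] = fill_value
        elif not before:
            # Assumption: a leading gap takes the first observed value.
            out[i] = values[after[0]]
        elif not after:
            # Assumption: a trailing gap takes the last observed value.
            out[i] = values[before[-1]]
        elif method == "Previous":
            out[i] = values[before[-1]]
        elif method == "Next":
            out[i] = values[after[0]]
        else:  # Linear: interpolate between the two neighbouring observations
            lo, hi = before[-1], after[0]
            weight = (i - lo) / (hi - lo)
            out[i] = values[lo] + weight * (values[hi] - values[lo])
    return out


series = [1.0, math.nan, math.nan, 4.0, math.nan]
previous_filled = impute(series, "Previous")
mean_filled = impute(series, "Fill", fill_type="mean")
```

With this sketch, Previous carries 1.0 forward across the first gap, while Fill with fill_type "mean" replaces every gap with the mean of the observed values.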
Input Settings
These settings control data imputation for time series experiments in a notebook.
Name | Description | Value | Default value |
---|---|---|---|
use_imputation | Flag for switching imputation on/off. | True/False | True |
imputer_list | List of imputer names (strings) to search. If a list is not specified, all the default imputers are searched. If an empty list is passed, all imputers are searched. | "FlattenIterative", "Linear", "Cubic", "Previous", "Fill", "Next" | "FlattenIterative", "Linear", "Cubic", "Previous" |
imputer_fill_type | Fill strategy used by the "Fill" imputer. | "mean"/"median"/"value" | "value" |
imputer_fill_value | A single numeric value used to fill all missing values. Applies only when "imputer_fill_type" is "value". Ignored if "imputer_fill_type" is "mean" or "median". | (Negative Infinity, Positive Infinity) | 0 |
imputation_threshold | Threshold for imputation. The ratio of missing values in any one column must not exceed the threshold; otherwise, an error results. | [0,1) | 0.25 |
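As an illustration of the imputation_threshold rule, the sketch below computes the per-column ratio of missing values and raises an error when a column exceeds the threshold. The names DEFAULT_SETTINGS, missing_ratio, and check_threshold are hypothetical, and treating None and NaN as missing is an assumption; the defaults are copied from the table above.

```python
import math

# Defaults as listed in the settings table.
DEFAULT_SETTINGS = {
    "use_imputation": True,
    "imputer_list": ["FlattenIterative", "Linear", "Cubic", "Previous"],
    "imputer_fill_type": "value",
    "imputer_fill_value": 0,
    "imputation_threshold": 0.25,
}


def missing_ratio(column):
    """Fraction of entries in `column` that are None or NaN (assumed encoding)."""
    if not column:
        return 0.0
    missing = sum(
        1 for v in column
        if v is None or (isinstance(v, float) and math.isnan(v))
    )
    return missing / len(column)


def check_threshold(columns, threshold=DEFAULT_SETTINGS["imputation_threshold"]):
    """Raise ValueError if any column's missing ratio is greater than `threshold`."""
    if not 0 <= threshold < 1:
        raise ValueError("imputation_threshold must be in [0, 1)")
    for name, column in columns.items():
        ratio = missing_ratio(column)
        if ratio > threshold:
            raise ValueError(
                f"column {name!r}: missing ratio {ratio:.2f} exceeds {threshold}"
            )


# One missing value out of four is exactly at the default threshold, so it passes.
check_threshold({"sensor_a": [1.0, math.nan, 3.0, 4.0]})
```

A column with two of four values missing has a ratio of 0.5 and would fail the check at the default threshold of 0.25.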
Notes on use_imputation usage:
- If use_imputation is specified as True and the input data has missing values:
  - imputation_threshold takes effect.
  - The imputer candidates in imputer_list are used to search for the best imputer.
  - If the best imputer is Fill, imputer_fill_type and imputer_fill_value are applied; otherwise, they are ignored.
- If use_imputation is specified as True and the input data has no missing values:
  - imputation_threshold is ignored.
  - The imputer candidates in imputer_list are used to search for the best imputer. If the best imputer is Fill, imputer_fill_type and imputer_fill_value are applied; otherwise, they are ignored.
- If use_imputation is specified as False but the input data has missing values, use_imputation is turned on with a warning, and processing then follows the behavior for the first scenario.
- If use_imputation is specified as False and the input data has no missing values, no further processing is required.
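The four scenarios above can be summarized as a small decision function. This is a sketch of the documented behavior only, not the product code; the function name resolve_imputation and the keys of the returned dictionary are hypothetical.

```python
import warnings


def resolve_imputation(use_imputation, has_missing):
    """Return the effective imputation plan for the given flag and data state."""
    if not use_imputation and has_missing:
        # Third scenario: imputation is switched on with a warning,
        # then the first scenario applies.
        warnings.warn("input data has missing values; enabling use_imputation")
        use_imputation = True
    if not use_imputation:
        # Fourth scenario: nothing to do.
        return {
            "use_imputation": False,
            "apply_threshold": False,
            "search_imputers": False,
        }
    # First and second scenarios: imputers are searched either way, but the
    # threshold only takes effect when values are actually missing.
    return {
        "use_imputation": True,
        "apply_threshold": has_missing,
        "search_imputers": True,
    }


plan = resolve_imputation(use_imputation=True, has_missing=False)
```

In each case where imputers are searched, imputer_fill_type and imputer_fill_value matter only if the search selects the Fill imputer.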
For example, this pipeline definition turns imputation on:
"pipelines": [
  {
    "id": "automl",
    "runtime_ref": "hybrid",
    "nodes": [
      {
        "id": "automl-ts",
        "type": "execution_node",
        "op": "kube",
        "runtime_ref": "automl",
        "parameters": {
          "del_on_close": true,
          "optimization": {
            "target_columns": [2,3,4],
            "timestamp_column": 1,
            "use_imputation": true
          }
        }
      }
    ]
  }
]
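If you assemble the pipeline definition in a notebook, a sketch like the following builds the same structure as a Python dictionary and serializes it to JSON. The source example shows only use_imputation inside the optimization object; placing imputer_list, imputer_fill_type, and imputation_threshold alongside it is an assumption, and the values chosen here are illustrative.

```python
import json

optimization = {
    "target_columns": [2, 3, 4],
    "timestamp_column": 1,
    "use_imputation": True,
    # Assumption: the remaining imputation settings sit in the same object.
    "imputer_list": ["Linear", "Fill"],
    "imputer_fill_type": "median",
    "imputation_threshold": 0.1,
}

pipelines = {
    "pipelines": [
        {
            "id": "automl",
            "runtime_ref": "hybrid",
            "nodes": [
                {
                    "id": "automl-ts",
                    "type": "execution_node",
                    "op": "kube",
                    "runtime_ref": "automl",
                    "parameters": {
                        "del_on_close": True,
                        "optimization": optimization,
                    },
                }
            ],
        }
    ]
}

# json.dumps converts Python True to the JSON literal true.
pipelines_json = json.dumps(pipelines, indent=2)
```

Because imputer_fill_type is "median" in this sketch, imputer_fill_value is left out: per the settings table, it is ignored unless the fill type is "value".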