This topic describes the experiment settings used for data imputation in time series experiments.
Data imputation methods
Apply one of these data imputation methods in experiment settings to supply missing values in a data set.
Imputation method | Description |
---|---|
FlattenIterative | Time series data is first flattened, then missing values are imputed with the Scikit-learn iterative imputer. |
Linear | Linear interpolation method is used to impute the missing value. |
Cubic | Cubic interpolation method is used to impute the missing value. |
Previous | Missing value is imputed with the previous value. |
Next | Missing value is imputed with the next value. |
Fill | Missing value is imputed with a user-specified value, the sample mean, or the sample median. |
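
To make the simpler methods in the table concrete, the following is a minimal pure-Python sketch of the Linear, Previous, Next, and Fill behaviors on a single series. It is an illustration only and not the AutoAI implementation: the function names (`impute_linear`, `impute_previous`, `impute_next`, `impute_fill`) are hypothetical, missing values are assumed to be represented as `None`, and leading or trailing gaps that have no neighbor on one side are left unfilled.

```python
from statistics import mean, median
from typing import List, Optional

Series = List[Optional[float]]


def impute_previous(values: Series) -> Series:
    """Replace each missing value with the most recent observed value."""
    out: Series = []
    last: Optional[float] = None
    for v in values:
        if v is not None:
            last = v
        out.append(last)  # stays None if nothing was observed yet
    return out


def impute_next(values: Series) -> Series:
    """Replace each missing value with the next observed value."""
    return impute_previous(values[::-1])[::-1]


def impute_linear(values: Series) -> Series:
    """Linearly interpolate gaps that lie between two observed values."""
    out = list(values)
    known = [i for i, v in enumerate(values) if v is not None]
    for left, right in zip(known, known[1:]):
        step = (values[right] - values[left]) / (right - left)
        for i in range(left + 1, right):
            out[i] = values[left] + step * (i - left)
    return out


def impute_fill(values: Series, fill_type: str = "value",
                fill_value: float = 0.0) -> Series:
    """Fill gaps with a constant, the sample mean, or the sample median."""
    observed = [v for v in values if v is not None]
    if fill_type == "value":
        fill = fill_value
    elif fill_type == "mean":
        fill = mean(observed)
    elif fill_type == "median":
        fill = median(observed)
    else:
        raise ValueError(f"unknown fill_type: {fill_type!r}")
    return [fill if v is None else v for v in values]


series = [1.0, None, 3.0, None, 7.0]
print(impute_linear(series))            # [1.0, 2.0, 3.0, 5.0, 7.0]
print(impute_previous(series))          # [1.0, 1.0, 3.0, 3.0, 7.0]
print(impute_next(series))              # [1.0, 3.0, 3.0, 7.0, 7.0]
print(impute_fill(series, "median"))    # [1.0, 3.0, 3.0, 3.0, 7.0]
```

Cubic and FlattenIterative are omitted here because they depend on spline fitting and on the Scikit-learn iterative imputer respectively.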
Input Settings
These settings support data imputation for time series experiments in a notebook.
Name | Description | Value | Default value |
---|---|---|---|
use_imputation | Flag for switching imputation on or off. | True or False | True |
imputer_list | List of imputer names (strings) to search. If a list is not specified, all the default imputers are searched. If an empty list is passed, all imputers are searched. | "FlattenIterative", "Linear", "Cubic", "Previous", "Fill", "Next" | "FlattenIterative", "Linear", "Cubic", "Previous" |
imputer_fill_type | Fill strategy that the "Fill" imputer uses. | "mean"/"median"/"value" | "value" |
imputer_fill_value | A single numeric value that replaces all missing values. Applies only when "imputer_fill_type" is specified as "value". Ignored if "mean" or "median" is specified for "imputer_fill_type". | (Negative Infinity, Positive Infinity) | 0 |
imputation_threshold | Threshold for imputation. The ratio of missing values in any single column must not be greater than the threshold. Otherwise, an error is raised. | (0,1) | 0.25 |
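
The threshold rule in the last row can be sketched as a per-column check. This is a hedged illustration rather than the product's code: `missing_ratio` and `check_imputation_threshold` are hypothetical helper names, columns are assumed to be plain lists with `None` marking a missing value, and `ValueError` stands in for whatever error the service actually reports.

```python
from typing import Dict, List, Optional


def missing_ratio(column: List[Optional[float]]) -> float:
    """Return the fraction of entries in the column that are missing."""
    if not column:
        return 0.0
    return sum(v is None for v in column) / len(column)


def check_imputation_threshold(columns: Dict[str, List[Optional[float]]],
                               imputation_threshold: float = 0.25) -> None:
    """Raise if any single column has more missing values than allowed."""
    if not 0 < imputation_threshold < 1:
        raise ValueError("imputation_threshold must be in the open interval (0, 1)")
    for name, column in columns.items():
        ratio = missing_ratio(column)
        # The ratio may equal the threshold; only a greater ratio is an error.
        if ratio > imputation_threshold:
            raise ValueError(
                f"column {name!r}: missing ratio {ratio:.2f} exceeds "
                f"threshold {imputation_threshold:.2f}"
            )


# One missing value out of four is exactly 0.25, so this passes silently.
check_imputation_threshold({"sales": [1.0, None, 3.0, 4.0]})
```

With the default of 0.25, a column in which half the values are missing would be rejected, while raising `imputation_threshold` to 0.6 would let it through.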
Notes for use_imputation usage
- If the use_imputation method is specified as True and the input data has missing values:
  - imputation_threshold takes effect.
  - Imputer candidates in imputer_list are used to search for the best imputer.
  - If the best imputer is Fill, imputer_fill_type and imputer_fill_value are applied; otherwise, they are ignored.
- If the use_imputation method is specified as True and the input data has no missing values:
  - imputation_threshold is ignored.
  - Imputer candidates in imputer_list are used to search for the best imputer. If the best imputer is Fill, imputer_fill_type and imputer_fill_value are applied; otherwise, they are ignored.
- If the use_imputation method is specified as False but the input data has missing values: use_imputation is turned on with a warning, then the method follows the behavior for the first scenario.
- If the use_imputation method is specified as False and the input data has no missing values, then no further processing is required.
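
The four scenarios above amount to a small decision table, sketched below under stated assumptions. `ImputationPlan` and `plan_imputation` are hypothetical names introduced only for this illustration, and the warning text is invented; only the branching mirrors the notes.

```python
import warnings
from dataclasses import dataclass


@dataclass
class ImputationPlan:
    use_imputation: bool      # effective value after any automatic override
    threshold_checked: bool   # whether imputation_threshold takes effect
    search_imputers: bool     # whether imputer_list candidates are searched


def plan_imputation(use_imputation: bool, has_missing_values: bool) -> ImputationPlan:
    """Resolve the effective imputation behavior for one experiment."""
    if not use_imputation and has_missing_values:
        # Scenario 3: imputation is switched on with a warning,
        # then scenario 1 applies.
        warnings.warn("Input data has missing values; enabling use_imputation.")
        use_imputation = True

    if not use_imputation:
        # Scenario 4: nothing to do.
        return ImputationPlan(False, threshold_checked=False, search_imputers=False)

    # Scenarios 1 and 2: imputers are searched either way; the threshold
    # only matters when there are missing values to count.
    return ImputationPlan(True, threshold_checked=has_missing_values,
                          search_imputers=True)


print(plan_imputation(True, False))
```

In either searching scenario, imputer_fill_type and imputer_fill_value would matter only if Fill turned out to be the best imputer.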
For example:

    "pipelines": [
        {
            "id": "automl",
            "runtime_ref": "hybrid",
            "nodes": [
                {
                    "id": "automl-ts",
                    "type": "execution_node",
                    "op": "kube",
                    "runtime_ref": "automl",
                    "parameters": {
                        "del_on_close": true,
                        "optimization": {
                            "target_columns": [2, 3, 4],
                            "timestamp_column": 1,
                            "use_imputation": true
                        }
                    }
                }
            ]
        }
    ]