Data imputation implementation details for time series experiments

Last updated: Oct 09, 2024

These experiment settings control data imputation in time series experiments.

Data imputation methods

Apply one of these data imputation methods in experiment settings to supply missing values in a data set.

Data imputation methods for time series experiments
Imputation method	Description
FlattenIterative	Time series data is first flattened, then missing values are imputed with the Scikit-learn iterative imputer.
Linear	Linear interpolation method is used to impute the missing value.
Cubic	Cubic interpolation method is used to impute the missing value.
Previous	Missing value is imputed with the previous value.
Next	Missing value is imputed with the next value.
Fill	Missing value is imputed with a user-specified value, the sample mean, or the sample median.
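
The following is a minimal, pure-Python sketch of what the Linear, Previous, Next, and Fill methods in the table do to a single series. It is an illustration only, not the AutoAI implementation: the function names (impute_previous, impute_next, impute_linear, impute_fill) are hypothetical, and the handling of missing values at the very start or end of a series is an assumption. FlattenIterative and Cubic are omitted because they depend on external libraries.

```python
import math
from statistics import mean, median


def is_missing(x):
    """Treat None and NaN as missing."""
    return x is None or (isinstance(x, float) and math.isnan(x))


def impute_previous(values):
    """Replace each missing value with the most recent observed value."""
    out, last = [], None
    for v in values:
        if is_missing(v):
            out.append(last)  # stays None if nothing was observed yet
        else:
            out.append(v)
            last = v
    return out


def impute_next(values):
    """Replace each missing value with the next observed value."""
    return impute_previous(values[::-1])[::-1]


def impute_linear(values):
    """Fill gaps between two observed points on a straight line.

    Assumption: leading and trailing gaps are left untouched.
    """
    out = list(values)
    known = [i for i, v in enumerate(values) if not is_missing(v)]
    for left, right in zip(known, known[1:]):
        step = (values[right] - values[left]) / (right - left)
        for i in range(left + 1, right):
            out[i] = values[left] + step * (i - left)
    return out


def impute_fill(values, fill_type="value", fill_value=0):
    """Fill with a constant: the sample mean, the sample median, or fill_value."""
    observed = [v for v in values if not is_missing(v)]
    if fill_type == "mean":
        fill = mean(observed)
    elif fill_type == "median":
        fill = median(observed)
    else:
        fill = fill_value
    return [fill if is_missing(v) else v for v in values]


series = [1.0, None, 3.0, None, None, 9.0]
print(impute_previous(series))
print(impute_next(series))
print(impute_linear(series))
print(impute_fill(series, fill_type="median"))
```

The fill_type and fill_value arguments mirror the imputer_fill_type and imputer_fill_value settings described in the next section.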

Input Settings

These parameters support data imputation for time series experiments in a notebook.

Input settings for data imputation in time series experiments
Name	Description	Value	Default value
use_imputation	Flag for switching imputation on or off.	True or False	True
imputer_list	List of imputer names (strings) to search. If a list is not specified, all the default imputers are searched. If an empty list is passed, all imputers are searched.	"FlattenIterative", "Linear", "Cubic", "Previous", "Fill", "Next"	"FlattenIterative", "Linear", "Cubic", "Previous"
imputer_fill_type	Fill strategy used by the "Fill" imputer.	"mean"/"median"/"value"	"value"
imputer_fill_value	A single numeric value used to fill all missing values. Applies only when "imputer_fill_type" is set to "value". Ignored if "imputer_fill_type" is set to "mean" or "median".	(Negative Infinity, Positive Infinity)	0
imputation_threshold	Threshold for imputation. The ratio of missing values in any single column must not exceed the threshold; otherwise, an error is raised.	(0,1)	0.25
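
As a rough illustration of the imputation_threshold rule, the sketch below computes the missing-value ratio per column and raises an error when any column exceeds the threshold. The helper names (missing_ratio, check_imputation_threshold) and the choice of ValueError are assumptions made for this example; the actual error that AutoAI reports is not described here.

```python
import math


def missing_ratio(column):
    """Fraction of entries in a column that are None or NaN."""
    if not column:
        return 0.0
    missing = sum(
        1 for v in column
        if v is None or (isinstance(v, float) and math.isnan(v))
    )
    return missing / len(column)


def check_imputation_threshold(columns, imputation_threshold=0.25):
    """Raise if any column has a missing ratio greater than the threshold.

    `columns` maps a column name to its list of values.
    """
    if not 0 < imputation_threshold < 1:
        raise ValueError("imputation_threshold must lie in the open interval (0, 1)")
    for name, values in columns.items():
        ratio = missing_ratio(values)
        if ratio > imputation_threshold:
            raise ValueError(
                f"column {name!r}: missing ratio {ratio:.2f} exceeds "
                f"threshold {imputation_threshold:.2f}"
            )


# One missing value out of four equals the default threshold, so this passes.
check_imputation_threshold({"sensor_a": [1.0, None, 3.0, 4.0]})
```

A column with a ratio exactly equal to the threshold is accepted, which matches the "must not exceed" wording in the table.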

Notes for use_imputation usage

If use_imputation is set to True and the input data has missing values:
- imputation_threshold takes effect.
- The imputer candidates in imputer_list are searched to find the best imputer.
- If the best imputer is Fill, imputer_fill_type and imputer_fill_value are applied; otherwise, they are ignored.
If use_imputation is set to True and the input data has no missing values:
- imputation_threshold is ignored.
- The imputer candidates in imputer_list are searched to find the best imputer. If the best imputer is Fill, imputer_fill_type and imputer_fill_value are applied; otherwise, they are ignored.
If use_imputation is set to False but the input data has missing values:
- use_imputation is turned on with a warning, and processing then follows the behavior of the first scenario.
If use_imputation is set to False and the input data has no missing values, no further processing is required.
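
The four scenarios above can be summarized as a small decision function. This is a sketch of the documented behavior only; the function name resolve_imputation and the keys of the returned dictionary are hypothetical and do not correspond to any AutoAI API.

```python
import warnings


def resolve_imputation(use_imputation, has_missing):
    """Return which imputation steps apply for a given flag and data state."""
    if not use_imputation and has_missing:
        # Scenario 3: the flag is switched on with a warning.
        warnings.warn("Input data has missing values; use_imputation is turned on.")
        use_imputation = True

    if not use_imputation:
        # Scenario 4: flag off and no missing values, nothing to do.
        return {"search_imputers": False, "apply_threshold": False}

    # Scenarios 1 and 2: imputers are searched either way, but the
    # threshold takes effect only when values are actually missing.
    return {"search_imputers": True, "apply_threshold": has_missing}


print(resolve_imputation(use_imputation=True, has_missing=False))
```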

For example:

"pipelines": [
      {
        "id": "automl",
        "runtime_ref": "hybrid",
        "nodes": [
          {
            "id": "automl-ts",
            "type": "execution_node",
            "op": "kube",
            "runtime_ref": "automl",
            "parameters": {
              "del_on_close": true,
              "optimization": {
                "target_columns": [2,3,4],
                "timestamp_column": 1,
                "use_imputation": true
              }
            }
          }
        ]
      }
    ]
