Creating a time series anomaly prediction (Beta)

Create a time series anomaly prediction experiment to train a model that can detect anomalies, or unexpected results, when the model predicts outcomes for new data.

Detecting anomalies in predictions

You can use anomaly prediction to find outliers in model predictions. For example, suppose you have operational metrics from monitoring devices that were collected from 2022.1.1 through 2022.3.31. You are confident that no anomalies exist in the data for that period, even though the data is unlabeled. You can use a time series anomaly prediction experiment to:

  • Train model candidate pipelines and auto-select the top-ranked model candidate
  • Deploy a selected model to predict whether:
    • A new time point is an anomaly (for example, an online score for the time point 2022.4.1 falls outside the expected range)
    • A new time range contains anomalies (for example, a batch score for the range 2022.4.1 to 2022.4.7 includes values outside the expected range)

Working with a sample

To create an AutoAI Time series experiment with anomaly prediction that uses a sample:

  1. Create an AutoAI experiment.
  2. Select Gallery sample.
  3. Click the tile for Electricity usage anomalies sample data.
  4. Follow the prompts to configure and run the experiment.
  5. Review the details about the pipelines and explore the visualizations.

Configuring a time series experiment with anomaly prediction

  1. Load the data for your experiment.
    Restriction: You can upload only a single data file for an anomaly prediction experiment. If you upload a second data file (for holdout data), the Anomaly prediction option is disabled and only the Forecast option is available. By default, anomaly prediction experiments use a subset of the training data for validation.
  2. Click Yes to Enable time series.
  3. Select Anomaly prediction as the experiment type.
  4. Select one or more feature columns from the data source to predict based on their previous values.
  5. Select the date/time column.

The prediction summary shows you the experiment type and the metric that is selected for optimizing the experiment.

Configuring experiment settings

To configure more details for your time series experiment, open the Experiment settings pane. Options that do not apply to anomaly prediction experiments are disabled.

General prediction settings

On the General panel for prediction settings, configure details for training the experiment.

  • Prediction type: View or change the prediction type that is based on the prediction column for your experiment. For time series experiments, Time series anomaly prediction is selected by default. Note: If you change the prediction type, other prediction settings for your experiment are automatically changed.
  • Optimized metric: Choose the metric for optimizing and ranking the pipelines.
  • Optimized algorithm selection: Not supported for time series experiments.
  • Algorithms to include: Select the algorithms that your experiment can use to create pipelines. All of the listed algorithms support anomaly prediction.
  • Pipelines to complete: View or change the number of pipelines to generate for your experiment.

Time series configuration details

On the Time series pane for prediction settings, configure the details for how to train the experiment and generate predictions.

  • Date/time column: View or change the date/time column for the experiment.
  • Lookback window: Not supported for anomaly prediction.
  • Forecast window: Not supported for anomaly prediction.

Configuring data source settings

To configure details for your input data, open the Experiment settings pane and select Data source.

General data source settings

On the General panel for data source settings, you can choose options for how to use your experiment data.

  • Duplicate rows: Not supported for time series anomaly prediction experiments.
  • Subsample data: Not supported for time series anomaly prediction experiments.
  • Text feature engineering: Not supported for time series anomaly prediction experiments.
  • Final training data set: Anomaly prediction uses a single data source file, which is the final training data set.
  • Supporting features: Not supported for time series anomaly prediction experiments.
  • Data imputation: Not supported for time series anomaly prediction experiments.
  • Training and holdout data: Anomaly prediction does not support a separate holdout file. You can adjust how the data is split between training and holdout data. Note: In some cases, AutoAI can overwrite your holdout settings to ensure that the split is valid for the experiment. In this case, you see a notification and the change is noted in the log file. For an illustration of a time-ordered split, see the sketch after this list.
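
The documentation does not specify the exact split strategy, but time series validation preserves temporal order rather than sampling rows at random. The following Python sketch shows a time-ordered split; the 80/20 ratio and column names are illustrative assumptions, not AutoAI defaults.

import pandas as pd

# Illustrative data: one metric column indexed by a date/time column.
df = pd.DataFrame({
    "timestamp": pd.date_range("2022-01-01", "2022-03-31", freq="D"),
    "metric": range(90),
})

# A time series split keeps temporal order: the holdout is the most
# recent slice, never a random sample. The 0.8 ratio is an assumption.
split_point = int(len(df) * 0.8)
train = df.iloc[:split_point]
holdout = df.iloc[split_point:]

print(len(train), len(holdout))  # 72 18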

Reviewing the experiment results

When you run the experiment, the progress indicator displays the pathways to pipeline creation. Ranked pipelines are listed on the leaderboard. The pipeline score represents how well the pipeline performed against the optimized metric.

The Experiment summary tab displays a visualization of how metrics performed for the pipeline.

  • Use the metric filter to focus on particular metrics.
  • Hover over the name of a metric to view details.

Click a pipeline name to view details. On the Model evaluation page, you can review a table that summarizes details about the pipeline.

Model evaluation details

  • The rows represent five evaluation metrics: Area under ROC, Precision, Recall, F1, Average precision.
  • The columns represent four synthesized anomaly types: Level shift, Trend, Localized extreme, Variance.
  • Each cell shows the average of the metric over three evaluation iterations on the synthesized anomaly type; a sketch of this layout follows the list.
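
To make the layout concrete, the following sketch averages hypothetical per-iteration scores into the five-metric by four-anomaly-type grid; the numbers are randomly generated placeholders, not real experiment results.

import numpy as np
import pandas as pd

metrics = ["Area under ROC", "Precision", "Recall", "F1", "Average precision"]
anomaly_types = ["Level shift", "Trend", "Localized extreme", "Variance"]

# Hypothetical raw scores: 5 metrics x 4 anomaly types x 3 iterations.
rng = np.random.default_rng(0)
raw = rng.uniform(0.5, 1.0, size=(5, 4, 3))

# Each cell of the evaluation table is the mean over the 3 iterations.
table = pd.DataFrame(raw.mean(axis=2), index=metrics, columns=anomaly_types)
print(table.round(3))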

Evaluation metrics

These metrics are used to evaluate a pipeline:

  • Aggregate score (Recommended): Calculated by aggregating the optimized metric values (for example, Average precision) for the four anomaly types. The scores for each pipeline are ranked by using the Borda count method and then weighted for their contribution to the aggregate score. Unlike a standard metric score, this value is not bounded between 0 and 1; a higher value indicates a stronger score. For a minimal sketch of Borda-count aggregation, see the code after this list.
  • ROC AUC: Measures how well the model can distinguish between the two groups.
  • F1: Harmonic mean of precision and recall, with a best value of 1 (perfect precision and recall) and a worst value of 0.
  • Precision: Measures the accuracy of a prediction as the percentage of positive predictions that are correct.
  • Recall: Measures the percentage of identified positive predictions against the possible positives in the data set.
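
The exact ranking and weighting scheme is not published, but a Borda count assigns rank points per contest and sums them. The following sketch applies a plain, unweighted Borda count to hypothetical per-anomaly-type scores for three pipelines; the names, scores, and point scheme are assumptions for illustration.

# Hypothetical optimized-metric scores (for example, Average precision)
# for each pipeline on each synthesized anomaly type.
scores = {
    "Pipeline_1": {"level_shift": 0.91, "trend": 0.62, "extreme": 0.88, "variance": 0.70},
    "Pipeline_2": {"level_shift": 0.85, "trend": 0.74, "extreme": 0.81, "variance": 0.77},
    "Pipeline_3": {"level_shift": 0.78, "trend": 0.69, "extreme": 0.93, "variance": 0.65},
}

borda = {name: 0 for name in scores}
for anomaly_type in ["level_shift", "trend", "extreme", "variance"]:
    # Rank pipelines for this anomaly type, best score first.
    ranked = sorted(scores, key=lambda name: scores[name][anomaly_type], reverse=True)
    # Borda points: 0 for the worst pipeline up to n-1 for the best.
    for points, name in enumerate(reversed(ranked)):
        borda[name] += points

# Higher is better; note the total is not bounded between 0 and 1.
print(sorted(borda.items(), key=lambda item: item[1], reverse=True))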

Anomaly types

These are the anomaly types that AutoAI detects. A sketch that synthesizes a toy example of each type follows the list.

  • Localized extreme anomaly: An unusual data point in a time series that deviates significantly from the data points around it.
  • Level shift anomaly: A segment in which the mean value of a time series changes.
  • Trend anomaly: A segment of a time series that has a trend change compared to the time series before the segment.
  • Variance anomaly: A segment of a time series in which the variance changes.
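
The four anomaly types are easier to picture with synthetic data. The following sketch injects a toy example of each type into a noisy sine wave with NumPy; it mimics the categories described above, not AutoAI's internal synthesis.

import numpy as np

rng = np.random.default_rng(1)
n = 200
base = np.sin(np.linspace(0, 8 * np.pi, n)) + rng.normal(0, 0.1, n)

# Localized extreme anomaly: one point far from its neighbors.
extreme = base.copy()
extreme[100] += 5.0

# Level shift anomaly: the mean changes over a segment.
level_shift = base.copy()
level_shift[120:160] += 3.0

# Trend anomaly: a segment drifts relative to the series before it.
trend = base.copy()
trend[120:160] += np.linspace(0.0, 3.0, 40)

# Variance anomaly: the noise level changes over a segment.
variance = base.copy()
variance[120:160] += rng.normal(0.0, 1.0, 40)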

Saving a pipeline as a model

To save a model candidate pipeline as a machine learning model, select Save as model for the pipeline you prefer. The model is saved as a project asset. You can promote the model to a space and create a deployment for it.

Saving a pipeline as a notebook

To review the code for a pipeline, select Save as notebook for a pipeline. An automatically generated notebook is saved as a project asset. Review the code to explore how the pipeline was generated.

For details on the methods used in the pipeline code, see the documentation for the autoai-ts-libs library.

Scoring the model

After you save a pipeline as a model and promote the model to a deployment space, you can score the model to generate predictions for input (payload) data. Scoring the model and interpreting the results is similar to scoring a binary classification model, because the score presents one of two possible values for each prediction:

  • 1 = no anomaly detected
  • -1 = anomaly detected
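
For example, after you create an online deployment, you can score the model with the ibm-watson-machine-learning Python client. The following is a minimal sketch; the credentials, space ID, deployment ID, and observation values are placeholders that you must replace with your own.

from ibm_watson_machine_learning import APIClient

# Placeholder credentials and IDs; replace with your own values.
client = APIClient({
    "url": "https://us-south.ml.cloud.ibm.com",
    "apikey": "<YOUR_API_KEY>",
})
client.set.default_space("<YOUR_SPACE_ID>")

# Field order must match the training data schema, minus the target column.
payload = {
    client.deployments.ScoringMetaNames.INPUT_DATA: [{
        "id": "observations",
        "values": [[12, 34], [22, 23], [35, 45], [46, 34]],
    }]
}

response = client.deployments.score("<YOUR_DEPLOYMENT_ID>", payload)
print(response)  # each row scores as 1 (normal) or -1 (anomaly)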

Deployment details

Note these requirements for deploying an anomaly prediction model.

  • The schema for the deployment input data must match the schema for the training data, except for the prediction (target) column.
  • The order of the fields for model scoring must be the same as the order of the fields in the training data schema.

Deployment example

The following is valid input for an anomaly prediction model:

{
    "input_data": [
        {
            "id": "observations",
            "values": [
                [12,34],
                [22,23],
                [35,45],
                [46,34]
            ]
        }
     ]
}

The score for this input is [1,1,-1,1], where -1 means the value is an anomaly and 1 means the value is in the normal range.
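
The following sketch flags the anomalous rows in such a result. The response shape is an assumption based on the standard Watson Machine Learning scoring response; verify it against your deployment's actual output.

# Hypothetical response in the standard WML scoring shape.
response = {
    "predictions": [{
        "fields": ["prediction"],
        "values": [[1], [1], [-1], [1]],
    }]
}

flags = [row[0] for row in response["predictions"][0]["values"]]
anomalous_rows = [i for i, flag in enumerate(flags) if flag == -1]
print(anomalous_rows)  # [2] -> the third observation, [35, 45], is flagged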

Implementation details

These algorithms support anomaly prediction in time series experiments.

  • PointwiseBoundedHoltWintersAdditive: Forecasting (no transformer)
  • PointwiseBoundedBATS: Forecasting (no transformer)
  • PointwiseBoundedBATSForceUpdate: Forecasting (no transformer)
  • WindowNN: Window (Flatten transformer)
  • WindowPCA: Relationship (Flatten transformer)
  • WindowLOF: Window (Flatten transformer)

The algorithms are organized in these categories:

  • Forecasting: Algorithms for detecting anomalies using time series forecasting methods
  • Relationship: Algorithms for detecting anomalies by analyzing the relationship among data points
  • Window: Algorithms for detecting anomalies by applying transformations and ML techniques to rolling windows (a minimal sketch follows this list)
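
To illustrate the Window category, the following sketch flattens a series into rolling windows and applies scikit-learn's LocalOutlierFactor, which uses the same 1/-1 labeling convention. It mimics the idea behind a pipeline such as WindowLOF; it is not the AutoAI implementation.

import numpy as np
from sklearn.neighbors import LocalOutlierFactor

rng = np.random.default_rng(2)
series = np.sin(np.linspace(0, 8 * np.pi, 200)) + rng.normal(0, 0.1, 200)
series[150] += 5.0  # inject a localized extreme anomaly

# Flatten transformation: turn the series into overlapping windows so
# that each row describes a short segment of recent behavior.
window = 10
segments = np.lib.stride_tricks.sliding_window_view(series, window)

# LocalOutlierFactor labels each window 1 (normal) or -1 (anomaly).
labels = LocalOutlierFactor(n_neighbors=20).fit_predict(segments)
print(np.where(labels == -1)[0])  # window indices that contain the spike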

Learn more

Saving an AutoAI generated notebook (Watson Machine Learning)

Parent topic: Building a time series experiment
