Creating a time series anomaly prediction (Beta)
Last updated: Apr 05, 2024

Create a time series anomaly prediction experiment to train a model that can detect anomalies, or unexpected results, when it generates predictions from new data.

Tech preview: This is a technology preview and is not yet supported for use in production environments.

Detecting anomalies in predictions

You can use anomaly prediction to find outliers in model predictions. For example, suppose you have operational metrics from monitoring devices that were collected in the date range of 2022.1.1 through 2022.3.31. You are confident that no anomalies exist in the data for that period, even though the data is unlabeled. You can use a time series anomaly prediction experiment to:

  • Train model candidate pipelines and auto-select the top-ranked model candidate
  • Deploy a selected model to predict whether:
    • A new time point is an anomaly (for example, an online score predicts whether the time point 2022.4.1 is outside the expected range)
    • A new time range contains anomalies (for example, a batch score predicts which values in the range 2022.4.1 through 2022.4.7 are outside the expected range)
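
For example, the unlabeled training data for this scenario might resemble the following sketch, which uses pandas to build a table of device metrics. The file name and the temperature and pressure columns are hypothetical, not part of any sample data set.

import numpy as np
import pandas as pd

# Hypothetical operational metrics from monitoring devices: one reading
# per day from 2022-01-01 through 2022-03-31, assumed anomaly-free and
# unlabeled.
rng = np.random.default_rng(42)
dates = pd.date_range("2022-01-01", "2022-03-31", freq="D")
training_df = pd.DataFrame({
    "timestamp": dates,
    "temperature": 70 + rng.normal(0, 1.5, len(dates)),
    "pressure": 30 + rng.normal(0, 0.5, len(dates)),
})

# Save as a single CSV file; anomaly prediction experiments accept only
# one data file (no separate holdout file).
training_df.to_csv("device_metrics.csv", index=False)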

Working with a sample

To create an AutoAI Time series experiment with anomaly prediction that uses a sample:

  1. Create an AutoAI experiment.

  2. Select Resource hub sample.

  3. Click the tile for Electricity usage anomalies sample data.

  4. Follow the prompts to configure and run the experiment.

  5. Review the details about the pipelines and explore the visualizations.

Configuring a time series experiment with anomaly prediction

  1. Load the data for your experiment.

    Restriction: You can upload only a single data file for an anomaly prediction experiment. If you upload a second data file (for holdout data), the Anomaly prediction option is disabled and only the Forecast option is available. By default, anomaly prediction experiments use a subset of the training data for validation.
  2. Click Yes to Enable time series.

  3. Select Anomaly prediction as the experiment type.

  4. Select the feature columns from the data source that you want to predict based on previous values. You can specify one or more columns to predict.

  5. Select the date/time column.

The prediction summary shows you the experiment type and the metric that is selected for optimizing the experiment.
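
Before you upload the file, you can check that the date/time column parses cleanly and that the observations arrive at a regular frequency. A minimal sketch with pandas, reusing the hypothetical device_metrics.csv file and timestamp column from the earlier example:

import pandas as pd

df = pd.read_csv("device_metrics.csv")

# Parse the date/time column and sort chronologically.
df["timestamp"] = pd.to_datetime(df["timestamp"])
df = df.sort_values("timestamp")

# Check for a single, regular sampling interval and for duplicates;
# gaps or mixed frequencies can distort time series training.
print(df["timestamp"].diff().value_counts())
print("duplicate timestamps:", df["timestamp"].duplicated().sum())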

Configuring experiment settings

To configure more details for your time series experiment, open the Experiment settings pane. Options that do not apply to anomaly prediction experiments are disabled.

General prediction settings

On the General panel for prediction settings, configure details for training the experiment.

  • Prediction type: View or change the prediction type based on the prediction column for your experiment. For time series experiments, Time series anomaly prediction is selected by default. Note: If you change the prediction type, other prediction settings for your experiment are automatically changed.
  • Optimized metric: Choose a metric for optimizing and ranking the pipelines.
  • Optimized algorithm selection: Not supported for time series experiments.
  • Algorithms to include: Select the algorithms that your experiment can use to create pipelines. All listed algorithms support anomaly prediction.
  • Pipelines to complete: View or change the number of pipelines to generate for your experiment.

Time series configuration details

On the Time series pane for prediction settings, configure the details for how to train the experiment and generate predictions.

  • Date/time column: View or change the date/time column for the experiment.
  • Lookback window: Not supported for anomaly prediction.
  • Forecast window: Not supported for anomaly prediction.

Configuring data source settings

To configure details for your input data, open the Experiment settings panel and select Data source.

General data source settings

On the General panel for data source settings, you can choose options for how to use your experiment data.

  • Duplicate rows: Not supported for time series anomaly prediction experiments.
  • Subsample data: Not supported for time series anomaly prediction experiments.
  • Text feature engineering: Not supported for time series anomaly prediction experiments.
  • Final training data set: Anomaly prediction uses a single data source file, which is the final training data set.
  • Supporting features: Not supported for time series anomaly prediction experiments.
  • Data imputation: Not supported for time series anomaly prediction experiments.
  • Training and holdout data: Anomaly prediction does not support a separate holdout file, but you can adjust how the data is split between training and holdout data. Note: In some cases, AutoAI can override your holdout settings to ensure that the split is valid for the experiment. In this case, you see a notification and the change is noted in the log file.
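
Because the observations are ordered in time, a time series holdout is typically taken from the end of the series rather than sampled at random. AutoAI performs this split internally; the following is only a conceptual sketch, with a hypothetical file name and a hypothetical 10% holdout fraction:

import pandas as pd

# Conceptual illustration only: AutoAI splits the data for you.
df = (pd.read_csv("device_metrics.csv", parse_dates=["timestamp"])
        .sort_values("timestamp"))

holdout_fraction = 0.10  # hypothetical value; adjust in Experiment settings
split_index = int(len(df) * (1 - holdout_fraction))

# The most recent observations form the holdout set, so the model is
# validated on data that follows the training period.
train_df = df.iloc[:split_index]
holdout_df = df.iloc[split_index:]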

Reviewing the experiment results

When you run the experiment, the progress indicator displays the pathways to pipeline creation. Ranked pipelines are listed on the leaderboard. The pipeline score represents how well the pipeline performed for the optimized metric.

The Experiment summary tab displays a visualization of how metrics performed for the pipeline.

  • Use the metric filter to focus on particular metrics.
  • Hover over the name of a metric to view details.

Click a pipeline name to view details. On the Model evaluation page, you can review a table that summarizes details about the pipeline.

Model evaluation details

  • The rows represent five evaluation metrics: Area under ROC, Precision, Recall, F1, Average precision.
  • The columns represent four synthesized anomaly types: Level shift, Trend, Localized extreme, Variance.
  • Each value in a cell is an average of the metric based on three iterations of evaluation on the synthesized anomaly type.

Evaluation metrics

These metrics are used to evaluate a pipeline:

  • Aggregate score (recommended): This score is calculated from an aggregation of the optimized metric (for example, Average precision) values for the four anomaly types. The scores for each pipeline are ranked by using the Borda count method and then weighted for their contribution to the aggregate score. Unlike a standard metric score, this value is not between 0 and 1; a higher value indicates a stronger score.
  • ROC AUC: Measures how well the model can distinguish between the two classes (anomalous and normal points).
  • F1: Harmonic average of the precision and recall, with a best value of 1 (perfect precision and recall) and a worst value of 0.
  • Precision: Measures the accuracy of a prediction as the percentage of positive predictions that are correct.
  • Recall: Measures the percentage of identified positive predictions against the possible positives in the data set.
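
To see how these metrics behave, the following sketch computes them with scikit-learn on toy labels that use the same 1 (normal) and -1 (anomaly) convention as the model output. The labels and anomaly scores are invented for the example.

from sklearn.metrics import (average_precision_score, f1_score,
                             precision_score, recall_score, roc_auc_score)

# Toy ground truth and predictions, using 1 = normal and -1 = anomaly.
y_true = [1, 1, -1, 1, -1, 1, 1, -1]
y_pred = [1, 1, -1, 1, 1, 1, -1, -1]
# Hypothetical anomaly scores (higher = more anomalous) for the
# ranking-based metrics.
anomaly_score = [0.1, 0.2, 0.9, 0.3, 0.4, 0.2, 0.8, 0.7]

# Treat the anomaly class (-1) as the positive class.
y_true_bin = [1 if y == -1 else 0 for y in y_true]

print("Precision:        ", precision_score(y_true, y_pred, pos_label=-1))
print("Recall:           ", recall_score(y_true, y_pred, pos_label=-1))
print("F1:               ", f1_score(y_true, y_pred, pos_label=-1))
print("Area under ROC:   ", roc_auc_score(y_true_bin, anomaly_score))
print("Average precision:", average_precision_score(y_true_bin, anomaly_score))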

Anomaly types

These are the anomaly types that AutoAI detects.

  • Localized extreme anomaly: An unusual data point in a time series that deviates significantly from the data points around it.
  • Level shift anomaly: A segment in which the mean value of the time series changes.
  • Trend anomaly: A segment of the time series that has a trend change compared to the time series before the segment.
  • Variance anomaly: A segment of the time series in which the variance changes.
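
A minimal sketch of what these anomaly types can look like, using numpy to inject each type into a flat, noisy baseline series (segment positions and magnitudes are arbitrary):

import numpy as np

rng = np.random.default_rng(0)
n = 200
baseline = rng.normal(0, 1, n)  # flat, noisy series with no anomalies

# Localized extreme anomaly: a single point far from its neighbors.
localized = baseline.copy()
localized[100] += 10

# Level shift anomaly: the mean changes over a segment.
level_shift = baseline.copy()
level_shift[120:160] += 5

# Trend anomaly: a segment drifts upward relative to what came before.
trend = baseline.copy()
trend[120:160] += np.linspace(0, 6, 40)

# Variance anomaly: the spread changes over a segment.
variance = baseline.copy()
variance[120:160] = rng.normal(0, 4, 40)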

Saving a pipeline as a model

To save a model candidate pipeline as a machine learning model, select Save as model for the pipeline you prefer. The model is saved as a project asset. You can promote the model to a space and create a deployment for it.

Saving a pipeline as a notebook

To review the code for a pipeline, select Save as notebook for a pipeline. An automatically generated notebook is saved as a project asset. Review the code to explore how the pipeline was generated.

For details on the methods used in the pipeline code, see the documentation for the autoai-ts-libs library.

Scoring the model

After you save a pipeline as a model and promote the model to a space, you can score the model to generate predictions for input, or payload, data. Scoring the model and interpreting the results is similar to scoring a binary classification model, as the score presents one of two possible values for each prediction:

  • 1 = no anomaly detected
  • -1 = anomaly detected

Deployment details

Note these requirements for deploying an anomaly prediction model.

  • The schema for the deployment input data must match the schema for the training data, except for the prediction (target) column.
  • The order of the fields for model scoring must be the same as the order of the fields in the training data schema.

Deployment example

The following is valid input for an anomaly prediction model:

{
    "input_data": [
        {
            "id": "observations",
            "values": [
                [12,34],
                [22,23],
                [35,45],
                [46,34]
            ]
        }
    ]
}

The score for this input is [1,1,-1,1], where -1 means the value is an anomaly and 1 means the prediction is in the normal range.
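
If you work with the deployment programmatically, you can send the same payload with the ibm-watson-machine-learning Python client. A minimal sketch, assuming an existing online deployment; the credentials, space ID, and deployment ID are placeholders:

from ibm_watson_machine_learning import APIClient

# Placeholder credentials and IDs; replace with your own values.
wml_credentials = {
    "url": "https://us-south.ml.cloud.ibm.com",
    "apikey": "<your-api-key>",
}
client = APIClient(wml_credentials)
client.set.default_space("<your-space-id>")

payload = {
    "input_data": [
        {
            "id": "observations",
            "values": [[12, 34], [22, 23], [35, 45], [46, 34]],
        }
    ]
}

response = client.deployments.score("<your-deployment-id>", payload)
# Each prediction is 1 (normal) or -1 (anomaly).
print(response)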

Implementation details

These algorithms support anomaly prediction in time series experiments.

Pipeline name                        Algorithm type   Transformer
PointwiseBoundedHoltWintersAdditive  Forecasting      N/A
PointwiseBoundedBATS                 Forecasting      N/A
PointwiseBoundedBATSForceUpdate      Forecasting      N/A
WindowNN                             Window           Flatten
WindowPCA                            Relationship     Flatten
WindowLOF                            Window           Flatten

The algorithms are organized in these categories:

  • Forecasting: Algorithms for detecting anomalies using time series forecasting methods
  • Relationship: Algorithms for detecting anomalies by analyzing the relationship among data points
  • Window: Algorithms for detecting anomalies by applying transformations and ML techniques to rolling windows
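
As a rough illustration of the Window category, the following sketch flattens a series into rolling windows and applies scikit-learn's LocalOutlierFactor. This is in the spirit of the WindowLOF pipeline but is not its actual implementation; the window size and contamination values are arbitrary.

import numpy as np
from numpy.lib.stride_tricks import sliding_window_view
from sklearn.neighbors import LocalOutlierFactor

rng = np.random.default_rng(0)
series = rng.normal(0, 1, 300)
series[150] += 8  # inject a localized extreme anomaly

# Flatten the series into overlapping rolling windows: each row is one
# window of consecutive values.
window_size = 10  # arbitrary choice for the sketch
windows = sliding_window_view(series, window_size)

# LocalOutlierFactor labels each window 1 (inlier) or -1 (outlier),
# matching the 1 / -1 convention that scored deployments use.
lof = LocalOutlierFactor(n_neighbors=20, contamination=0.01)
labels = lof.fit_predict(windows)

print("anomalous windows:", np.where(labels == -1)[0])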

Learn more

Saving an AutoAI generated notebook (Watson Machine Learning)

Parent topic: Building a time series experiment
