Embedding drift evaluation metric

Last updated: Feb 12, 2025

The embedding drift evaluation metric detects the percentage of records that are outliers when compared to the baseline data.

Metric details

Embedding drift is a drift v2 evaluation metric that can help measure changes in your data over time to ensure consistent outcomes for your model.

Scope

The embedding drift metric evaluates generative AI assets only.

  • Types of AI assets: Prompt templates
  • Generative AI tasks:
    • Text summarization
    • Text classification
    • Content generation
    • Entity extraction
    • Question answering
    • Retrieval Augmented Generation (RAG)
  • Supported languages: English

Evaluation process

You must provide embeddings with your baseline data when you enable the embedding drift metric so that evaluation results can be generated. Watsonx.governance builds an auto-encoder that processes the embeddings in your baseline data and computes predefined cosine and Euclidean distance metrics for the model output. It then uses the distribution of these distance values to set a threshold for outlier detection, and detects drift when a distance value is higher than the threshold. For RAG tasks, the embeddings for all of the context columns in your model record are combined into a single vector to determine drift.
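The thresholding step can be illustrated with a minimal Python sketch. It is not the watsonx.governance implementation: the source does not say how the threshold is derived from the distance distribution, so the 95th percentile used here is an assumption, and the names `percentile` and `drift_percentage` are hypothetical. The auto-encoder is omitted; the sketch starts from distance values that are assumed to be already computed.

```python
import math


def percentile(values, q):
    """Return the q-th percentile (0-100) of values, using linear interpolation."""
    ordered = sorted(values)
    if not ordered:
        raise ValueError("values must not be empty")
    pos = (len(ordered) - 1) * q / 100.0
    lower = math.floor(pos)
    upper = math.ceil(pos)
    if lower == upper:
        return ordered[lower]
    frac = pos - lower
    return ordered[lower] + (ordered[upper] - ordered[lower]) * frac


def drift_percentage(baseline_distances, production_distances, q=95.0):
    """Return (threshold, percentage of production records above the threshold).

    ASSUMPTION: the threshold is the q-th percentile of the baseline distances.
    The real product may derive its threshold differently.
    """
    if not production_distances:
        raise ValueError("production_distances must not be empty")
    threshold = percentile(baseline_distances, q)
    outliers = sum(1 for d in production_distances if d > threshold)
    return threshold, 100.0 * outliers / len(production_distances)


# Made-up distance values, for illustration only.
baseline = [0.10, 0.12, 0.11, 0.09, 0.13, 0.10, 0.14, 0.12, 0.11, 0.10]
production = [0.10, 0.11, 0.30, 0.12, 0.45]

threshold, pct = drift_percentage(baseline, production)
print(f"threshold={threshold:.4f}, outliers={pct:.1f}%")
```

With these made-up values, two of the five production records exceed the baseline-derived threshold, so 40% of the records would be reported as outliers.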

Do the math

The following formulas are used to calculate the embedding drift metric:

Cosine distance measures the difference between embedding vectors:

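$$\text{cosine distance}(A, B) = 1 - \frac{A \cdot B}{\|A\| \, \|B\|} = 1 - \frac{\sum_{i=1}^{n} A_i B_i}{\sqrt{\sum_{i=1}^{n} A_i^2} \, \sqrt{\sum_{i=1}^{n} B_i^2}}$$

where $A$ and $B$ are embedding vectors with $n$ dimensions, and $A_i$ and $B_i$ are their $i$-th components.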

The cosine distance ranges from 0 to 2. A value of 0 indicates vectors that point in the same direction, 1 indicates no correlation between the vectors, and 2 indicates opposite vectors.
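The three reference points of this range can be checked with a short Python sketch of the standard cosine distance definition (one minus cosine similarity). The function name `cosine_distance` is hypothetical and the code is an illustration, not the product's implementation.

```python
import math


def cosine_distance(a, b):
    """Return 1 - cosine similarity for two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of dimensions")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine distance is undefined for zero vectors")
    return 1.0 - dot / (norm_a * norm_b)


same_direction = cosine_distance([1.0, 0.0], [2.0, 0.0])   # same direction, different length
orthogonal = cosine_distance([1.0, 0.0], [0.0, 1.0])       # no correlation
opposite = cosine_distance([1.0, 0.0], [-1.0, 0.0])        # opposite directions

print(same_direction, orthogonal, opposite)
```

Note that the first pair differs in length but not in direction, so its cosine distance is still 0: the metric is insensitive to vector magnitude.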

Euclidean distance is the straight-line distance between embedding vectors in Euclidean space:

$$d(A, B) = \sqrt{\sum_{i=1}^{n} (A_i - B_i)^2}$$

where $A$ and $B$ are embedding vectors with $n$ dimensions, and $A_i$ and $B_i$ are their $i$-th components.

The Euclidean distance ranges from 0, which indicates identical vectors, to infinity. However, for vectors that are normalized to unit length, the maximum Euclidean distance is 2, which occurs when the vectors point in opposite directions.
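A small Python sketch of the standard Euclidean distance shows the effect of normalization: for unit-length vectors that point in opposite directions the distance is 2, while unnormalized vectors can be arbitrarily far apart. The helper names `euclidean_distance` and `normalize` are hypothetical and the code is only an illustration.

```python
import math


def euclidean_distance(a, b):
    """Return the straight-line distance between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of dimensions")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def normalize(v):
    """Scale a non-zero vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


unit = normalize([3.0, 4.0])          # unit-length vector
flipped = [-x for x in unit]          # same length, opposite direction

identical = euclidean_distance(unit, unit)
farthest_on_unit_sphere = euclidean_distance(unit, flipped)
unnormalized = euclidean_distance([0.0, 0.0], [3.0, 4.0])

print(identical, farthest_on_unit_sphere, unnormalized)
```

Unlike cosine distance, the unnormalized Euclidean distance depends on vector magnitude, which is why the last value exceeds the unit-length maximum.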
