The embedding drift evaluation metric measures the percentage of records that are outliers when compared to the baseline data.
Metric details
Embedding drift is a drift v2 evaluation metric that can help measure changes in your data over time to ensure consistent outcomes for your model.
Scope
The embedding drift metric evaluates generative AI assets only.
Types of AI assets: Prompt templates
Generative AI tasks:
Text summarization
Text classification
Content generation
Entity extraction
Question answering
Retrieval Augmented Generation (RAG)
Supported languages: English
Evaluation process
You must provide embeddings with your baseline data when you enable the embedding drift metric to generate evaluation results. Watsonx.governance builds an auto-encoder that processes the embeddings in your baseline data and computes predefined cosine and Euclidean distance metrics for the model output. Watsonx.governance then uses the distribution of these distance values to set a threshold for outlier detection, and it detects drift when a distance value is higher than the threshold. For RAG tasks, the embeddings for all of the context columns in your model record are combined into a single vector to determine drift.
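The thresholding step described above can be illustrated with a minimal sketch. It is not the watsonx.governance implementation: the auto-encoder is replaced here by a much simpler reference point (the centroid of the baseline embeddings), and the threshold is assumed to be a nearest-rank percentile of the baseline distances. The function names and the `q` parameter are hypothetical and chosen only for illustration.

```python
import math

def cosine_distance(a, b):
    """1 minus the cosine similarity of two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]

def percentile(values, q):
    """Nearest-rank percentile for q in (0, 100]."""
    ordered = sorted(values)
    rank = max(1, math.ceil(q / 100 * len(ordered)))
    return ordered[rank - 1]

def drift_percentage(baseline, production, q=95):
    """Percentage of production embeddings whose distance to the
    baseline centroid exceeds a threshold learned from the baseline.

    Assumption: centroid + percentile stand in for the auto-encoder
    and the distribution-based threshold described in the text.
    """
    reference = centroid(baseline)
    baseline_distances = [cosine_distance(v, reference) for v in baseline]
    threshold = percentile(baseline_distances, q)
    outliers = sum(
        1 for v in production if cosine_distance(v, reference) > threshold
    )
    return 100.0 * outliers / len(production)

# Toy 2-dimensional embeddings: the baseline clusters around [1, 0].
baseline = [[1.0, 0.0], [0.9, 0.1], [0.9, -0.1], [1.0, 0.05]]
# Two production records resemble the baseline, two point elsewhere.
production = [[1.0, 0.0], [0.95, 0.0], [0.0, 1.0], [-1.0, 0.0]]

print(drift_percentage(baseline, production))  # 50.0
```

In this toy example, two of the four production records fall outside the baseline threshold, so the sketch reports 50 percent of records as outliers.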
Do the math
The following formulas are used to calculate the embedding drift metric:
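Cosine distance measures the difference between two embedding vectors A and B:
$$\text{cosine distance}(A, B) = 1 - \frac{A \cdot B}{\lVert A \rVert \, \lVert B \rVert}$$
The cosine distance ranges from 0 to 2. A value of 0 indicates vectors that point in the same direction, 1 indicates no correlation between the vectors, and 2 indicates opposite vectors.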
Euclidean distance is the shortest distance between two embedding vectors A and B, each with n components, in Euclidean space:
$$d(A, B) = \sqrt{\sum_{i=1}^{n} (A_i - B_i)^2}$$
The Euclidean distance ranges from 0, which indicates completely identical vectors, to infinity. However, for vectors that are normalized to have unit length, the maximum Euclidean distance is 2, which is reached when the vectors point in opposite directions.
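The two distances described above can be checked with a short sketch that uses the standard textbook definitions. The function names are chosen for illustration and are not part of any watsonx.governance API.

```python
import math

def cosine_distance(a, b):
    """1 minus the cosine of the angle between a and b."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

def euclidean_distance(a, b):
    """Straight-line distance between a and b."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Unit-length example vectors in two dimensions.
u = [1.0, 0.0]
same = [1.0, 0.0]
orthogonal = [0.0, 1.0]
opposite = [-1.0, 0.0]

print(cosine_distance(u, same))         # 0.0
print(cosine_distance(u, orthogonal))   # 1.0
print(cosine_distance(u, opposite))     # 2.0
print(euclidean_distance(u, same))      # 0.0
print(euclidean_distance(u, opposite))  # 2.0
```

For unit-length vectors the two measures are related: the squared Euclidean distance equals twice the cosine distance, which is why both reach their maximum for opposite vectors.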