Output metadata drift evaluation metric

Last updated: Mar 04, 2025

The output metadata drift metric measures the change in the distribution of the LLM output text metadata.

Metric details

Output metadata drift is a drift v2 evaluation metric that can help measure changes in your data over time to ensure consistent outcomes for your model.

The following types of LLM output text metadata are measured with the output metadata drift metric:

Character count: Total number of characters in the output text
Word count: Total number of words in the output text
Token count: Total number of tokens in the output text
Sentence count: Total number of sentences in the output text
Average word length: Average length of words in the output text
Total word length: Total length of words in the output text
Average sentence length: Average length of the sentences in the output text
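
As a rough illustration of the metadata types listed above, the following Python sketch derives them from a single output text. The function name `output_text_metadata` is hypothetical, and the definitions are assumptions: words are split on whitespace, sentences are split on `.`, `!`, or `?`, and sentence length is counted in words. Token count is omitted because it depends on the tokenizer of the model in use. The exact rules that watsonx.governance applies might differ.

```python
import re


def output_text_metadata(text: str) -> dict:
    """Compute simple metadata for one output text (illustrative definitions only)."""
    # Assumption: words are whitespace-separated chunks, punctuation included.
    words = text.split()
    # Assumption: a sentence ends at one or more '.', '!' or '?' characters.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    total_word_length = sum(len(w) for w in words)
    return {
        "character_count": len(text),
        "word_count": len(words),
        "sentence_count": len(sentences),
        "total_word_length": total_word_length,
        "average_word_length": total_word_length / len(words) if words else 0.0,
        # Assumption: sentence length is measured in words per sentence.
        "average_sentence_length": len(words) / len(sentences) if sentences else 0.0,
    }


meta = output_text_metadata("Drift is low. The model looks stable!")
print(meta)
```

Computing these values for every record yields one numeric column per metadata type, and drift is then measured on the distribution of each column.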

Scope

The output metadata drift metric evaluates generative AI assets only.

Types of AI assets: Prompt templates
Generative AI tasks:
- Text summarization
- Text classification
- Content generation
- Question answering
Supported languages: English

Scores and values

The output metadata drift score indicates the change in distribution of the LLM output text metadata.

Range of values: 0.0-1.0
Best possible score: 0.0
Ratios:
- At 0: No change is detected.
- Over 0: Increasing change is detected.

Evaluation process

Watsonx.governance calculates output metadata drift by measuring the change in distribution of the metadata columns. The output token count column, if present in the payload, is also used to compute the output metadata drift. You can also choose to specify any meta fields while adding records to the payload table. These meta fields are also used to compute the output metadata drift.

Do the math

The following binary logarithm formula is used to identify discrete numeric output metadata columns:

$$\text{distinct\_values\_count} < \log_2(\text{total\_count})$$

If the `distinct_values_count` is less than the binary logarithm of the `total_count`, the feature is identified as discrete.
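
The discreteness check described above can be sketched in a few lines of Python. The function name `is_discrete` is hypothetical, and returning `False` for an empty column is an assumption made only so that the sketch never takes the logarithm of zero.

```python
import math


def is_discrete(values) -> bool:
    """Return True when a numeric column has fewer distinct values than log2(total count)."""
    total_count = len(values)
    if total_count == 0:
        # Assumption: an empty column is not treated as discrete.
        return False
    distinct_values_count = len(set(values))
    return distinct_values_count < math.log2(total_count)


# 300 records with only 3 distinct sentence counts: log2(300) is about 8.2.
print(is_discrete([1, 2, 3] * 100))
# 300 records with 300 distinct character counts.
print(is_discrete(list(range(300))))
```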

For discrete output metadata columns, watsonx.governance uses the [Jensen Shannon distance](#jensen-shannon-distance) formula to calculate output metadata drift.

For continuous output metadata columns, watsonx.governance uses the [total variation distance](#total-variation-distance) and [overlap coefficient](#overlap-coefficient) formulas to calculate output metadata drift.

The following Jensen Shannon distance formula is used to calculate output metadata drift for discrete output metadata columns:

$$\mathrm{JSD}(B \parallel P) = \sqrt{\frac{D_{KL}(B \parallel M) + D_{KL}(P \parallel M)}{2}}$$

Jensen Shannon distance is the normalized form of Kullback-Leibler (KL) divergence that measures how much one probability distribution differs from a second probability distribution. Jensen Shannon distance is a symmetrical score and always has a finite value.

$D_{KL}$ is the KL divergence, and $M = \frac{1}{2}(B + P)$ is the pointwise average of the baseline (B) and production (P) distributions.
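
A minimal Python sketch of the Jensen Shannon distance for a discrete column follows. The function names are hypothetical. The base-2 logarithm is an assumption, chosen because it keeps the result between 0 and 1, and the way watsonx.governance bins values is not shown here.

```python
import math
from collections import Counter


def kl_divergence(p, q):
    """KL divergence with base-2 logarithm; terms with p_i == 0 contribute nothing."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def jensen_shannon_distance(baseline, production):
    """Jensen Shannon distance between the value frequencies of two samples."""
    categories = list(set(baseline) | set(production))
    b_counts, p_counts = Counter(baseline), Counter(production)
    b = [b_counts[c] / len(baseline) for c in categories]
    p = [p_counts[c] / len(production) for c in categories]
    # M is the pointwise average of the two distributions.
    m = [(bi + pi) / 2 for bi, pi in zip(b, p)]
    divergence = 0.5 * kl_divergence(b, m) + 0.5 * kl_divergence(p, m)
    # Guard against a tiny negative value caused by floating-point rounding.
    return math.sqrt(max(0.0, divergence))


baseline_sentence_counts = [1, 1, 2, 2, 2, 3]
production_sentence_counts = [2, 3, 3, 3, 4, 4]
print(jensen_shannon_distance(baseline_sentence_counts, production_sentence_counts))
```

Because every value with a nonzero probability in B or P also has a nonzero probability in M, the KL terms never divide by zero, which is why the score is always finite.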

The total variation distance and overlap coefficient formulas are used to calculate output metadata drift for continuous output metadata columns.

Total variation distance measures the maximum difference between the probabilities that two probability distributions, baseline (B) and production (P), assign to the same transaction as shown in the following formula:

$$\delta(B, P) = \sup_{A} \left| B(A) - P(A) \right|$$

If the two distributions are equal, the total variation distance between them becomes 0.

The following formula is used to calculate total variation distance:

$$\text{TVD} = \frac{\sum_{x} \left| d_P(x) - d_B(x) \right| \, \Delta x}{\sum_{x} d_P(x) \, \Delta x + \sum_{x} d_B(x) \, \Delta x}$$

𝑥 is a series of equidistant samples that span the domain, ranging from the combined minimum of the baseline and production data to the combined maximum of the baseline and production data.
$\Delta x$ is the difference between two consecutive 𝑥 samples.
$d_P(x)$ is the value of the density function for production data at an 𝑥 sample.
$d_B(x)$ is the value of the density function for baseline data at an 𝑥 sample.

The denominator, $\sum_{x} d_P(x) \, \Delta x + \sum_{x} d_B(x) \, \Delta x$, represents the total area under the density function plots for production and baseline data. These summations approximate the integrals over the domain space, so each term should be close to 1 and their total should be close to 2.
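
The following Python sketch shows one way to approximate the total variation distance for a continuous column: the summed absolute difference between the two density functions, divided by the summed area under both. The density estimator is an assumption, since the source does not name one; a basic Gaussian kernel density estimate with a rule-of-thumb bandwidth stands in for it. The function names and the `num_points` parameter are hypothetical.

```python
import math
import statistics


def gaussian_kde(samples):
    """Return a simple Gaussian kernel density estimate (assumed estimator)."""
    n = len(samples)
    # Normal-reference rule-of-thumb bandwidth; fall back to 1.0 for constant data.
    h = 1.06 * statistics.pstdev(samples) * n ** (-1 / 5) or 1.0
    norm = 1.0 / (n * h * math.sqrt(2 * math.pi))
    return lambda x: norm * sum(math.exp(-0.5 * ((x - s) / h) ** 2) for s in samples)


def total_variation_distance(baseline, production, num_points=200):
    """Approximate total variation distance on an equidistant grid of x samples."""
    lo = min(min(baseline), min(production))
    hi = max(max(baseline), max(production))
    dx = (hi - lo) / (num_points - 1)
    if dx == 0:
        # Assumption: all values are identical, so no drift is reported.
        return 0.0
    xs = [lo + i * dx for i in range(num_points)]
    d_b, d_p = gaussian_kde(baseline), gaussian_kde(production)
    numerator = sum(abs(d_p(x) - d_b(x)) * dx for x in xs)
    # Each summation approximates the area under one density curve (about 1).
    denominator = sum(d_p(x) * dx for x in xs) + sum(d_b(x) * dx for x in xs)
    return numerator / denominator


baseline_word_counts = [40, 42, 45, 47, 50, 52, 55]
production_word_counts = [48, 52, 55, 58, 60, 63, 66]
print(total_variation_distance(baseline_word_counts, production_word_counts))
```

Dividing by the summed areas, rather than by a fixed 2, compensates for the part of each density curve that falls outside the sampled range.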

The overlap coefficient is calculated by measuring the total area of the intersection between two probability distributions. To measure dissimilarity between distributions, the intersection or the overlap area is subtracted from 1 to calculate the amount of drift. The following formula is used to calculate the overlap coefficient:

$$\text{Overlap coefficient drift} = 1 - \sum_{x} \min\left(d_P(x), d_B(x)\right) \Delta x$$

𝑥 is a series of equidistant samples that span the domain, ranging from the combined minimum of the baseline and production data to the combined maximum of the baseline and production data.
$\Delta x$ is the difference between two consecutive 𝑥 samples.
$d_P(x)$ is the value of the density function for production data at an 𝑥 sample.
$d_B(x)$ is the value of the density function for baseline data at an 𝑥 sample.
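
To make the overlap calculation concrete, the sketch below sums the pointwise minimum of two density functions over an equidistant grid and subtracts the resulting area from 1. For simplicity it assumes that the baseline and production densities are already known and are normal distributions from Python's `statistics.NormalDist`. In practice the densities would be estimated from the data. The function name `overlap_drift` and its parameters are hypothetical.

```python
from statistics import NormalDist


def overlap_drift(density_b, density_p, lo, hi, num_points=1001):
    """Return 1 minus the approximate overlap area of two density functions on [lo, hi]."""
    dx = (hi - lo) / (num_points - 1)
    xs = [lo + i * dx for i in range(num_points)]
    # The area under the pointwise minimum is the intersection of the two curves.
    overlap_area = sum(min(density_p(x), density_b(x)) * dx for x in xs)
    return 1.0 - overlap_area


# Assumed example densities: production is shifted one standard deviation to the right.
baseline = NormalDist(mu=0.0, sigma=1.0)
production = NormalDist(mu=1.0, sigma=1.0)
drift = overlap_drift(baseline.pdf, production.pdf, lo=-6.0, hi=7.0)
print(drift)
```

For two normal distributions the result can be cross-checked against `1 - baseline.overlap(production)`, which computes the same overlap area analytically.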
