The faithfulness metric measures how grounded the model output is in the model context. It also provides attributions that identify the sentences in the context that contribute most to the model output. Attributions are provided only when the metric is calculated with fine-tuned models.
Metric details
Faithfulness is an answer quality metric for generative AI quality evaluations. Answer quality metrics measure the quality of model answers and are calculated with LLM-as-a-judge models.
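The following is an illustrative sketch only, not the product's algorithm: it approximates the idea of faithfulness with a naive lexical check, scoring each output sentence by its word overlap with the context and returning per-sentence attributions. A real evaluation uses an LLM-as-a-judge model instead of word overlap; the function name and scoring rule here are assumptions for demonstration.

```python
import re


def naive_faithfulness(context: str, output: str):
    """Hypothetical stand-in for an LLM-as-a-judge faithfulness metric.

    Returns an overall score in 0.0-1.0 plus per-sentence attributions,
    where each attribution is the fraction of a sentence's words that
    appear in the context (a crude proxy for groundedness).
    """
    context_words = set(re.findall(r"[a-z0-9]+", context.lower()))
    sentences = [s.strip() for s in output.split(".") if s.strip()]
    attributions = []
    for sentence in sentences:
        words = set(re.findall(r"[a-z0-9]+", sentence.lower()))
        overlap = len(words & context_words) / len(words) if words else 0.0
        attributions.append((sentence, overlap))
    score = sum(a for _, a in attributions) / len(attributions) if attributions else 0.0
    return score, attributions
```

An output that restates the context verbatim scores 1.0, while an output that introduces words absent from the context scores lower, mirroring how higher faithfulness scores indicate more grounded, less hallucinated output.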
Scope
The faithfulness metric evaluates generative AI assets only.
- Types of AI assets: Prompt templates
- Generative AI tasks: Retrieval Augmented Generation (RAG)
- Supported languages: English
Scores and values
The faithfulness metric score indicates how grounded the model output is in the model context. Higher scores indicate that the output is more grounded in the context and contains fewer hallucinations.
- Range of values: 0.0-1.0
- Best possible score: 1.0
Settings
- Thresholds:
- Lower limit: 0
- Upper limit: 1
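The threshold settings above can be applied as a simple pass/fail check on evaluation results. A minimal sketch, assuming the default limits listed above; the stricter lower limit of 0.8 in the example is a hypothetical user choice, not a product default:

```python
def violations(scores, lower=0.0, upper=1.0):
    """Return (index, score) pairs for scores outside [lower, upper].

    Defaults mirror the thresholds listed above (lower limit 0, upper limit 1).
    """
    return [(i, s) for i, s in enumerate(scores) if not lower <= s <= upper]


# Example: with a user-set lower limit of 0.8, low-faithfulness answers
# are flagged for review.
flagged = violations([0.95, 0.42, 0.88], lower=0.8)
```

With the default limits, every score in the 0.0-1.0 range passes; raising the lower limit turns the metric into a guardrail that surfaces poorly grounded answers.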
Parent topic: Evaluation metrics