The faithfulness metric measures how grounded the model output is in the model context. It also provides attributions that identify the sentences in the context that contribute most to the model output. Attributions are provided only when the metric is calculated with fine-tuned models.
Metric details
Faithfulness is an answer quality metric for generative AI quality evaluations. Answer quality metrics measure the quality of model answers and are calculated with LLM-as-a-judge models.
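The following is an illustrative sketch only, not the product's algorithm: it approximates the idea of faithfulness with a naive lexical check, scoring each output sentence by its word overlap with the context and returning per-sentence attributions. A real evaluation uses an LLM-as-a-judge model instead of word overlap; the function name and scoring rule here are assumptions for demonstration.

```python
import re


def naive_faithfulness(context: str, output: str):
    """Hypothetical stand-in for an LLM-as-a-judge faithfulness metric.

    Returns an overall score in 0.0-1.0 plus per-sentence attributions,
    where each attribution is the fraction of a sentence's words that
    appear in the context (a crude proxy for groundedness).
    """
    context_words = set(re.findall(r"[a-z0-9]+", context.lower()))
    sentences = [s.strip() for s in output.split(".") if s.strip()]
    attributions = []
    for sentence in sentences:
        words = set(re.findall(r"[a-z0-9]+", sentence.lower()))
        overlap = len(words & context_words) / len(words) if words else 0.0
        attributions.append((sentence, overlap))
    score = sum(a for _, a in attributions) / len(attributions) if attributions else 0.0
    return score, attributions
```

An output that restates the context verbatim scores 1.0, while an output that introduces words absent from the context scores lower, mirroring how higher faithfulness scores indicate more grounded, less hallucinated output.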
Scope
The faithfulness metric evaluates generative AI assets only.
- Types of AI assets: Prompt templates
- Generative AI tasks: Retrieval Augmented Generation (RAG)
- Supported languages: English
Scores and values
The faithfulness metric score indicates how grounded the model output is in the model context. Higher scores indicate that the output is more grounded in the context and contains fewer hallucinations.
- Range of values: 0.0-1.0
- Best possible score: 1.0
Settings
- Thresholds:
- Lower limit: 0
- Upper limit: 1
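The threshold settings above can be applied as a simple pass/fail check on evaluation results. A minimal sketch, assuming the default limits listed above; the stricter lower limit of 0.8 in the example is a hypothetical user choice, not a product default:

```python
def violations(scores, lower=0.0, upper=1.0):
    """Return (index, score) pairs for scores outside [lower, upper].

    Defaults mirror the thresholds listed above (lower limit 0, upper limit 1).
    """
    return [(i, s) for i, s in enumerate(scores) if not lower <= s <= upper]


# Example: with a user-set lower limit of 0.8, low-faithfulness answers
# are flagged for review.
flagged = violations([0.95, 0.42, 0.88], lower=0.8)
```

With the default limits, every score in the 0.0-1.0 range passes; raising the lower limit turns the metric into a guardrail that surfaces poorly grounded answers.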
Parent topic: Evaluation metrics