The average precision metric evaluates how well the relevant contexts are ranked by calculating the mean of the precision scores at the ranks of the relevant contexts.
Metric details
Average precision is a retrieval quality metric for generative AI quality evaluations that measures how well a retrieval system ranks relevant contexts. Retrieval quality metrics are calculated with LLM-as-a-judge models.
Scope
The average precision metric evaluates generative AI assets only.
- Types of AI assets: Prompt templates
- Generative AI tasks: Retrieval Augmented Generation (RAG)
- Supported languages: English
Scores and values
The average precision metric score indicates how well relevant contexts are ranked. Higher scores indicate that the relevant contexts are ranked higher. Lower scores indicate that the relevant contexts are ranked lower.
- Range of values: 0.0-1.0
- Best possible score: 1.0
- Ratios:
- At 0: None of the retrieved contexts are relevant.
- At 1: All of the relevant contexts are ranked higher than the non-relevant contexts.
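The following minimal sketch illustrates how average precision can be computed from a ranked list of binary relevance labels. The function name and example lists are illustrative only and are not the product implementation.

```python
def average_precision(relevance):
    """Mean of the precision scores at the ranks of the relevant contexts.

    relevance[i] is True if the context at rank i+1 is relevant.
    """
    precisions = []
    relevant_seen = 0
    for rank, is_relevant in enumerate(relevance, start=1):
        if is_relevant:
            relevant_seen += 1
            precisions.append(relevant_seen / rank)  # precision at this rank
    if not precisions:
        return 0.0  # no relevant contexts were retrieved
    return sum(precisions) / len(precisions)

# All relevant contexts ranked first -> best possible score
print(average_precision([True, True, False, False]))    # 1.0
# Relevant contexts ranked lower -> lower score
print(average_precision([False, False, True, True]))    # (1/3 + 2/4) / 2 ~= 0.42
# No relevant contexts retrieved -> worst possible score
print(average_precision([False, False, False, False]))  # 0.0
```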
Settings
- Thresholds:
- Lower bound: 0
- Upper bound: 1
Parent topic: Evaluation metrics