Last updated: Mar 05, 2025
The ROUGE metric measures how closely generated summaries or translations match reference outputs.
Metric details
ROUGE (Recall-Oriented Understudy for Gisting Evaluation) is a generative AI quality evaluation metric that measures the overlap between generated text and reference text, based on matching units such as n-grams, word sequences, and word pairs.
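For intuition, the following minimal sketch computes ROUGE-1 recall (unigram overlap) from scratch. The helper name and whitespace tokenization are illustrative assumptions, not the platform's implementation, which may use other ROUGE variants and preprocessing.

```python
# Minimal sketch of ROUGE-1 recall (unigram overlap). Illustrative only;
# the platform's scorer may use other ROUGE variants and tokenization.
from collections import Counter

def rouge1_recall(candidate: str, reference: str) -> float:
    """Fraction of reference unigrams that also appear in the candidate."""
    cand_counts = Counter(candidate.lower().split())
    ref_counts = Counter(reference.lower().split())
    # Clip each token's overlap at its candidate count so repeated words
    # are not over-credited.
    overlap = sum(min(n, cand_counts[tok]) for tok, n in ref_counts.items())
    return overlap / sum(ref_counts.values()) if ref_counts else 0.0

print(rouge1_recall("the cat sat on the mat", "the cat lay on the mat"))  # ~0.833
```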
Scope
The ROUGE metric evaluates generative AI assets only.
- Types of AI assets: Prompt templates
- Generative AI tasks:
  - Text summarization
  - Content generation
  - Question answering
  - Entity extraction
  - Retrieval-augmented generation (RAG)
- Supported languages: English
Scores and values
The ROUGE score indicates how similar the generated output is to the reference output. Higher scores indicate greater similarity between the two.
- Range of values: 0.0-1.0
- Best possible score: 1.0
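To illustrate the scale, the following sketch scores two invented candidates against one reference with the open-source rouge-score package (pip install rouge-score); the platform's own scorer may differ in variants and preprocessing.

```python
# Sketch with the open-source rouge-score package; example strings are
# invented, and the platform's scorer may differ.
from rouge_score import rouge_scorer

scorer = rouge_scorer.RougeScorer(["rouge1"])
reference = "sales rose five percent in the third quarter"

close = scorer.score(reference, "sales rose five percent in quarter three")
far = scorer.score(reference, "the weather was pleasant all week")
print(close["rouge1"].fmeasure)  # high overlap, nearer 1.0
print(far["rouge1"].fmeasure)    # little overlap, nearer 0.0
```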
Settings
- Thresholds:
  - Lower limit: 0.8
  - Upper limit: 1.0
- Parameters:
  - Use stemmer: If true, uses the Porter stemmer to strip word suffixes. Defaults to false.
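For reference, the open-source rouge-score package exposes an equivalent use_stemmer switch. A hedged sketch of its effect, with invented strings; the platform's scorer may behave differently:

```python
# Effect of the stemmer switch in the open-source rouge-score package.
# Invented strings; the platform's scorer may behave differently.
from rouge_score import rouge_scorer

reference = "the committee debated the budget"
generated = "the committees debate the budgets"

plain = rouge_scorer.RougeScorer(["rouge1"], use_stemmer=False)
stemmed = rouge_scorer.RougeScorer(["rouge1"], use_stemmer=True)

# With stemming, "committees"/"committee", "debate"/"debated", and
# "budgets"/"budget" reduce to common stems and count as matches.
print(plain.score(reference, generated)["rouge1"].fmeasure)    # lower
print(stemmed.score(reference, generated)["rouge1"].fmeasure)  # higher
```

With stemming enabled, morphological variants count as matches, which generally raises the score.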
Parent topic: Evaluation metrics