ROUGE evaluation metric
Last updated: Mar 05, 2025

The ROUGE metric measures how closely generated summaries or translations match reference outputs.

Metric details

ROUGE (Recall-Oriented Understudy for Gisting Evaluation) is a generative AI quality evaluation metric that scores generated text by counting the units it shares with one or more reference texts, such as overlapping n-grams (ROUGE-N) and longest common subsequences (ROUGE-L).
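
For illustration, the following is a minimal Python sketch of the ROUGE-1 variant, which scores unigram overlap. It is a simplified stand-in, not the product's actual implementation, and the rouge_1 helper is hypothetical:

  from collections import Counter

  def rouge_1(reference: str, candidate: str) -> dict:
      """ROUGE-1: unigram overlap between a candidate and a reference text."""
      ref_counts = Counter(reference.lower().split())
      cand_counts = Counter(candidate.lower().split())
      # Clipped overlap: a word counts at most as often as it appears in each text.
      overlap = sum((ref_counts & cand_counts).values())
      precision = overlap / max(sum(cand_counts.values()), 1)
      recall = overlap / max(sum(ref_counts.values()), 1)
      f1 = 2 * precision * recall / (precision + recall) if overlap else 0.0
      return {"precision": precision, "recall": recall, "fmeasure": f1}

  # Five of the six unigrams match, so each value is 5/6, roughly 0.83.
  print(rouge_1("the cat sat on the mat", "the cat lay on the mat"))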

Scope

The ROUGE metric evaluates generative AI assets only.

  • Types of AI assets: Prompt templates
  • Generative AI tasks:
    • Text summarization
    • Content generation
    • Question answering
    • Entity extraction
    • Retrieval augmented generation (RAG)
  • Supported languages: English

Scores and values

The ROUGE score indicates how similar the generated output is to the reference outputs; higher scores indicate greater overlap with the reference, as the sketch after this list illustrates.

  • Range of values: 0.0-1.0
  • Best possible score: 1.0
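
As a sketch of how these values look in practice, the open-source rouge-score Python package (one common implementation, assumed here; it is not necessarily what this product uses) reports precision, recall, and F-measure, each bounded by 0.0 and 1.0:

  from rouge_score import rouge_scorer

  # score() takes the reference (target) first and the generated text second.
  scorer = rouge_scorer.RougeScorer(["rouge1", "rougeL"])
  scores = scorer.score("the cat sat on the mat", "the cat lay on the mat")

  for name, score in scores.items():
      # 0.0 means no overlap with the reference; 1.0 means identical texts.
      print(f"{name}: fmeasure={score.fmeasure:.2f}")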

Settings

  • Thresholds:
    • Lower limit: 0.8
    • Upper limit: 1.0
  • Parameters:
    • Use stemmer: If true, uses the Porter stemmer to strip word suffixes so that inflected forms of the same word count as matches. Defaults to false. See the sketch after this list.
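
The sketch below, again assuming the open-source rouge-score package, shows the effect of the parameter: with use_stemmer=True, inflected forms such as "running" and "run" reduce to the same stem and count as matches, so the score typically rises:

  from rouge_score import rouge_scorer

  reference = "the runner was running quickly"
  candidate = "the runners run quickly"

  for use_stemmer in (False, True):
      scorer = rouge_scorer.RougeScorer(["rouge1"], use_stemmer=use_stemmer)
      rouge1 = scorer.score(reference, candidate)["rouge1"]
      # Stemming maps "runners"/"runner" and "running"/"run" to shared stems.
      print(f"use_stemmer={use_stemmer}: fmeasure={rouge1.fmeasure:.2f}")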

Parent topic: Evaluation metrics