Last updated: Feb 26, 2025
The BLEU (Bilingual Evaluation Understudy) metric measures the similarity between predictions and reference texts by comparing machine-translated sentences with sentences from reference translations.
Metric details
BLEU is a generative AI quality evaluation metric that measures how well generative AI assets perform tasks.
Scope
The BLEU metric evaluates generative AI assets only.
- Types of AI assets: Prompt templates
- Generative AI tasks:
  - Text summarization
  - Content generation
  - Question answering
  - Retrieval augmented generation (RAG)
- Supported languages: English
Scores and values
The BLEU metric score indicates how similar the predictions are to the reference texts. Higher scores indicate greater similarity.
- Range of values: 0.0-1.0
- Best possible score: 1.0
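For orientation, the range above follows from the standard BLEU definition, shown here as a sketch. It assumes uniform weights over n-gram orders; whether this metric's implementation matches it term for term is an assumption.

```latex
\mathrm{BLEU} = \mathrm{BP} \cdot \exp\left( \frac{1}{N} \sum_{n=1}^{N} \log p_n \right),
\qquad
\mathrm{BP} =
\begin{cases}
1 & \text{if } c > r \\
e^{\,1 - r/c} & \text{if } c \le r
\end{cases}
```

Here p_n is the modified (clipped) n-gram precision for order n, N is the maximum n-gram order, c is the prediction length, and r is the reference length. Because every p_n and the brevity penalty BP are at most 1, the score stays between 0.0 and 1.0. A prediction that is identical to a reference and at least N tokens long scores 1.0.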
Settings
- Thresholds:
  - Lower limit: 0.8
  - Upper limit: 1
- Parameters:
  - Max order: Maximum n-gram order to use when computing the BLEU score
  - Smooth: Whether to apply a smoothing function, which keeps the score from dropping to zero when an n-gram order has no matches
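The effect of the two parameters can be illustrated with a minimal sketch, assuming whitespace tokenization, uniform n-gram weights, and add-one smoothing. The `bleu` function, its argument names, and `LOWER_LIMIT` are hypothetical names chosen for illustration, not the product's API. The tokenizer and smoothing scheme that the metric actually uses may differ.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu(prediction, references, max_order=4, smooth=False):
    """Sentence-level BLEU for one prediction against one or more references."""
    pred = prediction.split()
    refs = [r.split() for r in references]
    if not pred:
        return 0.0

    log_precisions = []
    for n in range(1, max_order + 1):
        pred_counts = ngrams(pred, n)
        # Clip each predicted n-gram count by its maximum count in any reference.
        max_ref_counts = Counter()
        for ref in refs:
            max_ref_counts |= ngrams(ref, n)
        matches = sum((pred_counts & max_ref_counts).values())
        total = sum(pred_counts.values())
        if smooth:
            # Add-one smoothing (an assumption; other schemes exist).
            precision = (matches + 1) / (total + 1)
        elif total == 0 or matches == 0:
            # Without smoothing, one order with no matches zeroes the score.
            return 0.0
        else:
            precision = matches / total
        log_precisions.append(math.log(precision))

    # Brevity penalty uses the reference length closest to the prediction length.
    ref_len = min((len(r) for r in refs),
                  key=lambda length: (abs(length - len(pred)), length))
    bp = 1.0 if len(pred) > ref_len else math.exp(1 - ref_len / len(pred))
    return bp * math.exp(sum(log_precisions) / max_order)


LOWER_LIMIT = 0.8  # Lower threshold from the settings listed above.
reference = "the cat sat on the mat"
prediction = "the cat is on the mat"

unsmoothed = bleu(prediction, [reference])               # no 4-gram matches
smoothed = bleu(prediction, [reference], smooth=True)    # stays above zero
bigram_only = bleu(prediction, [reference], max_order=2)
below_limit = bigram_only < LOWER_LIMIT                  # compare with the threshold
```

In this example the prediction shares no 4-gram with the reference, so the unsmoothed score at the default order of 4 is 0.0, while the smoothed score stays positive. Lowering the maximum order to 2 raises the score, but it still falls below the 0.8 lower limit. How the product reacts to a score below that limit is not shown here.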