Unsuccessful requests evaluation metric

Last updated: Mar 07, 2025
The unsuccessful requests metric measures the ratio of questions that are answered unsuccessfully to the total number of questions asked.

Metric details

Unsuccessful requests is an answer quality metric for generative AI quality evaluations that can help measure the quality of model answers. Answer quality metrics are calculated with LLM-as-a-judge models. Watsonx.governance does not calculate the unsuccessful requests metric with fine-tuned models.

Scope

The unsuccessful requests metric evaluates generative AI assets only.

  • Types of AI assets: Prompt templates
  • Generative AI tasks:
    • Retrieval Augmented Generation (RAG)
    • Question answering
  • Supported languages: English

Scores and values

The unsuccessful requests metric score indicates how often a model fails to provide answers to questions. Higher scores indicate that the model cannot provide answers to a larger share of the questions.

  • Range of values: 0.0-1.0
  • Best possible score: 0.0
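Because the metric is a simple ratio, its calculation can be sketched as follows. This is an illustrative example only, not the watsonx.governance implementation: the function name `unsuccessful_requests_score` and the boolean input format are assumptions, and in practice the per-question success judgement comes from an LLM-as-a-judge model.

```python
def unsuccessful_requests_score(judgements: list[bool]) -> float:
    """Return the ratio of unsuccessfully answered questions.

    `judgements` is a hypothetical input format: one boolean per
    question, True if the judge marked the answer as unsuccessful.
    """
    if not judgements:
        raise ValueError("at least one judged answer is required")
    # Ratio of unsuccessful answers to total questions (0.0-1.0).
    return sum(judgements) / len(judgements)


# Example: 1 unsuccessful answer out of 4 questions yields 0.25.
score = unsuccessful_requests_score([True, False, False, False])
```

A score of 0.0 (no unsuccessful answers) is the best outcome under this reading, and 1.0 (every answer unsuccessful) is the worst.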

Settings

  • Thresholds:
    • Lower limit: 0
    • Upper limit: 1

Parent topic: Evaluation metrics