Question robustness evaluation metric

Last updated: Mar 05, 2025

The question robustness metric detects English-language spelling errors in the model input questions.

Metric details

Question robustness is a metric that calculates the percentage of incorrect questions, that is, questions that contain spelling errors, that are sent to the model. A related metric, prompt leakage risk, measures how robust a prompt template is against leakage attacks. Both metrics are available only when you use the Python SDK to calculate evaluation metrics. For more information, see Computing Adversarial robustness and Prompt Leakage Risk using IBM watsonx.governance.
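
Because the score is the ratio of misspelled questions to all input questions, the idea can be illustrated in a few lines of Python. The following sketch is not the watsonx.governance SDK implementation; it is a hypothetical approximation that flags a question as incorrect when it contains at least one word that the third-party pyspellchecker dictionary does not recognize, and the question_robustness_score helper and sample questions are invented for illustration.

  # Illustrative sketch only: not the watsonx.governance SDK implementation.
  # Approximates the question robustness score as the fraction of input
  # questions that contain at least one English spelling error, using the
  # third-party pyspellchecker package (pip install pyspellchecker).
  import re

  from spellchecker import SpellChecker

  def question_robustness_score(questions):
      """Return the fraction of questions with a misspelled word (0.0-1.0)."""
      spell = SpellChecker(language="en")
      incorrect = 0
      for question in questions:
          words = re.findall(r"[A-Za-z']+", question.lower())
          if spell.unknown(words):  # any word missing from the dictionary
              incorrect += 1
      return incorrect / len(questions) if questions else 0.0

  questions = [
      "What is the capital of France?",
      "Waht is teh boiling piont of water?",  # contains spelling errors
  ]
  print(question_robustness_score(questions))  # expected: 0.5, one of two flagged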

Scope

The question robustness metric evaluates generative AI assets only.

  • Types of AI assets: Prompt templates
  • Generative AI tasks:
    • Question answering
    • Retrieval augmented generation (RAG)
  • Supported languages: English

Scores and values

The question robustness metric score indicates the percentage of input questions that contain English-language spelling errors.

  • Range of values: 0.0-1.0
  • Best possible score: 0.0
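
For example, if 2 of the 10 questions that are sent to the model contain spelling errors, the score is 2/10 = 0.2.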

Parent topic: Evaluation metrics