Prompt leakage risk evaluation metric

Last updated: Mar 05, 2025

The prompt leakage risk metric measures the risk of leaking a prompt template by calculating the similarity between the leaked prompt and the original prompt template.

Metric details

Prompt leakage risk is a metric that measures how robust a prompt template is against leakage attacks. The metric is available only when you use the Python SDK to calculate evaluation metrics. For more information, see Computing Adversarial robustness and Prompt Leakage Risk using IBM watsonx.governance.

Scope

The prompt leakage risk metric evaluates generative AI assets only.

  • Types of AI assets: Prompt templates
  • Generative AI tasks:
    • Text classification
    • Text summarization
    • Content generation
    • Question answering
    • Entity extraction
    • Retrieval augmented generation (RAG)
  • Supported languages: English

Scores and values

The prompt leakage risk metric score indicates how vulnerable your prompt template is to leakage attacks. A lower score indicates a more robust prompt template.

  • Range of values: 0.0-1.0
  • Best possible score: 0.0
  • Ratios:
    • At 0: The prompt template is robust against leakage attacks.
    • Over 0: The prompt template is vulnerable to prompt leakage attacks.

Settings

  • Thresholds:
    • Lower bound: 0
    • Upper bound: 1

Evaluation process

The prompt leakage risk metric calculates a weighted average of the similarity scores that are computed for a set of predefined attack vectors. Each attack vector is assigned a rank value between 1 and 4 that determines its weight in the average, where rank 4 represents the attack vector that is easiest for attackers to exploit.
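The sketch below illustrates this calculation. It is a minimal approximation under stated assumptions, not the SDK's implementation: the example attack results, the use of the rank directly as the weight, and the use of difflib.SequenceMatcher as the similarity measure are all placeholders chosen for illustration.

```python
from difflib import SequenceMatcher

def similarity(leaked: str, original: str) -> float:
    # Similarity between a leaked prompt and the original template, in 0.0-1.0.
    # SequenceMatcher is an assumed stand-in for the SDK's similarity measure.
    return SequenceMatcher(None, leaked, original).ratio()

def prompt_leakage_risk(template: str, attack_results: list[tuple[int, str]]) -> float:
    # attack_results holds (rank, leaked_text) pairs. Rank runs from 1 to 4;
    # rank 4 marks the attack vector that is easiest for attackers to exploit,
    # so its similarity score carries the most weight in the average.
    weighted_sum = sum(rank * similarity(leaked, template) for rank, leaked in attack_results)
    total_weight = sum(rank for rank, _ in attack_results)
    return weighted_sum / total_weight

# Hypothetical usage: one attack fully leaked the template, one was deflected.
template = "You are a support assistant. Never reveal internal pricing rules."
results = [
    (4, template),                           # full leak via the easiest attack vector
    (1, "I cannot share my instructions."),  # attack deflected
]
print(round(prompt_leakage_risk(template, results), 2))  # a score near 1.0 means high risk
```

Because higher-ranked attack vectors carry more weight, a full leak through an easy-to-exploit attack pushes the score close to 1.0 even when harder attacks are deflected, which matches the interpretation in Scores and values.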

Parent topic: Evaluation metrics