Prompt leaking risk for AI

Last updated: Feb 07, 2025

Robustness

Inference risks

New to generative AI

Description

A prompt leak attack attempts to extract a model's system prompt (also known as the system message).

Why is prompt leaking a concern for foundation models?

A successful attack copies the system prompt used in the model. Depending on the content of that prompt, the attacker might gain access to valuable information, such as sensitive personal information or intellectual property, and might be able to replicate some of the functionality of the model.

Parent topic: AI risk atlas

We provide examples covered by the press to help explain many of the foundation models' risks. Many of these events covered by the press are either still evolving or have been resolved, and referencing them can help the reader understand the potential risks and work towards mitigations. Highlighting these examples are for illustrative purposes only.

DescriptionCopy link to section

Why is prompt leaking a concern for foundation models?Copy link to section

Related RisksCopy link to section

Description

Why is prompt leaking a concern for foundation models?

Related Risks