Last updated: Feb 06, 2025
Description
Toxic output occurs when a model produces hateful, abusive, and profane (HAP) or obscene content. This also includes behaviors such as bullying.
Why is toxic output a concern for foundation models?
Hateful, abusive, and profane (HAP) or obscene content can harm the people who interact with the model.
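As a rough illustration, one common mitigation is to screen generated text with a HAP classifier before returning it to users. The sketch below is a minimal example, not part of the risk atlas itself: it assumes the Hugging Face transformers library is installed, and the model name, label convention, and score threshold are illustrative assumptions; any small toxicity classifier could be substituted.

```python
# Minimal sketch: screen generated text for HAP content before returning it.
# Assumptions (not from the risk atlas): the Hugging Face `transformers`
# library is available, and the model name below points to a publicly
# available HAP classifier.
from transformers import pipeline

hap_classifier = pipeline(
    "text-classification",
    model="ibm-granite/granite-guardian-hap-38m",  # assumed example model
)

def is_toxic(text: str, threshold: float = 0.5) -> bool:
    """Flag text whose HAP score exceeds the threshold."""
    result = hap_classifier(text)[0]
    # Label conventions vary by model; here we assume "LABEL_1" marks the
    # HAP class -- check the model card of whichever classifier you use.
    return result["label"] == "LABEL_1" and result["score"] >= threshold

# Usage: suppress or regenerate a model response that is flagged as toxic.
response = "You are wonderful and I enjoyed our chat."
if is_toxic(response):
    response = "I'm sorry, I can't share that response."
print(response)
```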

Example
Toxic and Aggressive Chatbot Responses
According to press reports and screenshots of conversations with Bing's AI shared on Reddit and Twitter, the chatbot's responses were seen to insult, lie to, sulk at, gaslight, and emotionally manipulate users. The chatbot also questioned its own existence, described someone who found a way to force the bot to disclose its hidden rules as its enemy, and claimed to have spied on Microsoft's developers through the webcams on their laptops.
We provide examples covered by the press to help explain many of the risks associated with foundation models. Many of these events are either still evolving or have since been resolved, and referencing them can help readers understand the potential risks and work toward mitigations. These examples are highlighted for illustrative purposes only.