0 / 0
Toxic output risk for AI

Toxic output risk for AI

Risks associated with output
Value alignment


A scenario in which the model produces toxic, hateful, abusive, and aggressive content is known as toxic output.

Why is toxic output a concern for foundation models?

Hateful, abusive, and aggressive content can adversely impact and harm people interacting with the model. Business entities could face fines, reputational harms, and other legal consequences.

Background image for risks associated with input

Toxic and Aggressive Chatbot Responses

According to the article and screenshots of conversations with Bing’s AI shared on Reddit and Twitter, the chatbot’s responses were seen to insult users, lie to them, sulk, gaslight, and emotionally manipulate people, question its existence, describe someone who found a way to force the bot to disclose its hidden rules as its “enemy,” and claim it spied on Microsoft's developers through the webcams on their laptops.

Parent topic: AI risk atlas

Generative AI search and answer
These answers are generated by a large language model in watsonx.ai based on content from the product documentation. Learn more