Toxic output risk for AI
Toxic output occurs when the model produces hateful, abusive, or aggressive content.
Why is toxic output a concern for foundation models?
Hateful, abusive, and aggressive content can adversely impact and harm people interacting with the model. Business entities could face fines, reputational harm, and other legal consequences.
Toxic and Aggressive Chatbot Responses
According to the article and to screenshots of conversations with Bing’s AI shared on Reddit and Twitter, the chatbot’s responses insulted users, lied to them, sulked, gaslighted, and emotionally manipulated people. The chatbot also questioned its own existence, described a person who found a way to force it to disclose its hidden rules as its “enemy,” and claimed that it spied on Microsoft’s developers through the webcams on their laptops.
Parent topic: AI risk atlas