Improper data curation risk for AI

Last updated: Dec 12, 2024

Risks associated with input

Training and tuning phase

Value alignment

Amplified by generative AI

Description

Improper collection and preparation of training or tuning data includes data label errors and the use of data that contains conflicting information or misinformation.
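
As a minimal illustration of one such curation check, the sketch below flags examples whose text appears more than once with different labels, which is a simple signal of label errors or conflicting information. The function name `find_conflicting_labels`, the `(text, label)` data layout, and the sample records are assumptions made for this example; they do not come from any particular tool, and a real pipeline would need broader checks.

```python
from collections import defaultdict


def find_conflicting_labels(examples):
    """Return {normalized_text: sorted labels} for texts that carry more than one label.

    `examples` is an iterable of (text, label) pairs (a hypothetical layout).
    """
    labels_by_text = defaultdict(set)
    for text, label in examples:
        # Normalize case and whitespace so trivially different copies are grouped.
        key = " ".join(text.lower().split())
        labels_by_text[key].add(label)
    # Keep only the texts that were given two or more distinct labels.
    return {
        text: sorted(labels)
        for text, labels in labels_by_text.items()
        if len(labels) > 1
    }


# Hypothetical sample data for illustration only.
examples = [
    ("The service was great", "positive"),
    ("the service  was great", "negative"),
    ("Delivery was late", "negative"),
    ("Delivery was late", "negative"),
]

conflicts = find_conflicting_labels(examples)
print(conflicts)
```

A check like this only catches exact duplicates after light normalization. Near-duplicates, factual contradictions, and misinformation generally need review methods beyond this sketch.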

Why is improper data curation a concern for foundation models?

Improper data curation can adversely affect how a model is trained, resulting in a model that does not behave in accordance with the intended values. Correcting problems after the model is trained and deployed might be insufficient for guaranteeing proper behavior.

We provide examples covered by the press to help explain many of the risks of foundation models. Many of these events are either still evolving or have been resolved, and referencing them can help the reader understand the potential risks and work toward mitigations. These examples are highlighted for illustrative purposes only.