Last updated: Dec 12, 2024
Description
Data contamination occurs when incorrect data is used for training. For example, data that is not aligned with model’s purpose or data that is already set aside for other development tasks such as testing and evaluation.
Why is data contamination a concern for foundation models?
Data that differs from the intended training data might skew model accuracy and affect model outcomes.
Parent topic: AI risk atlas
We provide examples covered by the press to help explain many of the foundation models' risks. Many of these events covered by the press are either still evolving or have been resolved, and referencing them can help the reader understand the potential risks and work towards mitigations. Highlighting these examples are for illustrative purposes only.