Description
Inclusion or presence of personally identifiable information (PII) and sensitive personal information (SPI) in the data that is used for training or fine-tuning the model might result in the unwanted disclosure of that information.
Why is personal information in data a concern for foundation models?
If the model is not properly developed to protect sensitive data, it might expose personal information in its generated output. Additionally, personal or sensitive data must be reviewed and handled in accordance with privacy laws and regulations.
Training on Private Information
According to the article, Google and its parent company Alphabet were accused in a class-action lawsuit of misusing vast amounts of personal information and copyrighted material. The information was allegedly taken from hundreds of millions of internet users to train Google's commercial AI products, including Bard, its conversational generative AI chatbot. This case follows similar lawsuits filed against Meta Platforms, Microsoft, and OpenAI over their alleged misuse of personal data.
We provide examples covered by the press to help explain many of the foundation models' risks. Many of these events are either still evolving or have been resolved, and referencing them can help the reader understand the potential risks and work toward mitigations. These examples are highlighted for illustrative purposes only.