Data transparency risk for AI

Risks associated with input
Training and tuning phase


Without accurate documentation of how a model's training data was collected, curated, and used, it can be difficult to satisfactorily explain the model's behavior with respect to that data.

Why is data transparency a concern for foundation models?

Data transparency is important for legal compliance and AI ethics. Missing information limits the ability to evaluate risks associated with the data. The lack of standardized disclosure requirements might also limit transparency, as organizations protect trade secrets and try to prevent others from copying their models.


Data and Model Metadata Disclosure

OpenAI's GPT-4 technical report exemplifies the tension around disclosing data and model metadata. While many model developers see value in enabling transparency for consumers, disclosure also poses real safety concerns and could increase the potential for misuse of the models. The report states: "Given both the competitive landscape and the safety implications of large-scale models like GPT-4, this report contains no further details about the architecture (including model size), hardware, training compute, dataset construction, training method, or similar."
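The kinds of metadata at stake in this debate can be sketched as a minimal structured record. This is a hypothetical illustration only: the field names below are assumptions for the sake of the example and are not drawn from the GPT-4 report or from any published metadata standard.

```python
from dataclasses import dataclass, field, asdict
from typing import List, Optional

@dataclass
class ModelMetadata:
    """Hypothetical data/model metadata record; empty fields model
    information a developer has chosen not to disclose."""
    model_name: str
    architecture: Optional[str] = None      # e.g. a high-level architecture description
    parameter_count: Optional[int] = None   # model size; often withheld
    training_compute: Optional[str] = None  # e.g. an order-of-magnitude FLOPs estimate
    dataset_sources: List[str] = field(default_factory=list)
    data_curation_notes: str = ""

    def undisclosed_fields(self) -> List[str]:
        """Names of fields left empty, i.e. not disclosed."""
        return [k for k, v in asdict(self).items() if v in (None, "", [])]

card = ModelMetadata(
    model_name="example-model",
    dataset_sources=["public web crawl (assumed)"],
)
print(card.undisclosed_fields())
# → ['architecture', 'parameter_count', 'training_compute', 'data_curation_notes']
```

A record like this makes the transparency gap explicit: a consumer or auditor can see at a glance which facts about the data and model were documented and which were withheld.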

Parent topic: AI risk atlas
