Foundation models sometimes generate output that is not factually accurate. If factual accuracy is important for your project, set yourself up for success by learning how and why these models can get facts wrong and how you can ground generated output in correct facts.
Why foundation models get facts wrong
Foundation models can get facts wrong for a few reasons:
Pre-training builds word associations, not facts
Pre-training data sets contain out-of-date facts
Pre-training data sets do not contain esoteric or domain-specific facts and jargon
Sampling decoding is more likely to stray from the facts
Pre-training builds word associations, not facts
During pre-training, a foundation model builds up a vocabulary of the words (tokens) encountered in the pre-training data sets, and statistical relationships between those words become encoded in the model weights.
For example, "Mount Everest" often appears near "tallest mountain in the world" in many articles, books, speeches, and other common pre-training sources. As a result, a pre-trained model will probably correctly complete
the prompt "The tallest mountain in the world is " with the output "Mount Everest."
These word associations can make it seem that facts have been encoded into these models too. For very common knowledge and immutable facts, you might have good luck generating factually accurate output from pre-trained foundation models with simple prompts like the tallest-mountain example. However, relying only on pre-trained word associations is a risky strategy when you use foundation models in applications where accuracy matters.
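As a minimal sketch of what this looks like in practice, the following Python snippet inspects the next-token probabilities of a pre-trained model. It assumes the open source Hugging Face transformers library and GPT-2 purely as illustrative stand-ins; neither is specific to this documentation, and any pre-trained causal language model behaves similarly.
# Sketch: inspecting the next-token probabilities of a pre-trained model.
# Assumes the Hugging Face transformers library and GPT-2 as illustrative
# stand-ins for any pre-trained foundation model.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

prompt = "The tallest mountain in the world is"
inputs = tokenizer(prompt, return_tensors="pt")
with torch.no_grad():
    next_token_logits = model(**inputs).logits[0, -1]  # scores for the next token

probs = torch.softmax(next_token_logits, dim=-1)
top = torch.topk(probs, k=5)
for p, token_id in zip(top.values, top.indices):
    print(f"{tokenizer.decode(int(token_id))!r}: {p.item():.3f}")
# The highest-probability tokens reflect word associations learned from the
# pre-training data, not facts retrieved from a verified source.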
Pre-training data sets contain out-of-date facts
Collecting pre-training data sets and performing pre-training runs can take a significant amount of time, sometimes months. If a model was pre-trained on a data set from several years ago, the model vocabulary and word associations encoded
in the model weights won't reflect current world events or newly popular themes. For this reason, if you submit the prompt "The most recent winner of the World Cup of football (soccer) is " to a model pre-trained on information
a few years old, the generated output will be out of date.
Pre-training data sets do not contain esoteric or domain-specific facts and jargon
Common foundation model pre-training data sets, such as The Pile, contain hundreds of millions of documents. Given how famous Mount
Everest is, it's reasonable to expect a foundation model to have encoded a relationship between "tallest mountain in the world" and "Mount Everest". However, if a phenomenon, person, or concept is mentioned in only a
handful of articles, chances are slim that a foundation model would have any word associations about that topic encoded in its weights. Prompting a pre-trained model about information that was not in its pre-training data sets is unlikely
to produce factually accurate generated output.
Sampling decoding is more likely to stray from the facts
Decoding is the process a model uses to choose the words (tokens) in the generated output:
Greedy decoding always selects the token with the highest probability
Sampling decoding selects tokens pseudo-randomly from a probability distribution
Greedy decoding generates output that is more predictable and more repetitive. Sampling decoding is more random, which can feel more "creative". If, based on the pre-training data sets, the most likely words to follow "The tallest mountain in the world is " are "Mount Everest", then greedy decoding can reliably generate that factually correct output, whereas sampling decoding might sometimes generate the name of some other mountain, or something that isn't a mountain at all.
How to ground generated output in correct facts
Rather than relying only on pre-trained word associations for factual accuracy, provide context in your prompt text.
Use context in your prompt text to establish facts
When you prompt a foundation model to generate output, the words (tokens) in the generated output are influenced by the words in the model vocabulary and the words in the prompt text. You can use your prompt text to boost factually accurate
word associations.
Example 1
Here's a prompt that asks a model to complete a sentence stating your favorite color:
My favorite color is
Given that only you know what your favorite color is, there's no way the model could reliably generate the correct output.
Instead, a color will be selected from colors mentioned in the model's pre-training data:
If greedy decoding is used, whichever color appears most frequently with statements about favorite colors in pre-training content will be selected.
If sampling decoding is used, a color will be selected randomly from colors mentioned most often as favorites in the pre-training content.
Example 2
Here's a prompt that includes context to establish the facts:
I recently painted my kitchen yellow, which is my favorite color.
My favorite color is
If you prompt a model with text that includes factually accurate context like this, the output the model generates is more likely to be correct.
When you include context with the needed facts in your prompt, using greedy decoding is likely to generate accurate output. If you need some variety in the output, you can experiment with sampling decoding with low values for parameters like
Temperature, Top P, and Top K. However, using sampling decoding increases the risk of inaccurate output.
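As a sketch of this advice, the following assumes the same Hugging Face transformers setup used in the earlier examples; in other libraries and products the decoding parameters have different names and default values.
# Prompt that carries the needed fact as context.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")
context = "I recently painted my kitchen yellow, which is my favorite color.\n"
prompt = context + "My favorite color is"

# Greedy decoding: most likely to echo the fact supplied in the context.
print(generator(prompt, max_new_tokens=3, do_sample=False)[0]["generated_text"])

# Conservative sampling: some variety in the output, with a lower risk of
# drifting from the supplied fact than aggressive sampling settings.
print(generator(prompt, max_new_tokens=3, do_sample=True,
                temperature=0.3, top_k=10, top_p=0.5)[0]["generated_text"])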
Retrieval-augmented generation
The retrieval-augmented generation pattern scales out the technique of pulling context into prompts. If you have a knowledge base, such as process documentation in web pages, legal contracts in PDF files, a database of products for sale, a GitHub
repository of C++ code files, or any other collection of information, you can use the retrieval-augmented generation pattern to generate factually accurate output based on the information in that knowledge base.
Retrieval-augmented generation involves three basic steps:
Search for relevant content in your knowledge base
Pull the most relevant content into your prompt as context
Send the combined prompt text to the model to generate output
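Here is a minimal sketch of those three steps in Python. The in-memory document list, the keyword search function, and the send_to_model call are hypothetical placeholders; a real implementation would typically use a vector or keyword search index over your knowledge base and whichever text-generation API your project uses.
# Step 1: search for relevant content in your knowledge base.
# A toy keyword search over an in-memory list of documents (placeholder
# for a real search index such as a vector database).
knowledge_base = [
    "Refunds are processed within 5 business days of receiving the return.",
    "Our headquarters are located in Armonk, New York.",
    "Support is available Monday through Friday, 9am to 5pm Eastern.",
]

def search(query, documents, top_n=2):
    query_words = set(query.lower().split())
    scored = [(len(query_words & set(doc.lower().split())), doc) for doc in documents]
    scored.sort(reverse=True)
    return [doc for score, doc in scored[:top_n] if score > 0]

question = "How long do refunds take?"
relevant = search(question, knowledge_base)

# Step 2: pull the most relevant content into the prompt as context.
prompt = "Answer the question using only the context below.\n\n"
prompt += "Context:\n" + "\n".join(relevant) + "\n\n"
prompt += "Question: " + question + "\nAnswer:"

# Step 3: send the combined prompt text to the model to generate output.
# send_to_model is a hypothetical stand-in for whichever generation API
# your project uses.
# answer = send_to_model(prompt)
print(prompt)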