Foundation model parameters: decoding and stopping criteria
Last updated: Oct 09, 2024

You can specify parameters to control how the model generates output in response to your prompt.

 

Decoding

Decoding is the process a model uses to choose the tokens in the generated output.

Greedy decoding selects the token with the highest probability at each step of the decoding process. Greedy decoding produces output that closely matches the most common language in the model's pretraining data and in your prompt text, which is desirable in less creative or fact-based use cases. A weakness of greedy decoding is that it can cause repetitive loops in the generated output.

Sampling decoding is more variable and more random than greedy decoding. Variability and randomness are desirable in creative use cases. However, with greater variability comes the risk of nonsensical output. Sampling decoding selects tokens from a probability distribution at each step:

  • Temperature sampling refers to adjusting how strongly the model favors high-probability tokens: lower temperature values make high-probability tokens even more likely to be chosen, while higher values give lower-probability tokens a better chance of being selected.
  • Top-k sampling refers to selecting the next token randomly from a specified number, k, of tokens with the highest probabilities.
  • Top-p sampling refers to selecting the next token randomly from the smallest set of tokens for which the cumulative probability exceeds a specified value, p. (Top-p sampling is also called nucleus sampling.)
Table 1. Supported values, defaults, and usage notes for sampling decoding

  • Temperature
      Supported values: Floating-point number in the range 0.0 (same as greedy decoding) to 2.0 (maximum creativity)
      Default: 0.7
      Use: Higher values lead to greater variability.
  • Top K
      Supported values: Integer in the range 1 to 100
      Default: 50
      Use: Higher values lead to greater variability.
  • Top P
      Supported values: Floating-point number in the range 0.0 to 1.0
      Default: 1.0
      Use: Higher values lead to greater variability.
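
The following sketch is illustrative only, not the watsonx.ai implementation. It shows how greedy decoding (picking the highest-probability token) differs from sampling decoding, and how temperature, top k, and top p interact when the next token is chosen from a toy distribution:

```python
# Illustrative sketch of greedy vs. sampling decoding over a toy distribution.
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical next-token logits for a 6-token vocabulary.
logits = np.array([4.0, 3.2, 2.5, 1.0, 0.5, 0.1])

def sample_next_token(logits, temperature=0.7, top_k=50, top_p=1.0):
    # Temperature rescales the logits: values below 1.0 sharpen the
    # distribution (more deterministic), values above 1.0 flatten it.
    scaled = logits / max(temperature, 1e-6)
    probs = np.exp(scaled - scaled.max())
    probs /= probs.sum()

    # Top-k: keep only the k highest-probability tokens.
    order = np.argsort(probs)[::-1][:top_k]

    # Top-p (nucleus): keep the smallest prefix of those tokens whose
    # cumulative probability reaches p.
    cumulative = np.cumsum(probs[order])
    cutoff = int(np.searchsorted(cumulative, top_p)) + 1
    order = order[:cutoff]

    # Renormalize and sample from the surviving tokens.
    kept = probs[order] / probs[order].sum()
    return int(rng.choice(order, p=kept))

greedy_choice = int(np.argmax(logits))       # greedy decoding: always token 0
sampled_choice = sample_next_token(logits)   # sampling decoding: varies per run
print(greedy_choice, sampled_choice)
```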

Random seed

When you submit the same prompt to a model multiple times with sampling decoding, you'll usually get back different generated text each time. This variability is the result of intentional pseudo-randomness built into the decoding process. Random seed refers to the number used to generate that pseudo-random behavior.

  • Supported values: Integer in the range 1 to 4,294,967,295
  • Default: Generated based on the current server system time
  • Use: To produce repeatable results, set the same random seed value every time.
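
For example, continuing the illustrative sketch above, fixing the seed of a pseudo-random number generator makes the sampled choices repeatable from run to run:

```python
# Illustrative only: the same seed produces the same sequence of "random"
# token choices, so repeated runs return identical output.
import numpy as np

probs = np.array([0.5, 0.3, 0.2])

run_1 = np.random.default_rng(seed=1234).choice(3, size=5, p=probs)
run_2 = np.random.default_rng(seed=1234).choice(3, size=5, p=probs)

assert (run_1 == run_2).all()  # identical choices on every run
```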

Repetition penalty

If you notice the result generated for your chosen prompt, model, and parameters consistently contains repetitive text, you can try adding a repetition penalty.

  • Supported values: Floating-point number in the range 1.0 (no penalty) to 2.0 (maximum penalty)
  • Default: 1.0
  • Use: The higher the penalty, the less likely it is that the result will include repeated text.
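
One common formulation of a repetition penalty, used by several open-source generation libraries, is sketched below. It is illustrative only; the exact formulation inside watsonx.ai may differ.

```python
# Illustrative only: penalize tokens that already appear in the output so
# they are less likely to be selected again.
import numpy as np

def apply_repetition_penalty(logits, generated_token_ids, penalty=1.5):
    logits = logits.copy()
    for token_id in set(generated_token_ids):
        # Positive logits are divided by the penalty, negative logits are
        # multiplied by it; either way, the repeated token becomes less likely.
        if logits[token_id] > 0:
            logits[token_id] /= penalty
        else:
            logits[token_id] *= penalty
    return logits

logits = np.array([2.0, 1.0, -0.5, 0.3])
print(apply_repetition_penalty(logits, generated_token_ids=[0, 2]))
```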

 

Stopping criteria

You can affect the length of the output generated by the model in two ways: specifying stop sequences and setting Min tokens and Max tokens.

Stop sequences

A stop sequence is a string of one or more characters. If you specify stop sequences, the model will automatically stop generating output after one of the stop sequences you specify appears in the generated output. For example, one way to cause a model to stop generating output after just one sentence is to specify a period as a stop sequence. That way, after the model generates the first sentence and ends it with a period, output generation stops. Choosing effective stop sequences depends on your use case and the nature of the generated output you expect.

Supported values: 0 to 6 strings, each no longer than 40 tokens

Default: No stop sequence

Use:

  • Stop sequences are ignored until the number of tokens that is specified in the Min tokens parameter has been generated.
  • If your prompt includes examples of input-output pairs, ensure the sample output in the examples ends with one of the stop sequences.
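
As a client-side illustration of the behavior described above, the following sketch truncates generated text at the first occurrence of any stop sequence (here the stop sequence itself is kept in the output):

```python
# Illustrative only: truncate text at the first occurrence of a stop sequence.
def truncate_at_stop_sequence(text: str, stop_sequences: list[str]) -> str:
    cut = len(text)
    for stop in stop_sequences:
        index = text.find(stop)
        if index != -1:
            # Keep everything up to and including the stop sequence.
            cut = min(cut, index + len(stop))
    return text[:cut]

print(truncate_at_stop_sequence("First sentence. Second sentence.", ["."]))
# -> "First sentence."
```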

Minimum and maximum new tokens

If you're finding the output from the model is too short or too long, try adjusting the parameters that control the number of generated tokens:

  • The Min tokens parameter controls the minimum number of tokens in the generated output
  • The Max tokens parameter controls the maximum number of tokens in the generated output

Supported values: Integer in the range 1 to 1024

Defaults:

  • Min tokens: 0
  • Max tokens: 20

Use:

  • Min tokens must be less than or equal to Max tokens.
  • If stop sequences are specified, text generation will stop after any stop sequence is generated, even if the number of generated tokens is less than the value specified for Max tokens.
  • Because the cost of using foundation models in IBM watsonx.ai is based on use, which is partly related to the number of tokens generated, specifying the lowest value for Max tokens that works for your use case is a cost-saving strategy.
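
The snippet below collects the parameters discussed in this topic into a single request payload. The field names are hypothetical and shown only to illustrate how the settings fit together; check the watsonx.ai API reference for the exact parameter names that your endpoint expects.

```python
# Hypothetical payload illustrating how the parameters fit together.
# Field names are assumptions, not the confirmed watsonx.ai API schema.
generation_parameters = {
    "decoding_method": "sample",   # or "greedy"
    "temperature": 0.7,
    "top_k": 50,
    "top_p": 1.0,
    "random_seed": 1234,           # fix the seed for repeatable results
    "repetition_penalty": 1.2,
    "min_new_tokens": 10,
    "max_new_tokens": 200,         # keep as low as your use case allows
    "stop_sequences": ["\n\n"],
}
```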

 

Parent topic: Foundation models
