Tokens and tokenization

Last updated: Oct 09, 2024

A token is a collection of characters that has semantic meaning for a model. Tokenization is the process of converting the words in your prompt into tokens.

You can monitor foundation model token usage in a project on the Environments page on the Manage tab.

Converting words to tokens and back again

Prompt text is converted to tokens before being processed by foundation models.

The correlation between words and tokens is complex:

Sometimes a single word is broken into multiple tokens
The same word might be broken into a different number of tokens, depending on context (such as: where the word appears, or surrounding words)
Spaces, newline characters, and punctuation are sometimes included in tokens and sometimes not
The way words are broken into tokens varies from language to language
The way words are broken into tokens varies from model to model

For a rough idea, a sentence that has 10 words could be 15 to 20 tokens.

The raw output from a model is also tokens. In the Prompt Lab in IBM watsonx.ai, the output tokens from the model are converted to words to be displayed in the prompt editor.

Example

The following image shows how this sample input might be tokenized:

Tomatoes are one of the most popular plants for vegetable gardens. Tip for success: If you select varieties that are resistant to disease and pests, growing tomatoes can be quite easy. For experienced gardeners looking for a challenge, there are endless heirloom and specialty varieties to cultivate. Tomato plants come in a range of sizes.

Visualization of tokenization

Notice a few interesting points:

Some words are broken into multiple tokens and some are not
The word "Tomatoes" is broken into multiple tokens at the beginning, but later "tomatoes" is all one token
Spaces are sometimes included at the beginning of a word-token and sometimes spaces are a token all by themselves
Punctuation marks are tokens

Token limits

Every model has an upper limit to the number of tokens in the input prompt plus the number of tokens in the generated output from the model (sometimes called context window length, context window, context length, or maximum sequence length.) In the Prompt Lab, an informational message shows how many tokens are used in a given prompt submission and the resulting generated output.

In the Prompt Lab, you use the Max tokens parameter to specify an upper limit on the number of output tokens the model should generate for your prompt. The maximum allowed value for the Max tokens parameter is 1024.

Parent topic: Foundation models