Prompting the granite-13b-chat-v2 foundation model from IBM

Experiment with inferencing the IBM granite-13b-chat-v2 foundation model in watsonx.ai to see how this versatile foundation model can help you accomplish various tasks.

The granite-13b-chat-v2 foundation model is a 13 billion parameter large language model that is designed to help business users get work done. The IBM Granite series of foundation models are trained on enterprise-specialized datasets, which means the models are familiar with the specialized language and jargon from various industries and can more readily generate content that is grounded in relevant industry knowledge.

For more information about the granite-13b-chat-v2 foundation model, such as version number and instruction-tuning details, see the model card.

The granite-13b-chat-v2 foundation model is optimized for the following use cases:

• Conversational (chat) tasks
• Retrieval-augmented generation
• Natural language processing tasks, such as classification and extraction

Because the model can handle different use cases, adjust the model parameters and system prompt to match the task at hand to get the best results.

Conversing with granite-13b-chat-v2

To get the best results when chatting with the granite-13b-chat-v2 foundation model, first follow these recommendations and then experiment to get the results that you want.

The following table lists the recommended model parameters for prompting the granite-13b-chat-v2 foundation model for a conversational task.

Table 1. Recommended model parameters for prompting granite-13b-chat-v2 in a conversational task

• Decoding: Sampling. Sampling decoding generates more creative text, which helps to add interest and personality to responses from the chatbot. However, it can also lead to unpredictable output. You can control the degree of creativity with the next set of model parameters.
• Top P: 0.85, Top K: 50, Temperature: 0.7. These sampling decoding parameters work together. The model selects a subset of tokens from which to choose the token to include in the output. The subset includes the 50 most-probable tokens (Top K) or the tokens that, when their probability scores are summed, reach a total score of 0.85 (Top P). The relatively low temperature value of 0.7 amplifies the difference in token scores, so the tokens that make the cut are typically the most probable ones. To increase the creativity and diversity of responses, increase the temperature value. If the model hallucinates, lower the temperature value.
• Repetition penalty: 1.05. Set the penalty to this low value to prevent the chatbot from sounding robotic by repeating words or phrases.
• Random seed: Specify a value only if you are testing something and want to remove randomness as a factor from the test. For example, if you want to change the temperature to see how that affects the output, submit the same input repeatedly, change only the temperature value each time, and specify the same number, such as 5, as the random seed for every run so that random token choices do not also affect the output. The number itself doesn't matter, as long as you use the same number each time.
• Max tokens: 900. The maximum context window length for the granite-13b-chat-v2 foundation model, which includes both input and output tokens, is 8192. For more information about tokens, see Tokens and tokenization. With each follow-up question, the conversation history is included as part of the model prompt. The granite-13b-chat-v2 foundation model can typically sustain a conversation for up to 5 turns or until the input reaches 4,000 tokens in length.

For more information about the model parameters, see Model parameters for prompting.
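
If you prompt the model programmatically rather than from the Prompt Lab, the same settings apply. The following minimal Python sketch expresses the Table 1 values as a generation-parameters dictionary. The field names (decoding_method, top_p, and so on) are assumptions that mirror common watsonx.ai API naming; verify them against the API reference for your release.

# Sketch only: Table 1 chat settings expressed as generation parameters.
# The field names are assumptions; confirm them in the watsonx.ai API reference.
chat_parameters = {
    "decoding_method": "sample",  # sampling decoding for more creative, varied replies
    "top_p": 0.85,                # nucleus sampling: keep tokens whose probabilities sum to 0.85
    "top_k": 50,                  # consider only the 50 most-probable tokens
    "temperature": 0.7,           # raise for more diversity; lower if the model hallucinates
    "repetition_penalty": 1.05,   # slight penalty so the chatbot does not sound robotic
    "random_seed": 5,             # optional: set only when you want repeatable test runs
    "max_new_tokens": 900,        # recommended output limit for conversational replies
}

In the Prompt Lab, these values map directly to the fields in the Model parameters panel.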

To prompt the granite-13b-chat-v2 foundation model for a chat task, try these steps:

  1. From the Prompt Lab in chat mode, choose the granite-13b-chat-v2 foundation model.

    Chat mode has default prompt parameter values that are optimized for conversational exchanges, including a higher Max tokens value.

    [Image: The granite-13b-chat-v2 foundation model in chat mode of the Prompt Lab]

  2. From the Model parameters panel, apply the recommended model parameter values from Table 1.

  3. In chat mode, you can submit user input without formatting the input text.

    Chat mode applies the recommended prompt format for you. To see how your text is formatted, click the View full prompt text icon.

  4. To submit the same prompt in freeform mode, you must set up the system prompt.

    Add instructions. For example, the following instruction text was used to train the model, and therefore is familiar to the model.

    You are Granite Chat, an AI language model developed by IBM. You are a cautious assistant. You carefully follow instructions. You are helpful and harmless and you follow ethical guidelines and promote positive behavior. You always respond to greetings (for example, hi, hello, g'day, morning, afternoon, evening, night, what's up, nice to meet you, sup) with "Hello! I am Granite Chat, created by IBM. How can I help you today?". Please do not say anything else and do not start a conversation.

    

    Start a conversation.

    The optimal structure for a prompt that is used for a chat task follows. The prompt includes syntax that identifies the following segments of the prompt:

    • <|system|>: Identifies the instruction, which is also known as the system prompt for the foundation model.
    • <|user|>: The query text to be answered.
    • <|assistant|>: A cue at the end of the prompt that indicates that a generated answer is expected.

    When you submit prompts from the Prompt Lab in freeform mode, use the expected prompt format.

    <|system|>
    You are Granite Chat, an AI language model developed by IBM. You are a cautious assistant. You carefully follow instructions. You are helpful and harmless and you follow ethical guidelines and promote positive behavior. You always respond to greetings (for example, hi, hello, g'day, morning, afternoon, evening, night, what's up, nice to meet you, sup) with "Hello! I am Granite Chat, created by IBM. How can I help you today?". Please do not say anything else and do not start a conversation.
    <|user|>
    {PROMPT}
    <|assistant|>
    
    
  5. Ask follow-up questions to keep the conversation going.

    The optimal structure for a prompt that is used for a chat with multiple dialog turns follows. If you submit prompts from the Prompt Lab in freeform mode, use this prompt format. (A Python sketch that assembles both the single-turn and multi-turn prompt formats follows these steps.)

    <|system|>
    You are Granite Chat, an AI language model developed by IBM. You are a cautious assistant. You carefully follow instructions. You are helpful and harmless and you follow ethical guidelines and promote positive behavior. You always respond to greetings (for example, hi, hello, g'day, morning, afternoon, evening, night, what's up, nice to meet you, sup) with "Hello! I am Granite Chat, created by IBM. How can I help you today?". Please do not say anything else and do not start a conversation.
    <|user|>
    {ROUND1_PROMPT}
    <|assistant|>
    {MODEL_RESPONSE}
    <|user|>
    {ROUND2_PROMPT}
    <|assistant|>
    
    

For another sample prompt that illustrates the chat use case, see Sample: Converse with granite-13b-chat-v2.

For more information about using chat mode in Prompt Lab, see Prompt Lab.
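
If you assemble freeform prompts outside of the Prompt Lab, a small helper can produce the single-turn and multi-turn formats from steps 4 and 5. The following is a minimal Python sketch; the function name is illustrative, and the system prompt is abbreviated here (use the full recommended text in practice).

# Minimal sketch: assemble granite-13b-chat-v2 chat prompts in the
# <|system|> / <|user|> / <|assistant|> format described in steps 4 and 5.

SYSTEM_PROMPT = (
    "You are Granite Chat, an AI language model developed by IBM. "
    "You are a cautious assistant. You carefully follow instructions. "
    "You are helpful and harmless and you follow ethical guidelines and promote positive behavior."
)  # abbreviated; include the full recommended system prompt in practice

def build_chat_prompt(turns: list[tuple[str, str]], new_user_message: str) -> str:
    """Build a chat prompt from previous (user, assistant) turns plus a new user message."""
    parts = [f"<|system|>\n{SYSTEM_PROMPT}"]
    for user_message, assistant_reply in turns:
        parts.append(f"<|user|>\n{user_message}")
        parts.append(f"<|assistant|>\n{assistant_reply}")
    parts.append(f"<|user|>\n{new_user_message}")
    parts.append("<|assistant|>\n")  # cue that a generated answer is expected
    return "\n".join(parts)

# Single-turn prompt (step 4): no previous turns.
print(build_chat_prompt([], "What is retrieval-augmented generation?"))

# Multi-turn prompt (step 5): include the earlier exchange as conversation history.
history = [("What is retrieval-augmented generation?", "Retrieval-augmented generation grounds a prompt in documents...")]
print(build_chat_prompt(history, "How long can the grounding documents be?"))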

Tips for prompting granite-13b-chat-v2 for conversational tasks

  • In follow-up questions, avoid pronouns. For example, ask “What does the variable represent?” instead of “What does it represent?”

  • If you want the foundation model to generate a response in table format, explicitly ask the model in your prompt to return a table in markdown format.

Returning factual answers with the RAG pattern

To guide the granite-13b-chat-v2 foundation model to return factual answers, use the retrieval-augmented generation pattern. Retrieval-augmented generation grounds the input that you submit to the model with factual information about the topic to be discussed. For more information, see Retrieval-augmented generation (RAG).

When you want to return factual answers from the granite-13b-chat-v2 foundation model, follow these recommendations.

The following table lists the recommended model parameters for prompting the granite-13b-chat-v2 foundation model for a retrieval-augmented generation task.

Table 2. Recommended model parameters for prompting granite-13b-chat-v2 in a RAG task

• Decoding: Greedy. Greedy decoding chooses tokens from only the most-probable options, which is best when you want factual answers.
• Repetition penalty: 1. Use the lowest value. Repetition is acceptable when the goal is factual answers.
• Max tokens: 500. This value gives the model room to answer the question as completely as possible. Remember that the maximum context window length for the granite-13b-chat-v2 foundation model, which includes both input and output tokens, is 8192. Keep your input, including the document that you add to ground the prompt, within that limit. For more information about tokens, see Tokens and tokenization.
• Stopping criteria: <|endoftext|>. A helpful feature of the granite-13b-chat-v2 foundation model is the inclusion of a special token named <|endoftext|> at the end of each response. When some generative models return a response in fewer tokens than the maximum number allowed, they can repeat patterns from the input. Adding this token as a stop sequence gives the prompt a reliable stopping point and prevents such repetition.

For more information about the model parameters, see Model parameters for prompting.
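
When you call the model programmatically for a RAG task, the Table 2 recommendations translate into generation parameters such as the following Python sketch. As before, the field names are assumptions modeled on watsonx.ai API naming; check the API reference for your release.

# Sketch only: Table 2 RAG settings as generation parameters (field names are assumptions).
rag_parameters = {
    "decoding_method": "greedy",          # most-probable tokens only, for factual answers
    "repetition_penalty": 1,              # lowest value; repetition is acceptable here
    "max_new_tokens": 500,                # let the model answer as completely as possible
    "stop_sequences": ["<|endoftext|>"],  # token that the model emits at the end of a response
}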

To prompt the granite-13b-chat-v2 foundation model for a retrieval-augmented generation task, try these steps:

  1. Find reliable resources with factual information about the topic that you want the model to discuss and that you have permission to use. Copy an excerpt of the document or documents to a text editor or other tool where you can access it later.

    For example, the resource might be product information from your own company website or product documentation.

  2. From the Prompt Lab, open freeform mode so that you can structure your prompts. Choose the granite-13b-chat-v2 foundation model.

  3. From the Model parameters panel, set the recommended model parameters from Table 2.

  4. In your prompt, clearly define the system prompt, user input, and where the model's output should go.

    For example, the following prompt structure can help the granite-13b-chat-v2 foundation model to return relevant information.

    <|system|>
    You are Granite Chat, an AI language model developed by IBM. You are a cautious assistant. You carefully follow instructions. You are helpful and harmless and you follow ethical guidelines and promote positive behavior.
    <|user|>
    You are an AI language model designed to function as a specialized Retrieval Augmented Generation (RAG) assistant. When generating responses, prioritize correctness, i.e., ensure that your response is grounded in context and user query. Always make sure that your response is relevant to the question.
    Answer Length: {ANSWER_LENGTH}
    [Document]
    {DOCUMENT1_TITLE}
    {DOCUMENT1_CONTENT}
    [End]
    [Document]
    {DOCUMENT2_TITLE}
    {DOCUMENT2_CONTENT}
    [End]
    [Document]
    {DOCUMENT3_TITLE}
    {DOCUMENT3_CONTENT}
    [End]
    {QUERY}
    <|assistant|>
    
    

    Note: The start and end of the document content are denoted by the special tags [Document] and [End]. Use a similar syntax if you want to add special tags that identify content types or subsection headers in your prompts. When the granite-13b-chat-v2 foundation model was created, it was trained to handle the following special tags:

    • <|system|>: Identifies the instruction, which is also known as the system prompt for the foundation model.
    • <|user|>: The query text to be answered.
    • <|assistant|>: A cue at the end of the prompt that indicates that a generated answer is expected.

    Do not use the same <|tagname|> syntax for your custom tags or you might confuse the model.

  5. If you copy this prompt template, replace the placeholder variables after you paste it into the Prompt Lab editor.

    Table 2a. RAG template placeholder variables

    • {ANSWER_LENGTH}: Optional. Defines the expected length of the answer. Options include, from shortest to longest: single word, concise, narrative.
    • {DOCUMENTn_TITLE}: Title of the document from which the excerpt with factual information is taken. You can include content from more than one document. Example: Product Brochure.
    • {DOCUMENTn_CONTENT}: Text excerpt with the factual information that you want the model to be able to discuss knowledgeably, such as text from a marketing brochure, product documentation, company website, or other trusted resource.
    • {QUERY}: The question to be answered factually; a question about the topic that is discussed in the document.

    Tip: Alternatively, you can define and use a prompt variable for the document so that the prompt can be reused and the content can be replaced dynamically each time. For more information, see Building reusable prompts.

Retrieval-augmented generation prompt example

The following prompt uses the granite-13b-chat-v2 foundation model to answer questions about prompt tuning.

Note: The document content is taken from the Methods for tuning foundation models topic.

<|system|>
You are Granite Chat, an AI language model developed by IBM. You are a cautious assistant. You carefully follow instructions. You are helpful and harmless and you follow ethical guidelines and promote positive behavior.
<|user|>
You are an AI language model designed to function as a specialized Retrieval Augmented Generation (RAG) assistant. When generating responses, prioritize correctness, i.e., ensure that your response is grounded in context and user query. Always make sure that your response is relevant to the question.
Answer length: concise
[Document]
How prompt-tuning works
Foundation models are sensitive to the input that you give them. Your input, or how you prompt the model, can introduce context that the model will use to tailor its generated output. Prompt engineering to find the right prompt often works well. However, it can be time-consuming, error-prone, and its effectiveness can be restricted by the context window length that is allowed by the underlying model.
Prompt-tuning a model in the Tuning Studio applies machine learning to the task of prompt engineering. Instead of adding words to the input itself, prompt-tuning is a method for finding a sequence of values that, when added as a prefix to the input text, improve the model's ability to generate the output you want. This sequence of values is called a prompt vector.
Normally, words in the prompt are vectorized by the model. Vectorization is the process of converting text to tokens, and then to numbers defined by the model's tokenizer to identify the tokens. Lastly, the token IDs are encoded, meaning they are converted into a vector representation, which is the input format that is expected by the embedding layer of the model. Prompt-tuning bypasses the model's text-vectorization process and instead crafts a prompt vector directly. This changeable prompt vector is concatenated to the vectorized input text and the two are passed as one input to the embedding layer of the model. Values from this crafted prompt vector affect the word embedding weights that are set by the model and influence the words that the model chooses to add to the output.
To find the best values for the prompt vector, you run a tuning experiment. You demonstrate the type of output that you want for a corresponding input by providing the model with input and output example pairs in training data. With each training run of the experiment, the generated output is compared to the training data output. Based on what it learns from differences between the two, the experiment adjusts the values in the prompt vector. After many runs through the training data, the model finds the prompt vector that works best.
[End]
How does prompt tuning work?
<|assistant|>

The granite-13b-chat-v2 foundation model generates an accurate summary of how prompt tuning works, based on information from the document content that was provided with the input.

The output looks something like this:

Prompt tuning is a method for improving a model's ability to generate specific outputs by adjusting a sequence of values called a prompt vector. Instead of directly adding words to the input, prompt tuning crafts a prompt vector that directly influences the word embedding weights set by the model. This is achieved by vectorizing input text and encoding token IDs into a vector representation, which is then concatenated with the crafted prompt vector and passed as one input to the model's embedding layer. The experiment adjusts the values in the prompt vector based on the differences between the generated output and the desired output in training data examples. Over many runs through the training data, the model finds the prompt vector that works best.
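
The following minimal Python sketch shows one way to assemble a RAG prompt like the preceding example from the template in step 4. The function and variable names are illustrative only; what matters is the [Document] ... [End] framing and the <|system|>, <|user|>, and <|assistant|> tags.

# Minimal sketch: assemble a RAG prompt for granite-13b-chat-v2 from grounding
# documents and a user query, following the template in step 4.

SYSTEM_PROMPT = (
    "You are Granite Chat, an AI language model developed by IBM. "
    "You are a cautious assistant. You carefully follow instructions. "
    "You are helpful and harmless and you follow ethical guidelines and promote positive behavior."
)

RAG_INSTRUCTION = (
    "You are an AI language model designed to function as a specialized "
    "Retrieval Augmented Generation (RAG) assistant. When generating responses, "
    "prioritize correctness, i.e., ensure that your response is grounded in context "
    "and user query. Always make sure that your response is relevant to the question."
)

def build_rag_prompt(documents: list[tuple[str, str]], query: str, answer_length: str = "concise") -> str:
    """documents is a list of (title, content) excerpts from trusted sources."""
    doc_blocks = "\n".join(
        f"[Document]\n{title}\n{content}\n[End]" for title, content in documents
    )
    return (
        f"<|system|>\n{SYSTEM_PROMPT}\n"
        f"<|user|>\n{RAG_INSTRUCTION}\n"
        f"Answer Length: {answer_length}\n"
        f"{doc_blocks}\n"
        f"{query}\n"
        f"<|assistant|>\n"
    )

prompt = build_rag_prompt(
    documents=[("How prompt-tuning works", "Foundation models are sensitive to the input that you give them...")],
    query="How does prompt tuning work?",
)
print(prompt)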

 

NLP tasks

You can use the granite-13b-chat-v2 foundation model for natural language processing tasks such as classification and extraction. When the granite-13b-chat-v2 model was created, it was trained to perform these types of natural language processing tasks.

During the model's training, the prompts that were used to train the model followed a specific format per task type. When you use the model to perform one of these tasks, mimic the established format in the prompts that you submit.

Classification

To use the granite-13b-chat-v2 foundation model to classify information, follow these recommendations.

The following table lists the recommended model parameters for prompting the granite-13b-chat-v2 foundation model for a classification task.

Table 3. Recommended model parameters for prompting granite-13b-chat-v2 in a classification task

• Decoding: Greedy. Greedy decoding chooses tokens from only the most-probable options, which is best when you want to classify text.
• Repetition penalty: 1. Use the lowest value. Repetition is expected.
• Max tokens: Varies. Use a value that covers the number of tokens in your longest class label, such as 5 or 10. Limiting the tokens encourages the model to return only the appropriate class label and nothing else.
• Stopping criteria: Add each supported class label as a stop sequence. Adding the classes as stop sequences forces the model to stop generating text after a class is assigned to the input.

To prompt the granite-13b-chat-v2 foundation model for a classification task, try these steps:

  1. Identify the classes or classification labels that you want the model to assign to the input. Be sure to list these class labels in the instruction segment of your prompt.

    For example, if you want to classify customer product reviews as positive or negative, you might define two class labels: Positive and Negative.

  2. Collect two or three representative examples of the type of input text that you want the model to classify.

  3. Work with the granite-13b-chat-v2 foundation model from the Prompt Lab in freeform mode so that you can structure your prompts.

  4. From the Model parameters panel, set the recommended model parameters from Table 3.

  5. In your prompt, clearly identify the system prompt, user input, and where the model's output should go.

    For example, the following prompt structure was used when the granite-13b-chat-v2 foundation model was trained to classify text:

    <|system|>
    You are Granite Chat, an AI language model developed by IBM. You are a cautious assistant. You carefully follow instructions. You are helpful and harmless and you follow ethical guidelines and promote positive behavior.
    <|user|>
    {INSTRUCTION}
    Your response should only include the answer. Do not provide any further explanation.
    Here are some examples, complete the last one:
    {INPUT_LABEL}:
    {ICL_INPUT_1}
    {OUTPUT_LABEL}:
    {ICL_OUTPUT_1}
    {INPUT_LABEL}:
    {ICL_INPUT_2}
    {OUTPUT_LABEL}:
    {ICL_OUTPUT_2}
    {INPUT_LABEL}:
    {TEST_INPUT}
    {OUTPUT_LABEL}:
    <|assistant|>
    
    

    You can use a similar structure to leverage the model's training. Simply replace the placeholder variables in the prompt template.

    Table 3a. Classification template placeholder variables

    • {INSTRUCTION}: Description of the task. Include a list of the classes that you want the model to assign to the input. Example: For each product review, indicate whether the review is Positive or Negative.
    • {INPUT_LABEL}: Short label for the text to be classified. Examples: Input, Customer review, Feedback, Comment.
    • {OUTPUT_LABEL}: Short label that represents the classification value. Example: Class.
    • {ICL_INPUT_N}: Optional. Examples of input text to be classified. Add examples when you want to use a few-shot prompt to support in-context learning. Example: The service representative did not listen to a word I said. It was a waste of my time.
    • {ICL_OUTPUT_N}: Example outputs with class labels assigned to the corresponding input text examples. Examples: Positive, Negative.

Classification prompt example

The following prompt classifies feedback that customers share about support center personnel.

<|system|>
You are Granite Chat, an AI language model developed by IBM. You are a cautious assistant. You carefully follow instructions. You are helpful and harmless and you follow ethical guidelines and promote positive behavior.
<|user|>
For each feedback, specify whether the content is Positive or Negative. Your response should only include the answer. Do not provide any further explanation.
Here are some examples, complete the last one:
Feedback:
Carol, the service rep was so helpful. She answered all of my questions and explained things beautifully.
Class:
Positive
   
Feedback:
The service representative did not listen to a word I said. It was a waste of my time.
Class:
Negative
   
Feedback:
Carlo was so helpful and pleasant. He was able to solve a problem that I've been having with my software for weeks now.
Class:
<|assistant|>

The output that is generated by the granite-13b-chat-v2 foundation model when this prompt is submitted is Positive.
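
The following minimal Python sketch assembles a few-shot classification prompt in the same structure and collects the class labels so that you can also pass them as stop sequences, per Table 3. The helper name is illustrative; because the extraction template later in this topic has the same shape, the same helper works for extraction prompts too.

# Minimal sketch: assemble a few-shot classification prompt for
# granite-13b-chat-v2 and collect the class labels to use as stop sequences.

SYSTEM_PROMPT = (
    "You are Granite Chat, an AI language model developed by IBM. "
    "You are a cautious assistant. You carefully follow instructions. "
    "You are helpful and harmless and you follow ethical guidelines and promote positive behavior."
)

def build_few_shot_prompt(instruction, input_label, output_label, examples, test_input):
    """examples is a list of (input_text, output_text) pairs used for in-context learning."""
    lines = [
        f"<|system|>\n{SYSTEM_PROMPT}",
        f"<|user|>\n{instruction}",
        "Your response should only include the answer. Do not provide any further explanation.",
        "Here are some examples, complete the last one:",
    ]
    for example_input, example_output in examples:
        lines += [f"{input_label}:", example_input, f"{output_label}:", example_output]
    lines += [f"{input_label}:", test_input, f"{output_label}:", "<|assistant|>"]
    return "\n".join(lines)

class_labels = ["Positive", "Negative"]  # also useful as stop sequences (Table 3)
prompt = build_few_shot_prompt(
    instruction="For each feedback, specify whether the content is Positive or Negative.",
    input_label="Feedback",
    output_label="Class",
    examples=[
        ("Carol, the service rep was so helpful. She answered all of my questions and explained things beautifully.", "Positive"),
        ("The service representative did not listen to a word I said. It was a waste of my time.", "Negative"),
    ],
    test_input="Carlo was so helpful and pleasant. He was able to solve a problem that I've been having with my software for weeks now.",
)
print(prompt)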

Extraction

To use the granite-13b-chat-v2 foundation model to extract information, follow these recommendations.

The following table lists the recommended model parameters for prompting the granite-13b-chat-v2 foundation model for an extraction task.

Table 4. Recommended model parameters for prompting granite-13b-chat-v2 in an extraction task

• Decoding: Greedy. Greedy decoding chooses tokens from only the most-probable options, which is best when you want to extract text.
• Max tokens: Varies. Use a value that covers the number of tokens in the longest mention of the information type that you want to extract, such as 5 or 10. Limiting the tokens encourages the model to return only the extracted value and nothing else.
• Stopping criteria: Add a stop sequence, such as the <|endoftext|> token, to force the model to stop generating text after it returns the extracted value.

To prompt the granite-13b-chat-v2 foundation model for an extraction task, try these steps:

  1. Identify the information types that you want the model to extract from the input. Be sure to list these information type labels in the instruction segment of your prompt.

    For example, if you want to extract key pieces of information from a company's US Securities and Exchange Commission 10-K form, you might identify an information type such as a Line Of Credit Facility Maximum Borrowing Capacity.

  2. Collect two or three representative examples of input text with the type of information that you want the model to extract.

  3. Work with the granite-13b-chat-v2 foundation model from the Prompt Lab in freeform mode so that you can structure your prompts.

  4. From the Model parameters panel, set the recommended model parameters from Table 4.

  5. Clearly identify the system prompt, user input, and where the model's output should go.

    For example, the following prompt structure was used when the granite-13b-chat-v2 foundation model was trained to extract information from text:

    <|system|>
    You are Granite Chat, an AI language model developed by IBM. You are a cautious assistant. You carefully follow instructions. You are helpful and harmless and you follow ethical guidelines and promote positive behavior.
    <|user|>
    {INSTRUCTION}
    Your response should only include the answer. Do not provide any further explanation.
    Here are some examples, complete the last one:
    {INPUT_LABEL}:
    {ICL_INPUT_1}
    {OUTPUT_LABEL}:
    {ICL_OUTPUT_1}
    {INPUT_LABEL}:
    {ICL_INPUT_2}
    {OUTPUT_LABEL}:
    {ICL_OUTPUT_2}
    {INPUT_LABEL}:
    {TEST_INPUT}
    {OUTPUT_LABEL}:
    <|assistant|>
    
    

    You can use a similar structure to leverage the model's training. Simply replace the placeholder variables in the prompt template.

    Table 4a. Extraction template placeholder variables

    • {INSTRUCTION}: Description of the task. Include a list of the information types that you want the model to extract from the input.
    • {INPUT_LABEL}: Short label for the input text that contains the information to be extracted.
    • {OUTPUT_LABEL}: Short label that represents the extracted value.
    • {ICL_INPUT_N}: Optional. Examples of input text with information types to be extracted. Add examples when you want to use a few-shot prompt to support in-context learning.
    • {ICL_OUTPUT_N}: Example outputs with information types extracted from the corresponding inputs.

Extraction prompt example

The following prompt extracts the Line Of Credit Facility Maximum Borrowing Capacity value from a company's SEC 10-K form.

<|system|>
You are Granite Chat, an AI language model developed by IBM. You are a cautious assistant. You carefully follow instructions. You are helpful and harmless and you follow ethical guidelines and promote positive behavior.
<|user|>
Extract the Line Of Credit Facility Maximum Borrowing Capacity from the 10K sentences.
Your response should only include the answer. Do not provide any further explanation.
Here are some examples, complete the last one:
10K Sentence:
The credit agreement also provides that up to $500 million in commitments may be used for letters of credit.
Line Of Credit Facility Maximum Borrowing Capacity:
$500M

10K Sentence:
In March 2020, we upsized the Credit Agreement by $100 million, which matures July 2023, to $2.525 billion.
Line Of Credit Facility Maximum Borrowing Capacity:
$2.525B

10K Sentence:
We prepared our impairment test as of October 1, 2022 and determined that the fair values of each of our reporting units exceeded net book value by more than 50%. Among our reporting units, the narrowest difference between the calculated fair value and net book value was in our Principal Markets segment's Canada reporting unit, whose calculated fair value exceeded its net book value by 53%. Future developments related to macroeconomic factors, including increases to the discount rate used, or changes to other inputs and assumptions, including revenue growth, could reduce the fair value of this and/or other reporting units and lead to impairment. There were no goodwill impairment losses recorded for the nine months ended December 31, 2022. Cumulatively, the Company has recorded $469 million in goodwill impairment charges within its former EMEA ($293 million) and current United States ($176 million) reporting units. Revolving Credit Agreement In October 2021, we entered into a $3.15 billion multi-currency revolving credit agreement (the "Revolving Credit Agreement") for our future liquidity needs. The Revolving Credit Agreement expires, unless extended, in October 2026. Interest rates on borrowings under the Revolving Credit Agreement are based on prevailing market interest rates, plus a margin, as further described in the Revolving Credit Agreement. The total expense recorded by the Company for the Revolving Credit Agreement was not material in any of the periods presented. We may voluntarily prepay borrowings under the Revolving Credit Agreement without premium or penalty, subject to customary "breakage" costs. The Revolving Credit Agreement includes certain customary mandatory prepayment provisions. Interest on Debt Interest expense for the three and nine months ended December 31, 2022 was $27 million and $65 million, compared to $18 million and $50 million for the three and nine months ended December 31, 2021. Most of the interest for the pre-Separation period presented in the historical Consolidated Income Statement reflects the allocation of interest expense associated with debt issued by IBM from which a portion of the proceeds benefited Kyndryl.
Line Of Credit Facility Maximum Borrowing Capacity:
<|assistant|>

The output that is generated by the granite-13b-chat-v2 foundation model when this prompt is submitted is $3.15B.
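
If you call the model programmatically for an extraction task, the Table 4 recommendations translate into parameters like the following Python sketch, and a small post-processing step can strip whitespace and a trailing <|endoftext|> token from the returned value. The parameter names and the helper are illustrative assumptions.

# Sketch only: Table 4 extraction settings and a small clean-up step for the
# returned value (parameter names are assumptions; verify in the API reference).
extraction_parameters = {
    "decoding_method": "greedy",  # most-probable tokens only
    "max_new_tokens": 10,         # enough tokens to cover the longest expected value
}

def clean_extracted_value(raw_output: str) -> str:
    """Strip whitespace and a trailing <|endoftext|> token from the model output."""
    return raw_output.replace("<|endoftext|>", "").strip()

print(clean_extracted_value("$3.15B<|endoftext|>\n"))  # -> "$3.15B"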

Learn more

To learn more about the granite-13b-chat-v2 foundation model, see the model card.

Parent topic: IBM foundation models
