
Creating a prompt-tuning experiment

Last updated: May 01, 2025

Create a prompt-tuning experiment that you can run to prompt tune a foundation model.

Prerequisite task: Tuning Studio

To prompt tune a foundation model, complete the following steps:

  1. Choose how to initialize the prompt from the following options:

    Text
    Uses text that you specify.
    Random
    Uses values that are generated for you as part of the tuning experiment.

    These options are related to the prompt tuning method for tuning models. For more information about how each option affects the tuning experiment, see How prompt tuning works.

  2. Required for the text initialization method only: Add the initialization text that you want to include with the prompt.

    • For a classification task, give an instruction that describes what you want to classify and lists the class labels to be used. For example, Classify whether the sentiment of each comment is Positive or Negative.
    • For a generative task, describe what you want the model to provide in the output. For example, Make the case for allowing employees to work from home a few days a week.
    • For a summarization task, give an instruction such as, Summarize the main points from a meeting transcript.
  3. Choose a task type.

    Choose the task type that most closely matches what you want the model to do:

    Classification
    Predicts categorical labels from features. For example, given a set of customer comments, you might want to label each statement as a question or a problem. By separating out customer problems, you can find and address them more quickly. This task type handles single-label classification.
    Generation
    Generates text. For example, writes a promotional email.
    Summarization
    Generates text that describes the main ideas that are expressed in a body of text. For example, summarizes a research paper.

    Whichever task you choose, the input is submitted to the underlying foundation model as a generative request during the experiment. Extra information that you share about your goal is still used. For classification tasks, for example, the class names that you specify are taken into account in the prompts that are used to tune the model.

  4. Required for classification tasks only: In the Classification output field, add the class labels that you want the model to use, one label at a time.

    Important: Specify the same labels that are used in your training data.

    During the tuning experiment, class label information is submitted along with the input examples from the training data.

  5. Optional: If you want to change how your training samples are formatted when they are submitted to the foundation model during a tuning experiment, click Yes, edit in the Do your prompts need special formatting? section.

    For more information, see Editing the verbalizer for prompt tuning.

  6. Add the training data that will be used to tune the model. You can upload a file or use an asset from your project.

    To see examples of how to format your file, expand What should your data look like?, and then click Preview template. For more information, see Data formats. A short formatting sketch also follows this procedure.

  7. Optional: If you want to change the token size of the examples that are used during training, expand What should your data look like? to make adjustments.

    For more information, see Setting prompt-tuning token limits.

  8. Optional: Click Configure parameters to edit the parameters that are used by the tuning experiment.

    The tuning run is configured with parameter values that represent a good starting point for tuning a model. You can adjust them if you want.

    For more information about the available parameters and what they do, see Tuning parameters.

    After you change parameter values, click Save.

  9. Click Start tuning.

The tuning experiment begins. It might take a few minutes to a few hours depending on the size of your training data and the availability of compute resources. When the experiment is finished, the status shows as completed.

A tuned model asset is not created until after you create a deployment from a completed tuning experiment. For more information, see Deploying a tuned model.
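If you prepare the training data file outside of the product, the following Python sketch shows one way to write it. The sketch assumes the input-and-output pair format that the Preview template shows; the file name, example comments, and the Problem and Question class labels are illustrative only. For a classification task, the labels in the file must match the labels that you enter in the Classification output field.

# Illustrative sketch of a training data file for a classification task.
# The "input" and "output" pair format follows the Preview template; check
# Data formats for the authoritative schema before you upload a file.
import json

examples = [
    {"input": "The app crashes when I open the settings page.", "output": "Problem"},
    {"input": "Is there a way to export my report as a PDF?", "output": "Question"},
    {"input": "I was charged twice for the same order this month.", "output": "Problem"},
]

# Write one JSON object per line (JSON Lines). A single JSON array is another
# common layout; use the Preview template to confirm the exact shape.
with open("train_data.jsonl", "w", encoding="utf-8") as f:
    for example in examples:
        f.write(json.dumps(example) + "\n")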

Editing the verbalizer for prompt tuning

You can edit the verbalizer that is used for foundation model tuning.

The verbalizer for prompt tuning has the following format:

Input: {{input}} Output:

The default format adds the word Input: before the input text and adds Output: after the input text to show the foundation model where to add generated text.

You might want to customize the verbalizer if more descriptive prefix text can guide the foundation model to generate better answers. To tune a foundation model to summarize articles, for example, you can change the verbalizer as follows:

Article: {{input}} Summary: 

By using the prefixes Article and Summary instead of the generic Input and Output, you give the foundation model more contextual information about the input and expected output.

For foundation models that are designed for chat use cases, the default verbalizer has a simpler format:

{{input}}
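Conceptually, the verbalizer is a template: during the tuning experiment, the input text from each training example replaces the {{input}} variable. The following Python sketch illustrates that substitution only; the function is hypothetical, and the experiment applies the verbalizer for you.

# Illustrative only: shows how a verbalizer template wraps each training input.
def apply_verbalizer(template: str, input_text: str) -> str:
    """Replace the {{input}} variable with the input text from a training example."""
    return template.replace("{{input}}", input_text)

# Default verbalizer
print(apply_verbalizer("Input: {{input}} Output:", "The battery drains overnight."))
# Input: The battery drains overnight. Output:

# Custom verbalizer for a summarization task
print(apply_verbalizer("Article: {{input}} Summary:", "IBM watsonx Challenge empowers partners."))
# Article: IBM watsonx Challenge empowers partners. Summary: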

Follow these guidelines when you change the verbalizer:

  • Only change the verbalizer after prompt engineering to validate that the custom format improves foundation model output.

  • Do not edit the {{input}} variable.

    This variable instructs the tuning experiment to extract text from the input segment of the examples in your training data file.

  • For classification tasks, the class labels that you specify are passed to the foundation model automatically. You don't need to specify them in the custom verbalizer.

    However, if prompt engineering shows that using prefixes to represent the information that is being classified generates better output, you can change the verbalizer to use those prefixes. For a task where you want to classify customer feedback, you might use Feedback: {{input}} Class:, for example.

  • If you change the verbalizer that is used to tune a foundation model, use the same prefixes when you inference the tuned model later.

    For example, if the custom verbalizer is Article: {{input}} Summary:, then specify your prompt as follows when you inference the tuned foundation model from freeform mode in the Prompt Lab:

    Article: IBM watsonx Challenge empowers partners to solve real-world problems with AI. In June, IBM invited ecosystem partners in Europe, the Middle East and Africa to participate in an IBM watsonx Challenge, a hands-on experience designed to bring the watsonx platform capabilities to some of the most important members of the IBM ecosystem. These ecosystem partners, who sell, build or service IBM technologies, enthusiastically embraced the challenge. Participants formed teams and focused on quickly crafting a solution to one of three selected challenges. The challenges included using prompt engineering to analyze customer experience by using IBM® watsonx.ai, automating repetitive manual tasks to improve productivity by using IBM watsonx Orchestrate, and building a generative AI-powered virtual assistant by using IBM watsonx™ Assistant and IBM watsonx™ Discovery. This experience enabled passionate learners to experience exciting new generative AI technologies firsthand, and it led to innovation and creativity.
    Summary: 
    

    From structured mode, change the placeholder prefixes in the Try section. Change Input to Article and change Output to Summary, for example.

Setting prompt-tuning token limits

For natural language models, words are converted to tokens. A limit of 256 tokens corresponds to approximately 130 to 170 words, and a limit of 128 tokens corresponds to approximately 65 to 85 words. However, token counts are difficult to estimate and can differ by model. For more information, see Tokens and tokenization.

You can change the number of tokens that are allowed in the model input and output during a prompt-tuning experiment.

Table 1: Token number parameters

Parameter name          Default value   Value options   Value options for flan-t5-xl-3b only
Maximum input tokens    256             1–1024          1–256
Maximum output tokens   128             1–512           1–128

The larger the number of allowed input and output tokens, the longer it takes to tune the model. Use the smallest number of tokens in your examples that still represents your use case properly.

You already have some control over the input size. The input text that is used during a tuning experiment comes from your training data. So, you can manage the input size by keeping your example inputs to a set length. However, you might be getting uncurated training data from another team or process. In that case, you can use the Maximum input tokens slider to manage the input size. If you set the parameter to 200 and the training data has an example input with 1,000 tokens, for example, the example is truncated. Only the first 200 tokens of the example input are used.
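The following Python sketch illustrates that truncation behavior. It is illustrative only: the real token count comes from the model's subword tokenizer, so whitespace splitting is just a rough stand-in.

# Illustrative sketch of how the Maximum input tokens limit truncates a long example.
# Whitespace splitting stands in for the model's tokenizer, which produces subword tokens.
def truncate_input(text: str, max_tokens: int) -> str:
    tokens = text.split()                 # rough stand-in for tokenization
    return " ".join(tokens[:max_tokens])  # keep only the first max_tokens tokens

long_example = " ".join(f"word{i}" for i in range(1000))
truncated = truncate_input(long_example, 200)
print(len(truncated.split()))  # 200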

The Maximum output tokens value is important because it controls the number of tokens that the model is allowed to generate as output at training time. You can use the slider to limit the output size, which helps the model to generate concise output.

Tip: For classification tasks, minimizing the size of the output is a good way to force a generative model to return the class label only, without repeating the classification pattern in the output.


Parent topic: Tuning Studio