Prompt tuning
Prompt tuning adjusts the content of the prompt that is passed to the model to guide the model to generate output that matches a pattern you specify. The underlying foundation model and its parameter weights are not changed. Only the prompt input is altered.
How prompt tuning works
Foundation models are sensitive to the input that you give them. Your input, or how you prompt the model, can introduce context that the model will use to tailor its generated output. Prompt engineering to find the right prompt often works well. However, it can be time-consuming, error-prone, and its effectiveness can be restricted by the context window length that is allowed by the underlying model.
Prompt tuning a model in the Tuning Studio applies machine learning to the task of prompt engineering. Instead of adding words to the input itself, prompt tuning is a method for finding a sequence of values that, when added as a prefix to the input text, improve the model's ability to generate the output you want. This sequence of values is called a prompt vector.
Normally, words in the prompt are vectorized by the model. Vectorization is the process of converting text to tokens, and then to numbers defined by the model's tokenizer to identify the tokens. Lastly, the token IDs are encoded, meaning they are converted into a vector representation, which is the input format that is expected by the embedding layer of the model. Prompt tuning bypasses the model's text-vectorization process and instead crafts a prompt vector directly. This changeable prompt vector is concatenated to the vectorized input text and the two are passed as one input to the embedding layer of the model. Values from this crafted prompt vector affect the word embedding weights that are set by the model and influence the words that the model chooses to add to the output.
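The concatenation step can be sketched with arrays. This is a minimal illustration, not the model's actual implementation; the dimensions and NumPy stand-ins are hypothetical, chosen only to show how the prompt vector and the embedded input combine into one sequence.

```python
import numpy as np

# Hypothetical sizes for illustration only.
embed_dim = 8           # width of the model's embedding space
num_virtual_tokens = 4  # rows in the tunable prompt vector
seq_len = 10            # tokens in the user's input text

# The model vectorizes and embeds the input text as usual...
input_embeddings = np.random.rand(seq_len, embed_dim)

# ...while the prompt vector is a directly learned matrix of the same
# width, one row per "virtual token". It never passes through the
# tokenizer; its values are set by the tuning experiment.
prompt_vector = np.random.rand(num_virtual_tokens, embed_dim)

# The prompt vector is concatenated as a prefix to the embedded input,
# and the combined sequence is passed to the model as one input.
combined = np.concatenate([prompt_vector, input_embeddings], axis=0)
print(combined.shape)  # (14, 8)
```

Because the prompt vector skips tokenization, its values are not limited to embeddings of real words, which gives the experiment more freedom to find a prefix that steers the output.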
To find the best values for the prompt vector, you run a tuning experiment. You demonstrate the type of output that you want for a corresponding input by providing the model with input and output example pairs in training data. With each training run of the experiment, the generated output is compared to the training data output. Based on what it learns from differences between the two, the experiment adjusts the values in the prompt vector. After many runs through the training data, the model finds the prompt vector that works best.
You can choose to start the training process by providing text that is vectorized by the experiment. Or you can let the experiment use random values in the prompt vector. Either way, unless the initial values are exactly right, they will be changed repeatedly as part of the training process. Providing your own initialization text can help the experiment reach a good result more quickly.
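The two initialization options can be sketched as follows. This is a simplified sketch, assuming the initialization text has already been embedded into an array; the function names, dimensions, and the repeat-or-truncate strategy are illustrative assumptions, not the product's actual behavior.

```python
import numpy as np

# Hypothetical dimensions for illustration.
embed_dim = 8           # width of the model's embedding space
num_virtual_tokens = 4  # rows in the prompt vector

def init_random(rng):
    # Random initialization: start from small uniform values that the
    # experiment then refines during training.
    return rng.uniform(-0.5, 0.5, size=(num_virtual_tokens, embed_dim))

def init_from_text(text_embeddings):
    # Text initialization: seed the prompt vector with the embeddings of
    # the tokenized initialization text (here a precomputed array),
    # repeated or truncated to fill the virtual-token slots.
    reps = -(-num_virtual_tokens // len(text_embeddings))  # ceiling division
    return np.tile(text_embeddings, (reps, 1))[:num_virtual_tokens]

rng = np.random.default_rng(0)
print(init_random(rng).shape)                                 # (4, 8)
print(init_from_text(rng.normal(size=(3, embed_dim))).shape)  # (4, 8)
```

Either way the starting point is only a seed: training overwrites these values, but a text seed that is close to the tuned-for task can shorten the search.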
The result of the experiment is a tuned version of the underlying model. You submit input to the tuned model for inferencing and the model generates output that follows the tuned-for pattern.
For more information about the prompt-tuning process that is used in Tuning Studio, see Prompt-tuning workflow.
Prompt-tuning workflow
During the experiment, the prompt vector is adjusted repeatedly so that the model's predictions improve over time.
The following diagram illustrates the steps that occur during a prompt-tuning experiment run.
The parts of the experiment flow that you can configure are highlighted with a user icon. These decision points correspond with experiment tuning parameters that you control. See Parameters for tuning foundation models.
The diagram shows the following steps of the experiment:
1. Starts from the initialization method that you choose for the prompt. If the Initialization method parameter is set to text, then you must also provide the initialization text.
2. If initialization text is specified, tokenizes it and converts it into a prompt vector.
3. Reads the training data, tokenizes it, and converts it into batches. The size of the batches is determined by the Batch size parameter.
4. Sends input from the examples in the batch to the foundation model for the model to process and generate output.
5. Compares the model's output to the expected output from the training data for the submitted input, and computes the loss, which measures the difference between the two. The experiment uses the loss gradient to adjust the prompt vector that is added to the input. When this adjustment occurs depends on how the Accumulation steps parameter is configured.
6. Adjustments are applied to the prompt vector that was initialized in Step 2. The degree to which the vector is changed is controlled by the Learning rate parameter. The edited prompt vector is added as a prefix to the input from the next example in the training data, and is submitted to the model as input.
7. The process repeats until all of the examples in all of the batches are processed.
8. The entire set of batches is processed again as many times as is specified in the Number of epochs parameter.
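The experiment flow can be sketched as a small training loop. This is a toy sketch under loud assumptions: the frozen model is replaced by a fixed linear map, the loss is squared error, and all sizes and parameter values are made up for illustration. Only the shape of the loop (batches, loss-driven updates to the prompt vector, epochs) mirrors the steps above.

```python
import numpy as np

rng = np.random.default_rng(0)
embed_dim, num_virtual_tokens = 4, 2
epochs, batch_size, learning_rate = 3, 2, 0.1  # illustrative tuning parameters

# Toy stand-in for the frozen model: a fixed linear map from the mean
# of the combined embedding sequence to a single output value.
frozen_weights = rng.normal(size=embed_dim)

def model_output(prompt_vector, example_embedding):
    combined = np.vstack([prompt_vector, example_embedding])  # prefix + input
    return combined.mean(axis=0) @ frozen_weights

# Training data: pre-embedded inputs paired with target outputs.
examples = [(rng.normal(size=(3, embed_dim)), rng.normal()) for _ in range(8)]

prompt_vector = np.zeros((num_virtual_tokens, embed_dim))  # steps 1-2 (zero seed)

for epoch in range(epochs):                                # step 8: epochs
    for start in range(0, len(examples), batch_size):      # step 3: batches
        grad = np.zeros_like(prompt_vector)
        for x, y in examples[start:start + batch_size]:
            pred = model_output(prompt_vector, x)          # step 4
            # Step 5: squared-error loss; its gradient with respect to each
            # prompt row follows from the mean-pooled linear map.
            rows = num_virtual_tokens + x.shape[0]
            grad += 2 * (pred - y) * frozen_weights / rows
        # Step 6: only the prompt vector is updated; the model stays frozen.
        # (The Accumulation steps parameter would control how many batches
        # are processed before each update; here every batch triggers one.)
        prompt_vector -= learning_rate * grad
```

The key property the sketch preserves is that `frozen_weights` is never modified: all learning capacity lives in the small `prompt_vector`, which is why prompt tuning is cheap compared to full fine-tuning.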
Learn more
- Parameters for tuning foundation models
- IBM Research blog post: What is prompt-tuning?
- Research paper: The Power of Scale for Parameter-Efficient Prompt Tuning
Parent topic: Methods for tuning foundation models