Parameters for tuning foundation models
Tuning parameters configure the tuning experiments that you use to tune the foundation model.
Learn more about the steps that occur during a tuning experiment and how parameters that you can configure affect the process.
Prompt-tuning workflow
During the experiment, the tuning model repeatedly adjusts the structure of the prompt so that its predictions can get better over time.
The following diagram illustrates the steps that occur during a prompt-tuning experiment run. The parts of the experiment flow that you can configure are highlighted with a user icon. These decision points correspond with experiment tuning parameters that you control.
The diagram shows the following steps of the experiment:
1. Starts from the initialization method that you choose to use to initialize the prompt. If the initialization method parameter is set to text, then you must add the initialization text.
2. If specified, tokenizes the initialization text and converts it into a prompt vector.
3. Reads the training data, tokenizes it, and converts it into batches. The size of the batches is determined by the batch size parameter.
4. Sends input from the examples in the batch to the foundation model for the model to process and generate output.
5. Compares the model's output to the output from the training data that corresponds to the training data input that was submitted. Then computes the loss, which measures the difference between the predicted output and the actual output from the training data, and the gradient of that loss with respect to the prompt vector. At some point, the experiment adjusts the prompt vector that is added to the input based on the performance of the model. When this adjustment occurs depends on how the Accumulation steps parameter is configured.
6. Adjustments are applied to the prompt vector that was initialized in Step 2. The degree to which the vector is changed is controlled by the Learning rate parameter. The edited prompt vector is added as a prefix to the input from the next example in the training data, and is submitted to the model as input.
7. The process repeats until all of the examples in all of the batches are processed.
8. The entire set of batches is processed again as many times as is specified in the Number of epochs parameter.
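The steps above can be sketched as a toy loop. This is a minimal illustration, not the product's implementation: the function name `tune_prompt`, the one-dimensional prompt "vector", and the frozen stand-in model (`x + prompt`) are all assumptions made for the example. It shows only where the batch size, accumulation steps, learning rate, and number of epochs enter the flow.

```python
import random

def tune_prompt(examples, batch_size, accumulate_steps, learning_rate, num_epochs, seed=0):
    """Toy stand-in for a prompt-tuning run (hypothetical, for illustration only)."""
    rng = random.Random(seed)
    # Steps 1-2: random initialization of a one-dimensional prompt "vector".
    prompt = rng.uniform(-1.0, 1.0)
    # Step 3: split the training data into batches.
    batches = [examples[i:i + batch_size] for i in range(0, len(examples), batch_size)]
    updates = 0
    # Step 8: repeat the whole set of batches once per epoch.
    for _ in range(num_epochs):
        grad_sum, pending = 0.0, 0
        for batch in batches:
            # Steps 4-5: the frozen "model" predicts x + prompt; the loss is the mean
            # squared error, so its gradient with respect to the prompt is 2 * error.
            grad = sum(2 * ((x + prompt) - y) for x, y in batch) / len(batch)
            grad_sum += grad
            pending += 1
            # Step 6: adjust the prompt only after enough batches have accumulated.
            if pending == accumulate_steps:
                prompt -= learning_rate * grad_sum / pending
                updates += 1
                grad_sum, pending = 0.0, 0
        if pending:
            # Assumption: leftover batches at the end of an epoch still trigger an adjustment.
            prompt -= learning_rate * grad_sum / pending
            updates += 1
    return prompt, updates

# Toy data where the ideal prompt value is 3.0 (every target is input + 3).
examples = [(float(x), float(x) + 3.0) for x in range(32)]
prompt, updates = tune_prompt(examples, batch_size=8, accumulate_steps=2,
                              learning_rate=0.3, num_epochs=20)
```

With 32 examples and a batch size of 8, each epoch has 4 batches. An accumulation steps value of 2 gives 2 adjustments per epoch, so 20 epochs make 40 adjustments in total.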
Default parameters for prompt tuning
The best hyperparameter values to use for a prompt-tuning experiment differ based on your data and use case.
The following table captures the parameter values to use as a starting point for prompt tuning a third-party foundation model.
Parameter name | Default value for flan-t5-xl-3b | Learn more |
---|---|---|
Initialization method | Random | Initializing prompt tuning |
Initialization text | None | Initializing prompt tuning |
Batch size | 16 | Segmenting the training data |
Accumulate steps | 16 | Segmenting the training data |
Learning rate | 0.3 | Managing the learning rate |
Number of epochs (number of training cycles) | 20 | Choosing the number of training runs to complete |
The default parameters that are used for prompt tuning the granite-13b-instruct-v2 foundation model are adjusted based on the type of task you want the tuned model to do.
The following table captures the parameter values to use as a starting point per supported task type for prompt tuning the granite-13b-instruct-v2 foundation model.
Parameter name | Default value for classification | Default value for generation | Default value for summarization | Learn more |
---|---|---|---|---|
Batch size | 8 | 16 | 8 | Segmenting the training data |
Accumulate steps | 32 | 16 | 1 | Segmenting the training data |
Learning rate | 0.0006 | 0.0002 | 0.0002 | Managing the learning rate |
Number of epochs (number of training cycles) | 20 | 20 | 40 | Choosing the number of training runs to complete |
Parameter descriptions
The following table describes the tuning parameters that you can customize.
Parameter name | Description | Value options | Learn more |
---|---|---|---|
Initialization method (prompt tuning) | Specifies how to initialize the prompt vector. | Random, Text | Initializing prompt tuning |
Initialization text (prompt tuning) | Text to use as the prompt for the first run of the experiment. | – | Initializing prompt tuning |
Batch size | Number of labeled examples to process at one time. | 1–16 | Segmenting the training data |
Accumulate steps | Number of batches to process before adjustments are made. | 1–128 | Segmenting the training data |
Learning rate | Determines the scope of the change to make when the model is adjusted. | 0.00001–0.5 | Managing the learning rate |
Number of epochs (number of training cycles) | Number of times to cycle through the training data. | 1–50 | Choosing the number of training runs to complete |
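As a rough aid, the value ranges in the table can be checked before an experiment is started. The sketch below is an assumption-laden example: the dictionary keys (`batch_size`, `accumulate_steps`, `learning_rate`, `num_epochs`) and the `validate_parameters` helper are hypothetical names chosen for illustration, not names from any product API. The ranges and the flan-t5-xl-3b starting values are copied from the tables in this topic.

```python
# Allowed ranges copied from the parameter descriptions table.
# The key names are hypothetical, not product API names.
PARAMETER_RANGES = {
    "batch_size": (1, 16),
    "accumulate_steps": (1, 128),
    "learning_rate": (0.00001, 0.5),
    "num_epochs": (1, 50),
}

# Starting values for flan-t5-xl-3b, copied from the defaults table.
FLAN_T5_XL_DEFAULTS = {
    "batch_size": 16,
    "accumulate_steps": 16,
    "learning_rate": 0.3,
    "num_epochs": 20,
}

def validate_parameters(params):
    """Return a list of problems; an empty list means every value is in range."""
    problems = []
    for name, value in params.items():
        if name not in PARAMETER_RANGES:
            problems.append(f"{name}: unknown parameter")
            continue
        low, high = PARAMETER_RANGES[name]
        if not low <= value <= high:
            problems.append(f"{name}: {value} is outside {low}-{high}")
    return problems

print(validate_parameters(FLAN_T5_XL_DEFAULTS))  # prints []
print(validate_parameters({"batch_size": 32}))   # prints ['batch_size: 32 is outside 1-16']
```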
Segmenting the training data
When an experiment runs, the experiment first breaks the training data into smaller batches, and then trains on one batch at a time. Each batch must fit in GPU memory to be processed. To reduce the amount of GPU memory that is needed, you can configure the tuning experiment to postpone making adjustments until more than one batch is processed. Tuning runs on a batch and its performance metrics are calculated, but no adjustments are made immediately. Instead, the performance information is collected over some number of batches before the cumulative performance metrics are evaluated.
Use the following parameters to control how the training data is segmented:
Batch size: Number of labeled examples (also known as samples) to process at one time.
For example, for a data set with 1,000 examples and a batch size of 10, the data set is divided into 100 batches of 10 examples each.
If the training data set is small, specify a smaller batch size to ensure that each batch has enough examples in it.
Accumulation steps: Number of batches to process before adjustments are made.
For example, if the data set is divided into 100 batches and you set the accumulation steps value to 10, then adjustments are made 10 times instead of 100 times.
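The two examples above can be reproduced with a short sketch. The helper names `make_batches` and `group_for_adjustment` are hypothetical. Treating a final, smaller batch or group as a full one is an assumption about edge cases that this topic does not describe.

```python
def make_batches(examples, batch_size):
    """Split the examples into consecutive batches of at most batch_size items."""
    return [examples[i:i + batch_size] for i in range(0, len(examples), batch_size)]

def group_for_adjustment(batches, accumulation_steps):
    """Group batches so that one adjustment is made per group."""
    return [batches[i:i + accumulation_steps]
            for i in range(0, len(batches), accumulation_steps)]

examples = list(range(1000))                 # stand-in for 1,000 labeled examples
batches = make_batches(examples, 10)         # batch size of 10
groups = group_for_adjustment(batches, 10)   # accumulation steps value of 10

print(len(batches), len(groups))  # prints: 100 10
```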
Choosing the number of training runs to complete
The Number of epochs parameter specifies the number of times to cycle through the training data.
For example, with a batch size of 10 and a data set with 1,000 examples, one epoch must process 100 batches and, if adjustments are made after every batch, make adjustments 100 times. If you set the number of epochs to 20, the experiment passes through the data set 20 times, which means it processes a total of 2,000 batches during the tuning process.
The higher the number of epochs and the larger your training data set, the longer it takes to tune a model.
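The arithmetic in this example can be written out as a small helper. This is only a sketch: `tuning_workload` is a hypothetical name, and the rounding up of partial batches and the assumption that accumulation does not carry over between epochs are not stated in this topic.

```python
import math

def tuning_workload(num_examples, batch_size, num_epochs, accumulate_steps=1):
    """Estimate how many batches and adjustments a tuning run involves."""
    batches_per_epoch = math.ceil(num_examples / batch_size)
    # Assumption: accumulation restarts at the beginning of every epoch.
    adjustments_per_epoch = math.ceil(batches_per_epoch / accumulate_steps)
    return {
        "batches_per_epoch": batches_per_epoch,
        "total_batches": batches_per_epoch * num_epochs,
        "total_adjustments": adjustments_per_epoch * num_epochs,
    }

workload = tuning_workload(num_examples=1000, batch_size=10, num_epochs=20)
print(workload["total_batches"])  # prints 2000
```

Because the totals scale with both the data set size and the number of epochs, doubling either one roughly doubles the number of batches to process.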
Managing the learning rate
The learning rate parameter determines the scope of the change to make when the model is adjusted. The higher the number, the greater the change.
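To make the effect concrete, the sketch below applies one adjustment with two different learning rates. It assumes a plain gradient-descent update (new value = old value - learning rate × gradient). The optimizer that the tuning experiment actually uses is not described in this topic, and the `apply_adjustment` name and the sample numbers are made up for illustration.

```python
def apply_adjustment(prompt_vector, gradient, learning_rate):
    """One plain gradient-descent step (an assumed, simplified update rule)."""
    return [p - learning_rate * g for p, g in zip(prompt_vector, gradient)]

prompt_vector = [1.0, -2.0]   # made-up prompt vector values
gradient = [0.5, 0.5]         # made-up loss gradient

large_step = apply_adjustment(prompt_vector, gradient, learning_rate=0.3)
small_step = apply_adjustment(prompt_vector, gradient, learning_rate=0.0002)

# With 0.3 each value moves by 0.15; with 0.0002 each value moves by 0.0001.
print(large_step, small_step)
```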
Initializing the prompt
When you create a prompt-tuning experiment, you can choose whether to specify your own text to serve as the initial prompt vector or let the experiment generate it for you. The tokens of the prompt vector start the training process either in random positions, or based on the embedding of a vocabulary or instruction that you specify in text. Studies show that as the size of the underlying model grows beyond 10 billion parameters, the initialization method that is used becomes less important.
The choice that you make when you create the tuning experiment customizes how the prompt is initialized.
Initialization method: Choose a method from the following options:
- Text: Prompt tuning starts from initialization text that you specify yourself.
- Random: Prompt tuning starts from values that the experiment chooses at random.
Initialization text: The text that you want to add. Specify a task description or instructions similar to what you use for zero-shot prompting.
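The difference between the two methods can be sketched with a toy example. Everything here is hypothetical: the `TOY_EMBEDDINGS` table, the whitespace tokenizer, the vector sizes, and the `init_prompt_vector` helper are stand-ins for illustration, and a real foundation model uses its own tokenizer and embedding table.

```python
import random

# Hypothetical toy embedding table; a real model has one row per vocabulary token.
TOY_EMBEDDINGS = {
    "classify": [0.1, 0.2],
    "the": [0.0, 0.1],
    "sentiment": [0.4, -0.3],
}

def init_prompt_vector(method, text=None, num_tokens=3, dim=2, seed=0):
    """Build an initial prompt vector as a list of per-token embedding vectors."""
    if method == "random":
        # Random: every value starts at a randomly chosen position.
        rng = random.Random(seed)
        return [[rng.uniform(-0.5, 0.5) for _ in range(dim)] for _ in range(num_tokens)]
    if method == "text":
        if not text:
            raise ValueError("initialization text is required when the method is 'text'")
        # Text: start from the embeddings of the tokens in the text.
        # Whitespace splitting is a simplification of real tokenization;
        # a word that is missing from the toy table raises KeyError.
        return [list(TOY_EMBEDDINGS[token]) for token in text.lower().split()]
    raise ValueError(f"unknown initialization method: {method}")

from_text = init_prompt_vector("text", text="Classify the sentiment")
from_random = init_prompt_vector("random", num_tokens=3, dim=2)
```

Both calls return three vectors of two values each. Only the text-based result carries information from the instruction.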