0 / 0
Parameters for tuning foundation models
Last updated: Sep 26, 2024
Parameters for tuning foundation models

Tuning parameters configure the tuning experiments that you use to tune the foundation model.

Note: The parameters that you change when you tune a foundation model apply to the tuning experiment, not to the underlying foundation model.

Learn more about the steps that occur during a tuning experiment and how parameters that you can configure affect the process.

Prompt-tuning workflow

During the experiment, the tuning model repeatedly adjusts the structure of the prompt so that its predictions can get better over time.

The following diagram illustrates the steps that occur during a prompt-tuning experiment run. The parts of the experiment flow that you can configure are highlighted with a user icon user. These decision points correspond with experiment tuning parameters that you control.

Prompt-tuning experiment run process details

The diagram shows the following steps of the experiment:

  1. Starts from the initialization method that you choose to use to initialize the prompt.

    If the initialization method parameter is set to text, then you must add the initialization text.

  2. If specified, tokenizes the initialization text and converts it into a prompt vector.

  3. Reads the training data, tokenizes it, and converts it into batches.

    The size of the batches is determined by the batch size parameter.

  4. Sends input from the examples in the batch to the foundation model for the model to process and generate output.

  5. Compares the model's output to the output from the training data that corresponds to the training data input that was submitted. Then, computes the loss gradient, which is the difference between the predicted output and the actual output from the training data.

    At some point, the experiment adjusts the prompt vector that is added to the input based on the performance of the model. When this adjustment occurs depends on how the Accumulation steps parameter is configured.

  6. Adjustments are applied to the prompt vector that was initialized in Step 2. The degree to which the vector is changed is controlled by the Learning rate parameter. The edited prompt vector is added as a prefix to the input from the next example in the training data, and is submitted to the model as input.

  7. The process repeats until all of the examples in all of the batches are processed.

  8. The entire set of batches are processed again as many times as is specified in the Number of epochs parameter.

Note: No layer of the base foundation model is changed during this process.

Default parameters for prompt tuning

The best hyperparameter values to use for a prompt-tuning experiment differ based on your data and use case.

The following table captures the parameter values to use as a starting point for prompt tuning a third-party foundation model.

Table 1: Tuning parameter values for third-party foundation models
Parameter name Default value for flan-t5-xl-3b Learn more
Initialization method Random Initializing prompt tuning
Initialization text None Initializing prompt tuning
Batch size 16 Segmenting the training data
Accumulate steps 16 Segmenting the training data
Learning rate 0.3 Managing the learning rate
Number of epochs (number of training cycles) 20 Choosing the number of training runs to complete

The default parameters that are used for prompt tuning the granite-13b-instruct-v2 foundation model are adjusted based on the type of task you want the tuned model to do.

The following table captures the parameter values to use as a starting point per supported task type for prompt tuning the granite-13b-instruct-v2 foundation model.

Table 2: Tuning parameter values for the granite-13b-instruct-v2 foundation model
Parameter name Default value for classification Default value for generation Default value for summarization Learn more
Batch size 8 16 8 Segmenting the training data
Accumulate steps 32 16 1 Segmenting the training data
Learning rate 0.0006 0.0002 0.0002 Managing the learning rate
Number of epochs (number of training cycles) 20 20 40 Choosing the number of training runs to complete

Parameter descriptions

The following table describes the tuning parameters that you can customize.

Table 3: Tuning parameter value description references
Parameter name Description Value options Learn more
Initialization method (prompt tuning) Specifies how to initialize the prompt vector. Random, Text Initializing prompt tuning
Initialization text (prompt tuning) Text to use as the prompt for the first run of the experiment. Initializing prompt tuning
Batch size Number of labeled examples to process at one time. 1–16 Segmenting the training data
Accumulate steps Number of batches to process before adjustments are made. 1–128 Segmenting the training data
Learning rate Determines the scope of the change to make when the model is adjusted. 0.00001–0.5 Managing the learning rate
Number of epochs (number of training cycles) Number of times to cycle through the training data. 1–50 Choosing the number of training runs to complete

Segmenting the training data

When an experiment runs, the experiment first breaks the training data into smaller batches, and then trains on one batch at a time. Each batch must fit in GPU memory to be processed. To reduce the amount of GPU memory that is needed, you can configure the tuning experiment to postpone making adjustments until more than one batch is processed. Tuning runs on a batch and its performance metrics are calculated, but no adjustments are made immediately. Instead, the performance information is collected over some number of batches before the cumulative performance metrics are evaluated.

Use the following parameters to control how the training data is segmented:

Batch size Number of labeled examples (also known as samples) to process at one time.

For example, for a data set with 1,000 examples and a batch size of 10, the data set is divided into 100 batches of 10 examples each.

If the training data set is small, specify a smaller batch size to ensure that each batch has enough examples in it.

Accumulation steps: Number of batches to process before adjustments are made.

For example, if the data set is divided into 100 batches and you set the accumulation steps value to 10, then adjustments are made 10 times instead of 100 times.

Choosing the number of training runs to complete

The Number of epochs parameter specifies the number of times to cycle through the training data.

For example, with a batch size of 10 and a data set with 1,000 examples, one epoch must process 100 batches and make adjustments 100 times. If you set the number of epochs to 20, the model is passed through the data set 20 times, which means it processes a total of 2,000 batches during the tuning process.

The higher the number of epochs and bigger your training data, the longer it takes to tune a model.

Managing the learning rate

The learning rate parameter determines the scope of the change to make when the model is adjusted. The higher the number, the greater the change.

Initializing the prompt

When you create a prompt-tuning experiment, you can choose whether to specify your own text to serve as the initial prompt vector or let the experiment generate it for you. These new tokens start the training process either in random positions, or based on the embedding of a vocabulary or instruction that you specify in text. Studies show that as the size of the underlying model grows beyond 10 billion parameters, the initialization method that is used becomes less important.

The choice that you make when you create the tuning experiment customizes how the prompt is initialized.

Initialization method: Choose a method from the following options:

  • Text: The Prompt Tuning method is used where you specify the initialization text of the prompt yourself.
  • Random: The Prompt Tuning method is used that allows the experiment to add values that are chosen at random to include with the prompt.

Initialization text: The text that you want to add. Specify a task description or instructions similar to what you use for zero-shot prompting.

Learn more

Parent topic: Tuning a model

Generative AI search and answer
These answers are generated by a large language model in watsonx.ai based on content from the product documentation. Learn more