Evaluating the results of a tuning experiment

Tuning a foundation model in watsonx.ai is an iterative process. You run a tuning experiment and then evaluate the results. If necessary, you change experiment variables and rerun the experiment repeatedly until you are satisfied with the output from the tuned foundation model.

Check your progress after each experiment run. Find any limitations in your tuning experiment configuration and address them before you assess your training data for potential problems.

A sample Python notebook named Use watsonx.ai to tune IBM granite-13b-instruct-v2 model with Car Rental Company customer satisfaction document is available that contains code for prompt-tuning foundation models in watsonx.ai. The sample notebook has sections for optimizing the experiment parameters and for inferencing the tuned model. For more information, see Tuning a foundation model by using a Python notebook.
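
If you want a sense of what the programmatic path looks like, the following sketch shows roughly how a prompt-tuning experiment can be started with the ibm-watsonx-ai Python SDK. The class and method names (Credentials, TuneExperiment, prompt_tuner, and its parameters) reflect one version of the SDK and might differ from yours, and the endpoint, API key, project ID, and parameter values are placeholders. Treat it as a sketch to adapt, not a definitive recipe.

```python
# Minimal sketch of starting a prompt-tuning experiment programmatically.
# Verify class names, method names, and parameters against your installed
# version of the ibm-watsonx-ai SDK; values here are illustrative only.
from ibm_watsonx_ai import Credentials
from ibm_watsonx_ai.experiment import TuneExperiment

credentials = Credentials(
    url="https://us-south.ml.cloud.ibm.com",  # your watsonx.ai endpoint
    api_key="YOUR_API_KEY",                   # placeholder
)

experiment = TuneExperiment(credentials, project_id="YOUR_PROJECT_ID")

# Configure a prompt tuner; adjust learning_rate and num_epochs later if
# the loss curve does not converge.
prompt_tuner = experiment.prompt_tuner(
    name="car-rental-satisfaction-tuning",
    task_id=experiment.Tasks.CLASSIFICATION,
    base_model="ibm/granite-13b-instruct-v2",
    num_epochs=20,
    learning_rate=0.3,
    batch_size=16,
    max_input_tokens=256,
    max_output_tokens=128,
)

# training_data_connection is a DataConnection that points to your training
# file; building it is omitted here, so the run call is left commented out.
# tuning_details = prompt_tuner.run(
#     training_data_references=[training_data_connection],
#     background_mode=False,  # wait for the run to finish
# )
```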

Workflow for improving tuning experiment results

There is no one right set of tuning parameters or training data examples to use. The best tuning parameter settings and data set sizes vary based on your data, the foundation model you use, and the type of task you want the model to do. Follow these steps to save time and stay on track as you experiment.

You can use the Tuning Studio to complete these steps or use the sample notebook to do them programmatically.

  1. Before you begin your experimentation, set aside a subset of the tuning training data to use as a test data set. (A minimal split sketch follows this list.)

  2. Run a tuning experiment with the default tuning parameters.

  3. Check the loss function for the experiment run.

    The tuned model is performing well when your loss function has a downward-sloping curve that levels off near zero.

    Figure: A loss function graph for a successful experiment run. The curve has 10 data points, starting above 5 and ending close to zero.

  4. If necessary, adjust parameter values and rerun the experiment until the loss function levels off to near zero. For more information, see Adjusting tuning parameters.

  5. Test the quality of the tuned model by submitting prompts from the test data set.

    You can inference the tuned foundation model from the Prompt Lab or programmatically by using the sample notebook. For more information, see Using the notebook to evaluate the tuned model.

  6. If necessary, revise or augment the training data. For more information, see Addressing data quality problems in tuned model output.

    When new data is introduced, more tuning parameter optimizations might be possible. Rerun the experiment, and then repeat the steps in this workflow starting from Step 3.
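
The following minimal sketch shows one way to carry out step 1: hold back roughly 20 percent of a JSON Lines training file as a test set before you tune. The file names are placeholders, and the sketch assumes one training example per line.

```python
# Minimal sketch: hold out part of the tuning training data as a test set
# before you start experimenting. File names are placeholders.
import json
import random

random.seed(42)  # reproducible split

with open("train_data.jsonl", "r", encoding="utf-8") as f:
    examples = [json.loads(line) for line in f if line.strip()]

random.shuffle(examples)
split = int(len(examples) * 0.8)  # keep about 20% of examples for testing
train_examples, test_examples = examples[:split], examples[split:]

with open("tuning_train.jsonl", "w", encoding="utf-8") as f:
    for ex in train_examples:
        f.write(json.dumps(ex) + "\n")

with open("tuning_test.jsonl", "w", encoding="utf-8") as f:
    for ex in test_examples:
        f.write(json.dumps(ex) + "\n")
```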

Adjusting tuning parameters

When a tuning experiment run is finished, a loss function graph is displayed. A loss function measures the difference between predicted and actual results with each training run. A successful tuning experiment results in a loss function that has a downward-sloping curve.

The point at which the loss drops and then levels off is called convergence. You want the curve to drop, or converge, and the tail end of the curve to get as close to zero as possible, because that means the model's predictions are as similar as possible to the outputs in the training data.
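
If you want to inspect convergence programmatically rather than in the Tuning Studio graph, plotting the per-epoch loss is enough. The sketch below assumes that you have already pulled the per-epoch loss values from your experiment's run details; the numbers shown are illustrative only.

```python
# Generic sketch: plot per-epoch loss values to check for convergence.
# Replace the illustrative values with the loss values reported for your run.
import matplotlib.pyplot as plt

epoch_losses = [5.2, 3.1, 1.9, 1.2, 0.8, 0.5, 0.3, 0.2, 0.15, 0.12]

plt.plot(range(1, len(epoch_losses) + 1), epoch_losses, marker="o")
plt.xlabel("Epoch")
plt.ylabel("Loss")
plt.title("Prompt-tuning loss by epoch")
plt.show()

# A healthy run slopes downward and flattens near zero; if the final values
# stay well above zero or oscillate, adjust the tuning parameters.
```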

If the loss graph for your experiment resembles a mountain range with multiple peaks, if the loss never converges, or if the loss converges but settles at a value that is much higher than zero, adjust your tuning parameters.

You can configure the parameter values in the Tuning Studio or use the sample notebook. The sample notebook has steps that help you find the best values to use for your tuning parameters, which is sometimes called hyperparameter optimization. For more information, see Using the notebook to optimize tuning parameter values.
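
Conceptually, that optimization step amounts to trying several parameter combinations and keeping the one with the lowest final loss. The sketch below outlines the loop; run_tuning_experiment is a hypothetical helper that you would implement around your own tuning calls, and the candidate values are examples, not recommendations.

```python
# Conceptual sketch of a small grid search over tuning parameters.
# run_tuning_experiment is a hypothetical helper: wrap your own tuning call
# in it and return the run's final loss value.
from itertools import product

learning_rates = [0.03, 0.1, 0.3]  # example candidates
epoch_counts = [10, 20, 40]        # example candidates

def run_tuning_experiment(learning_rate: float, num_epochs: int) -> float:
    """Start a tuning run with these values and return its final loss."""
    # Replace this placeholder with a real tuning call; returning 1.0 keeps
    # the sketch runnable without starting any experiments.
    return 1.0

results = {}
for lr, epochs in product(learning_rates, epoch_counts):
    results[(lr, epochs)] = run_tuning_experiment(lr, epochs)

best_lr, best_epochs = min(results, key=results.get)
print(f"Lowest final loss {results[(best_lr, best_epochs)]:.4f} "
      f"with learning_rate={best_lr}, num_epochs={best_epochs}")
```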

The following table describes common tuning experiment outcomes and lists actions that might improve the outcomes.

Table 1: Actions for addressing common tuning experiment flaws

Loss function graph: The loss curve is flat and never drops.
Cause: Tuning is not improving the results by much.
Actions to try: Increase the learning rate (by 10x) so that the experiment makes more drastic adjustments to the prompt vector.

Loss function graph: The loss curve drops, but the tail settles at too high a number.
Cause: Tuning is not improving the results by as much as it could.
Actions to try: Increase the learning rate (by 5x) so that the experiment makes bigger adjustments to the prompt vector.

Loss function graph: The loss curve drops and then decreases steadily, but never levels off near zero.
Cause: Training ended before the model was fully tuned.
Actions to try: Increase the number of epochs to give the model more time to learn.

Loss function graph: The loss curve goes up and then drops, but never gets low enough.
Cause: Training is unstable because the high learning rate causes the prompt vector to change too much.
Actions to try: Decrease the learning rate (by 10x) so that the experiment makes smaller adjustments to the prompt vector.
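
As a quick worked example of the guidance in Table 1, the multipliers translate into parameter changes like the following. The starting values are illustrative; substitute your own experiment's settings.

```python
# Illustrative arithmetic for the adjustments in Table 1.
# Starting values are examples only.
learning_rate = 0.3
num_epochs = 20

# Flat loss curve: increase the learning rate by 10x.
learning_rate_flat_curve = learning_rate * 10     # 3.0

# Curve settles too high: increase the learning rate by 5x.
learning_rate_settles_high = learning_rate * 5    # 1.5

# Curve is still dropping when training ends: add more epochs.
num_epochs_more_time = num_epochs * 2             # 40, for example

# Unstable, spiky curve: decrease the learning rate by 10x.
learning_rate_unstable = learning_rate / 10       # 0.03
```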

For more information about how to change tuning parameters and rerun a tuning experiment, see Tuning a foundation model.

Addressing data quality problems in tuned model output

You know that you're done tuning a model when you can submit zero-shot prompts to the tuned model and get back outputs you expect.
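
If the tuned model is deployed, you can submit those zero-shot test prompts programmatically. The sketch below uses the ModelInference class from the ibm-watsonx-ai Python SDK; the class and parameter names reflect one SDK version and should be verified against yours, and the endpoint, IDs, and prompts are placeholders.

```python
# Hedged sketch: send zero-shot test prompts to the deployed tuned model.
# Verify class and parameter names against your installed SDK version.
from ibm_watsonx_ai import Credentials
from ibm_watsonx_ai.foundation_models import ModelInference

credentials = Credentials(
    url="https://us-south.ml.cloud.ibm.com",  # your watsonx.ai endpoint
    api_key="YOUR_API_KEY",                   # placeholder
)

tuned_model = ModelInference(
    deployment_id="YOUR_TUNED_MODEL_DEPLOYMENT_ID",  # placeholder
    credentials=credentials,
    project_id="YOUR_PROJECT_ID",
)

# Placeholder prompts; use examples from your held-out test data set.
test_prompts = [
    "Comment: The agent was rude and the car was dirty.\nSatisfaction:",
    "Comment: Quick pickup and a clean vehicle, thank you!\nSatisfaction:",
]

for prompt in test_prompts:
    print(tuned_model.generate_text(prompt=prompt))
```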

The following table describes some common training data quality issues and lists actions that you can take to address them.

Table 2: Actions for addressing training data flaws

Outcome: Tuned model outputs don't match the content and format of the output examples in the training data.
Cause: Not enough training data examples.
Actions to try: Increase the training data size.

Outcome: Tuned model outputs are incomplete.
Cause: The tuning process isn't using the examples that you think it's using.
Actions to try: Check the length of your training data input and output examples. The maximum input length is 256 tokens and the maximum output length is 128 tokens; examples that are longer than the maximum allowed length are truncated. (A length-check sketch follows this table.)

Outcome: Missing classification labels in a classification task.
Cause: Not enough examples of each class type.
Actions to try: Add more examples of each class type that you want the model to recognize.

Outcome: Missing text extractions in an extraction task.
Cause: Not enough examples of each entity type.
Actions to try: Add more examples of each entity type that you want the model to recognize.

Outcome: Inaccurate class labels or entity type text extractions.
Cause: Insufficient context to choose the correct class or entity type.
Actions to try: Add an equal number of examples for each type. Review the classes or entities that you want the model to identify or extract to make sure that they are distinct from one another.
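
To catch the truncation problem before you tune, you can scan the training file for overly long examples. The sketch below assumes the JSON Lines training format with input and output fields, and it uses whitespace splitting as a rough stand-in for the model's tokenizer, so treat the counts as approximations. The file name is a placeholder.

```python
# Rough sketch: flag training examples whose input or output may exceed the
# prompt-tuning limits (256 input tokens, 128 output tokens). Whitespace
# splitting only approximates the model's tokenizer, so treat borderline
# examples as suspect.
import json

MAX_INPUT_TOKENS = 256
MAX_OUTPUT_TOKENS = 128

with open("tuning_train.jsonl", "r", encoding="utf-8") as f:
    for line_number, line in enumerate(f, start=1):
        if not line.strip():
            continue
        example = json.loads(line)
        input_len = len(example["input"].split())
        output_len = len(example["output"].split())
        if input_len > MAX_INPUT_TOKENS or output_len > MAX_OUTPUT_TOKENS:
            print(f"Example {line_number}: input ~{input_len} tokens, "
                  f"output ~{output_len} tokens; may be truncated.")
```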

Parent topic: Tuning Studio
