Quick start: Evaluate and track a prompt template
Take this tutorial to learn how to evaluate and track a prompt template. You can evaluate prompt templates in projects or deployment spaces to measure the performance of foundation model tasks and understand how your model generates responses. Then, you can track the prompt template in an AI use case to capture and share facts about the asset to help you meet governance and compliance goals.
- Required services
- watsonx.governance
Your basic workflow includes these tasks:
- Open a project that contains the prompt template to evaluate. Projects are where you can collaborate with others to work with assets.
- Evaluate a prompt template using test data.
- Review the results on the AI Factsheet.
- Track the evaluated prompt template in an AI use case.
- Deploy and test your evaluated prompt template.
Read about prompt templates
With watsonx.governance, you can evaluate prompt templates in projects to measure how effectively your foundation models generate responses for the following task types:
- Classification
- Summarization
- Generation
- Question answering
- Entity extraction
Read more about evaluating prompt templates in projects
Read more about evaluating prompt templates in deployment spaces
Watch a video about evaluating and tracking a prompt template
Watch this video to preview the steps in this tutorial. There might be slight differences in the user interface shown in the video. The video is intended to be a companion to the written tutorial.
Try a tutorial about evaluating and tracking a prompt template
In this tutorial, you will complete these tasks:
- Task 1: Create the workspaces
- Task 2: Create an inventory and AI use case
- Task 3: Evaluate the sample prompt template
- Task 4: Start tracking the prompt template
- Task 5: Import the tracked assets for validation
- Task 6: Validate the prompt template
- Task 7: Deploy the prompt template
Tips for completing this tutorial
Here are some tips for successfully completing this tutorial.
Use the video picture-in-picture
The following animated image shows how to use the video picture-in-picture and table of contents features:
Get help in the community
If you need help with this tutorial, you can ask a question or find an answer in the watsonx Community discussion forum.
Set up your browser windows
For the optimal experience completing this tutorial, open Cloud Pak for Data in one browser window, and keep this tutorial page open in another browser window to switch easily between the two applications. Consider arranging the two browser windows side-by-side to make it easier to follow along.
Complete the prerequisites
To complete this tutorial, you must set up Watson OpenScale. Follow these steps to set up Watson OpenScale using the Auto setup option or refer to Setup options for Watson OpenScale to see other setup options:
- From the Navigation Menu, choose Administration > Services > Service instances.
- On the Service instances page, for your Watson OpenScale or watsonx.governance instance, click the Overflow menu, and choose Launch.
- On the Service details page, click Launch Watson OpenScale.
- When the Model evaluation page displays, click Auto setup.
Task 1: Create the workspaces
To complete this tutorial, you need three workspaces:
- Develop phase: A development project to store the assets that you develop, evaluate, and track.
- Validate phase: A validation project to store the assets that are ready to be validated.
- Operate phase: A production deployment space to store the validated assets and deployments.
Task 1a: Create the development project based on a sample
To preview this task, watch the video beginning at 00:11.
The Resource hub includes a sample project that contains sample prompt templates that you can evaluate and track. If you have already created the sample project, skip step 1 in this task, and go on to associate the Watson Machine Learning service with the sample project. Otherwise, follow these steps to create the development project based on a sample:
- Access the Getting started with watsonx.governance project in the Resource hub.
  - Click Create project.
  - Accept the default values for the project name, and click Create.
  - Click View new project when the project is successfully created.
- Associate a Watson Machine Learning service with the project. For more information, see Watson Machine Learning.
  - When the project opens, click the Manage tab, and select the Services and integrations page.
  - On the IBM services tab, click Associate service.
  - Select your Watson Machine Learning instance. If you don't have a Watson Machine Learning service instance provisioned yet, follow these steps:
    - Click New service.
    - Select Watson Machine Learning.
    - Click Create.
    - Select the new service instance from the list.
  - Click Associate service.
  - If necessary, click Cancel to return to the Services & Integrations page.
- Click the Assets tab in the project to see the sample assets.
For more information or to watch a video, see Creating a project.
For more information on associated services, see Adding associated services.
Check your progress
The following image shows the development project Assets tab. You are now ready to create the inventory and AI use case.
Task 1b: Create a validation project
To preview this task, watch the video beginning at 00:44.
Typically, the prompt engineer evaluates the prompt with test data, and the validation engineer validates the prompt. The validation engineer has access to validation data that prompt engineers might not have. In this case, the validation data is stored in a different project. Follow these steps to create an empty project. Later, you import assets from the development project into the validation project.
- From the Navigation Menu, choose Projects > View all projects.
- On the Projects page, click New project.
- For the project name, type:
Validation project
- Click Create.
- Follow the same steps as in Task 1a to associate your Watson Machine Learning service with the validation project.
- Click the Assets tab to see the empty project.
Check your progress
The following image shows the empty validation project.
Task 1c: Create a deployment space
To preview this task, watch the video beginning at 01:16.
You need to create a deployment space now, so you can later promote the prompt template to that deployment space. Follow these steps to create the deployment space:
- From the Navigation Menu, choose Deployments.
- Click New deployment space.
- For the Space name, copy and paste the following text:
Insurance claims deployment space
- For the Deployment stage, select Production.
Important: You must select Production for the Deployment stage if you wish to move the deployment from the Evaluation stage to the Operation stage.
- Select your machine learning service from the list.
- Click Create.
- When the space is created, click View new space.
Check your progress
The following image shows the deployment space.
Task 2: Create an inventory and AI use case
An inventory is for storing and reviewing AI use cases. AI use cases collect governance facts for AI assets that your organization tracks. You can view all the AI use cases in an inventory.
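The relationship between an inventory and its AI use cases can be pictured as a simple container model. The following is a minimal sketch for illustration only: the class and field names are hypothetical and do not reflect how watsonx.governance stores these objects. The names passed in at the bottom are the ones that this tutorial asks you to enter.

```python
from dataclasses import dataclass, field


@dataclass
class AIUseCase:
    """A business problem plus the governance facts collected for its assets."""
    name: str
    # Hypothetical field: asset name -> facts captured for that asset.
    facts: dict[str, dict[str, str]] = field(default_factory=dict)


@dataclass
class Inventory:
    """A container for storing and reviewing AI use cases."""
    name: str
    description: str = ""
    use_cases: list[AIUseCase] = field(default_factory=list)

    def add_use_case(self, use_case: AIUseCase) -> None:
        self.use_cases.append(use_case)

    def use_case_names(self) -> list[str]:
        # "You can view all the AI use cases in an inventory."
        return [use_case.name for use_case in self.use_cases]


inventory = Inventory(
    name="Golden Bank Insurance Inventory",
    description="Inventory for insurance related claims processing",
)
inventory.add_use_case(AIUseCase(name="Insurance claims processing AI use case"))
print(inventory.use_case_names())
```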
Task 2a: Create an inventory
To preview this task, watch the video beginning at 01:45.
Follow these steps to create an inventory:
- From the Navigation Menu, choose AI governance > AI use cases.
- If you have an existing inventory, you can skip to Task 2b: Create an AI use case and use that inventory. If you don't have any inventories, follow these steps:
  - Click the Manage inventories icon.
  - On the Inventories page, click New inventory.
  - For the name, copy and paste the following text:
    Golden Bank Insurance Inventory
  - For the description, copy and paste the following text:
    Inventory for insurance related claims processing
  - Clear the Add collaborators after creation option. You can restrict access at the inventory and AI use case level.
  - Select your Cloud Object Storage instance from the list.
  - Click Create.
- Close the Manage inventories page.
Check your progress
The following image shows the inventory. You are now ready to create an AI use case.
Task 2b: Create an AI use case
To preview this task, watch the video beginning at 02:08.
An AI use case is a defined business problem that you can solve with the help of AI. AI use cases are usually defined before any AI asset is developed. Follow these steps to create an AI use case:
- Click New AI use case.
- For the Name, copy and paste the following text:
Insurance claims processing AI use case
- Select an existing inventory.
- Click Create to accept the default values for the rest of the fields.
- If this is your first time using AI use cases, then you are prompted to set up the feature. Click Begin, and wait for the AI use case to display.
Check your progress
The following image shows the AI use case.
Task 2c: Associate the workspaces with the use case
To preview this task, watch the video beginning at 02:29.
Follow these steps to associate the workspaces with this use case:
- Scroll to the Associated workspaces section.
- Under the Develop phase, click Associate workspace.
- Select the Getting started with watsonx.governance project.
- Click Save.
- Under the Validate phase, click Associate workspace.
- Select the Validation project.
- Click Save.
- Under the Operate phase, click Associate workspace.
- Select Insurance claims deployment space.
- Click Save.
Check your progress
The following image shows the AI use case with all associated workspaces.
Task 3: Evaluate the sample prompt template
The sample project contains a few prompt templates and CSV files used as test data. Complete these tasks to evaluate one of the sample prompt templates.
Task 3a: Edit the sample prompt template in the Prompt Lab
To preview this task, watch the video beginning at 03:02.
Follow these steps to view the prompt template to see how it is structured:
- From the Navigation Menu, choose Projects > View all projects.
- Select the Getting started with watsonx.governance project.
- Click the Assets tab.
- Click Insurance claim summarization to open the prompt template in Prompt Lab, and then click Edit.
- Click the Prompt variables icon.
Note: To run evaluations, you must create at least one prompt variable.
- Scroll to the Try section. Notice the {input} variable in the Input field. You must include the prompt variable as input for testing your prompt. A prompt variable is a placeholder keyword that you include in the static text of your prompt at creation time and replace with text dynamically at run time.
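To make the idea of a prompt variable concrete, the following sketch mimics the substitution with plain Python string formatting. It is an illustration under stated assumptions, not how Prompt Lab works internally: the template wording and the `render_prompt` helper are hypothetical, and only the `{input}` variable name comes from the sample prompt template.

```python
# Static prompt text with a placeholder. The wording is illustrative only;
# it is not the text of the sample prompt template.
TEMPLATE = "Summarize the following insurance claim.\n\nClaim: {input}\n\nSummary:"


def render_prompt(template: str, **variables: str) -> str:
    """Replace each {name} placeholder with the value supplied at run time."""
    # str.format raises KeyError if a placeholder has no matching value.
    return template.format(**variables)


prompt = render_prompt(TEMPLATE, input="Policyholder reports hail damage to the roof.")
print(prompt)
```

The static text stays the same for every request, and only the value bound to `input` changes, which is what lets an evaluation run the same template over every row of a test data file.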
Check your progress
The following image shows the Prompt Lab.
Task 3b: Evaluate the prompt template
To preview this task, watch the video beginning at 03:24.
Now you are ready to evaluate the prompt template.
- Click the Evaluate icon.
- Expand the Generative AI Quality section to see a list of dimensions. The available metrics depend on the task type of the prompt. For example, summarization has different metrics than classification.
- Click Next.
- Select the test data:
- Click Select from project.
- Select Project file > Insurance claim summarization test data.csv.
- Click Select.
- For the Input column, select Insurance_Claim.
- For the Reference output column, select Summary.
- Click Next.
- Click Evaluate. Evaluations can take a few minutes to complete. When the evaluation completes, you see the test results on the Evaluate tab. This page shows detailed information about this evaluation run so you can gain insights about your model performance. The summary provides an overview of metric scores and violations of default score thresholds for your prompt template evaluations.
- Click the AI Factsheet tab.
- View the information on each of the sections on the tab.
- Click Development > Getting started with watsonx.governance > Test results to see the test results again.
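If you later evaluate with your own test data, it can help to confirm that the file contains the columns you plan to map before you upload it. The following is a minimal sketch that uses only the Python standard library. The column names Insurance_Claim and Summary come from the steps above; the helper functions and the sample rows are hypothetical and are not taken from the sample project file.

```python
import csv
import io

REQUIRED_COLUMNS = ("Insurance_Claim", "Summary")


def missing_columns(csv_text: str, required=REQUIRED_COLUMNS) -> list[str]:
    """Return the required column names that are absent from the CSV header."""
    reader = csv.DictReader(io.StringIO(csv_text))
    header = reader.fieldnames or []
    return [name for name in required if name not in header]


def empty_reference_rows(csv_text: str, column: str = "Summary") -> list[int]:
    """Return 1-based data-row numbers whose reference output is blank."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [
        number
        for number, row in enumerate(reader, start=1)
        if not (row.get(column) or "").strip()
    ]


# Made-up rows for illustration; the real file ships with the sample project.
sample = (
    "Insurance_Claim,Summary\n"
    '"Car was rear-ended at a stop light.","Rear-end collision claim."\n'
    '"Water leak damaged the kitchen floor.",""\n'
)
print("Missing columns:", missing_columns(sample))
print("Rows without a reference summary:", empty_reference_rows(sample))
```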
Check your progress
The following image shows the results of the evaluation. Now you can start tracking the prompt template in an AI use case.
Task 4: Start tracking the prompt template
To preview this task, watch the video beginning at 04:24.
You can track your prompt template in an AI use case to report the development and test process to your peers. Follow these steps to start tracking the prompt template:
- On the AI Factsheet tab, click the Governance page.
- Click Track in AI use case.
- Notice that the associated AI use case is Insurance claims processing AI use case.
- Select an approach. An approach is one facet of the solution to the business problem represented by the AI use case. For example, you might create approaches to track several prompt templates in a use case.
- Click Next.
- For the model version, select Experimental.
- Accept the default value for the version number.
- Click Next.
- Review the information, and then click Track asset.
- When model tracking successfully begins, click the View details icon to open the AI use case.
- Click the Lifecycle tab to see that the prompt template is in the Development phase. As the prompt template moves through the AI lifecycle, it moves through these phases:
- Development phase: AI assets that have been developed in a project environment.
- Validation phase: AI assets that have been deployed in a space or project for validation.
- Operation phase: AI assets deployed in a space for operation.
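The phases listed above form a fixed, ordered progression, with one of this tutorial's workspaces behind each phase. The following sketch models that ordering for illustration. The `Phase` enum, `next_phase` helper, and `WORKSPACES` mapping are hypothetical names, not part of any watsonx.governance API; the workspace names are the ones created in Task 1.

```python
from enum import Enum
from typing import Optional


class Phase(Enum):
    DEVELOPMENT = "Development"
    VALIDATION = "Validation"
    OPERATION = "Operation"


# Enum definition order doubles as lifecycle order.
ORDER = list(Phase)


def next_phase(current: Phase) -> Optional[Phase]:
    """Return the phase that follows `current`, or None after Operation."""
    index = ORDER.index(current)
    return ORDER[index + 1] if index + 1 < len(ORDER) else None


# Workspaces used in this tutorial, keyed by the phase they back.
WORKSPACES = {
    Phase.DEVELOPMENT: "Getting started with watsonx.governance",
    Phase.VALIDATION: "Validation project",
    Phase.OPERATION: "Insurance claims deployment space",
}

phase: Optional[Phase] = Phase.DEVELOPMENT
while phase is not None:
    print(f"{phase.value}: {WORKSPACES[phase]}")
    phase = next_phase(phase)
```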
Check your progress
The following image shows the Lifecycle tab in the AI use case with the prompt template in the Development phase. You are now ready to continue to the Validation phase.
Task 5: Import the tracked assets for validation
As noted in Task 1, the prompt engineer typically evaluates the prompt with test data, and the validation engineer validates the prompt. The validation engineer has access to validation data that prompt engineers might not have. In this case, the validation data is stored in a different project. To move the asset into the Validation phase of the AI lifecycle, export the development project, and then import its assets into the validation project that you created in Task 1:
Task 5a: Export the sample project
To preview this task, watch the video beginning at 05:07.
Follow these steps to export the development project:
- From the Navigation Menu, choose Projects > View all projects.
- Select the Getting started with watsonx.governance project.
- Click the Import/Export icon > Export project.
- Check the box to select all assets.
- Click Export.
- Click Continue export to acknowledge that the assets might contain credentials.
- Wait to be prompted for the project file name, type validation-project.zip, and then click Save.
- When the project export completes, click Back to project.
Check your progress
The following image shows the Export project page.
Task 5b: Import the assets into the validation project
To preview this task, watch the video beginning at 05:28.
Follow these steps to import the assets from the development project into the validation project:
- From the Navigation Menu, choose Projects > View all projects.
- Open the Validation project.
- Click the Import/Export icon > Import project.
- Click Browse.
- Select the validation-project.zip, and click Open.
- Select the option to indicate agreement: I understand that some types of assets overwrite existing assets with the same name and type.
- Click Import.
- When the assets import successfully, click the Refresh icon to see the imported assets.
Check your progress
The following image shows the validation project Assets tab. You are now ready to evaluate the sample prompt template in the validation project.
Task 6: Validate the prompt template
To preview this task, watch the video beginning at 05:41.
Now you are ready to evaluate the prompt template in this validation project using the same evaluation process as before. Use the same test data set, and select the same Input and Reference output columns as before. Follow these steps to validate the prompt template:
- Click the Assets tab in the Validation project.
- From the Overflow menu for the Insurance claim summarization prompt template, select Evaluate.
- Click Evaluate to start the evaluation.
- Repeat the steps in Task 3b: Evaluate the prompt template to evaluate the Insurance claim summarization prompt template in the Validation project.
- Click the AI Factsheet tab when the evaluation is complete.
- View both sets of test results:
- Click Development > Getting started with watsonx.governance > Test results.
- Click Validation > Validation project > Test results.
Check your progress
The following image shows the validation test results. You are now ready to promote the prompt template to a deployment space, and then deploy the prompt template.
Task 7: Deploy the prompt template
To deploy the prompt template, you need to promote it to the deployment space that you created in Task 1. Then, in the deployment space, you can create a deployment and test the deployed prompt template.
Task 7a: Promote the prompt template to a deployment space
To preview this task, watch the video beginning at 06:14.
You promote the prompt template to a deployment space in preparation for deploying it. Follow these steps to promote the prompt template:
- Click Validation project in the navigation trail.
- From the Overflow menu for the Insurance claim summarization prompt template, select Promote to space.
- For the Target space, select Insurance claims deployment space.
- Check the option to Go to the space after promoting the prompt template.
- Click Promote.
Check your progress
The following image shows the prompt template in the deployment space. You are now ready to create a deployment.
Task 7b: Deploy the prompt template
To preview this task, watch the video beginning at 06:33.
Now you can create an online deployment of the prompt template from inside the deployment space. Follow these steps to create a deployment:
- From the Insurance claim summarization asset page in the deployment space, select New deployment.
- For the deployment name, copy and paste the following text:
Insurance claims summarization deployment
- Click Create.
Check your progress
The following image shows the deployed prompt template.
Task 7c: View the deployed prompt template
To preview this task, watch the video beginning at 06:47.
Follow these steps to view the deployed prompt template in its current phase of the lifecycle:
- View the deployment when it is ready. The API reference tab provides information for you to use the prompt template deployment in your application.
- Click the Test tab. The Test tab allows you to submit an instruction and Input to test the deployment.
- Click Generate. Close the results window.
- Click the AI Factsheet tab.
- Scroll down to the bottom of the AI Factsheet page, and click the arrow for more details.
- Review the information in the Development, Validation, and Operation phases for the AI Factsheet for the deployed prompt template.
- Scroll to the top of the page, and click the View details icon to open the AI use case.
- In the use case, click the Lifecycle tab. You can see that the prompt template is now in the Operation phase.
- Click the Insurance claim summarization prompt template in the Operation phase. When you are done, click Cancel.
- Click the Insurance claims summarization deployment in the Operation phase. When you are done, click Cancel.
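The API reference tab mentioned in the steps above is the authoritative source for calling the deployment from an application. As a rough sketch of what such a call might look like, the following code builds an HTTP request without sending it. Everything specific here is an assumption to verify against the API reference tab: the endpoint path, the `version` query value, the bearer-token authentication, and the request body shape are based on the general form of the watsonx.ai text generation REST API, and the host, deployment ID, and token are placeholders. The `build_generation_request` helper is a hypothetical name.

```python
import json
import urllib.request


def build_generation_request(base_url: str, deployment_id: str, token: str,
                             claim_text: str,
                             api_version: str = "2023-05-29") -> urllib.request.Request:
    """Build (but do not send) a text-generation request for a deployed prompt template."""
    # Assumed endpoint shape; copy the real URL from the API reference tab.
    url = (f"{base_url.rstrip('/')}/ml/v1/deployments/{deployment_id}"
           f"/text/generation?version={api_version}")
    # The key under prompt_variables must match the variable in the template.
    body = {"parameters": {"prompt_variables": {"input": claim_text}}}
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
            "Accept": "application/json",
        },
        method="POST",
    )


request = build_generation_request(
    base_url="https://cpd.example.com",   # placeholder host
    deployment_id="DEPLOYMENT_ID",        # placeholder
    token="ACCESS_TOKEN",                 # placeholder
    claim_text="Policyholder reports hail damage to the roof.",
)
print(request.get_method(), request.full_url)
# To send the request: urllib.request.urlopen(request)
```

Building the request separately from sending it makes it easy to inspect the URL and payload, and to compare them with the snippet shown on the API reference tab, before any call is made.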
Check your progress
The following image shows the prompt template in the Operation phase of the lifecycle.
Next steps
Try one of the other tutorials:
Additional resources
- View more videos.
- Find sample data sets, projects, models, prompts, and notebooks in the Resource hub to gain hands-on experience:
  - Notebooks that you can add to your project to get started analyzing data and building models.
  - Projects that you can import containing notebooks, data sets, prompts, and other assets.
  - Data sets that you can add to your project to refine, analyze, and build models.
  - Prompts that you can use in the Prompt Lab to prompt a foundation model.
  - Foundation models that you can use in the Prompt Lab.
Parent topic: Quick start tutorials