Take this tutorial to learn how to evaluate and track a prompt template. You can evaluate prompt templates in projects or deployment spaces to measure the performance of foundation model tasks and understand how your model generates responses.
Then, you can track the prompt template in an AI use case to capture and share facts about the asset to help you meet governance and compliance goals.
Required services
watsonx.governance
Your basic workflow includes these tasks:
Open a project that contains the prompt template to evaluate. Projects are where you can collaborate with others to work with assets.
Evaluate a prompt template using test data.
Review the results on the AI Factsheet.
Track the evaluated prompt template in an AI use case.
Deploy and test your evaluated prompt template.
Read about prompt templates
With watsonx.governance, you can evaluate prompt templates in projects to measure how effectively your foundation models generate responses for supported task types, such as summarization and classification.
Watch a video about evaluating and tracking a prompt template
Watch this video to preview the steps in this tutorial. There might be slight differences in the user interface shown in the video. The video is intended to be a companion to the written tutorial.
This video provides a visual method to learn the concepts and tasks in this documentation.
Try a tutorial about evaluating and tracking a prompt template
Tips for completing this tutorial
Here are some tips for successfully completing this tutorial.
Use the video picture-in-picture
Tip: Start the video, then as you scroll through the tutorial, the video moves to picture-in-picture mode. Close the video table of contents for the best experience with picture-in-picture. You can use picture-in-picture mode to follow the video as you complete the tasks in this tutorial. Click the timestamps for each task to follow along.
The following animated image shows how to use the video picture-in-picture and table of contents features:
For the optimal experience completing this tutorial, open Cloud Pak for Data in one browser window, and keep this tutorial page open in another browser window to switch easily between the two applications. Consider arranging the two browser windows side-by-side to make it easier to follow along.
Tip: If you encounter a guided tour while completing this tutorial in the user interface, click Maybe later.
Complete the prerequisites
To complete this tutorial, you must set up the following prerequisites.
Assign access to the Platform assets catalog
You must have at least Editor access to the Platform assets catalog where AI use cases and inventories are stored. Refer to the Adding platform connections topic for more information.
Watch the following animated image to see how to create the catalog and assign access.
Set up Watson OpenScale
This tutorial requires Watson OpenScale. Follow these steps to set up Watson OpenScale using the Auto setup option or refer to Setup options for Watson OpenScale to see other setup options:
From the Navigation Menu, choose Administration > Services > Service instances.
On the Service instances page, for your Watson OpenScale or watsonx.governance instance, click the Overflow menu, and choose Launch.
On the Service details page, click Launch Watson OpenScale.
When the Model evaluation page displays, click Auto setup.
Task 1: Create the workspaces
To complete this tutorial, you need three workspaces:
Develop phase: A development project to store the assets that you develop, evaluate, and track.
Validate phase: A validation project to store the assets that are ready to be validated.
Operate phase: A production deployment space to store the validated assets and deployments.
Task 1a: Create the development project based on a sample
To preview this task, watch the video beginning at 00:11.
The Resource hub includes a sample project that contains sample prompt templates that you can evaluate and track. If you have already created the sample project, skip that step and associate the watsonx.ai Runtime service with the sample project. Otherwise, create the development project based on the sample from the Resource hub.
The following image shows the development project Assets tab. You are now ready to create the validation project.
Task 1b: Create a validation project
To preview this task, watch the video beginning at 00:44.
Typically, the prompt engineer evaluates the prompt with test data, and the validation engineer validates the prompt. The validation engineer has access to validation data that prompt engineers might not have, so the validation data resides in a different project. Follow these steps to create an empty project. Later, you import assets from the development project into the validation project.
From the Navigation Menu, choose Projects > View all projects.
On the Projects page, click New project.
For the project name, type: Validation project
Click Create.
Follow the same steps as in Task 1a to associate your watsonx.ai Runtime service with the validation project.
Click the Assets tab to see the empty project.
Check your progress
The following image shows the empty validation project.
Task 1c: Create a deployment space
To preview this task, watch the video beginning at 01:16.
You need to create a deployment space now, so you can later promote the prompt template to that deployment space. Follow these steps to create the deployment space:
From the Navigation Menu, choose Deployments.
Click New deployment space.
For the Space name, copy and paste the following text: Insurance claims deployment space
For the Deployment stage, select Production.
Important: You must select Production for the Deployment stage if you wish to move the deployment from the Evaluation stage to the Operation stage.
Select your machine learning service from the list.
Click Create.
When the space is created, click View new space.
Check your progress
The following image shows the deployment space.
Task 2: Create an inventory and AI use case
An inventory stores AI use cases for review. AI use cases collect governance facts for the AI assets that your organization tracks. You can view all the AI use cases in an inventory. You must have a Platform assets catalog to create an inventory. Refer to the Complete the prerequisites section.
Task 2a: Create an inventory
To preview this task, watch the video beginning at 01:45.
Follow these steps to create an inventory:
From the Navigation Menu, choose AI governance > Inventories.
If you have an existing inventory, skip to Task 2b: Create an AI use case to use that inventory; otherwise, continue with this task to create an inventory.
On the Inventories page, click New inventory.
For the name, copy and paste the following text: Golden Bank Insurance Inventory
For the description, copy and paste the following text: Inventory for insurance related claims processing
Clear the Add collaborators after creation option. You can restrict access at the inventory and AI use case level.
Select your Cloud Object Storage instance from the list.
Click Create.
Check your progress
The following image shows the inventory. You are now ready to create an AI use case.
Task 2b: Create an AI use case
To preview this task, watch the video beginning at 02:08.
An AI use case is a defined business problem that you can solve with the help of AI. Usually these are defined before any AI asset gets developed. Follow these steps to create an AI use case:
From the Navigation Menu, choose AI governance > AI use cases. If prompted, click Complete setup. You see this option if this is your first time working with AI use cases.
Click New AI use case.
For the Name, copy and paste the following text: Insurance claims processing AI use case
Select Golden Bank Insurance Inventory or other existing inventory.
Click Create to accept the default values for the rest of the fields.
If this is your first time using AI use cases, then you are prompted to set up the feature. Click Begin, and wait for the AI use case to display.
Check your progress
The following image shows the AI use case.
Task 2c: Associate the workspaces with the use case
To preview this task, watch the video beginning at 02:29.
Follow these steps to associate the workspaces with this use case:
Note: You can create new projects and deployment spaces from within the AI use case.
Scroll to the Associated workspaces section.
Under the Develop phase, click Associate workspace.
Select the Getting started with watsonx.governance project.
Click Save.
Under the Validate phase, click Associate workspace.
Select the Validation project.
Click Save.
Under the Operate phase, click Associate workspace.
Select Insurance claims deployment space.
Click Save.
Check your progress
The following image shows the AI use case with all associated workspaces.
Task 3: Evaluate the sample prompt template
The sample project contains a few prompt templates and CSV files used as test data. Complete these tasks to evaluate one of the sample prompt templates.
Task 3a: Edit the sample prompt template in the Prompt Lab
To preview this task, watch the video beginning at 03:02.
Follow these steps to view the prompt template to see how it is structured:
From the Navigation Menu, choose Projects > View all projects.
Select the Getting started with watsonx.governance project.
Click the Assets tab.
Click Insurance claim summarization to open the prompt template in Prompt Lab, and then click Edit.
Click the Prompt variables icon.
Note: To run evaluations, you must create at least one prompt variable.
Scroll to the Try section. Notice the {input} variable in the Input field. You must include the prompt variable as input for testing your prompt. A prompt variable is a placeholder keyword that you include in the static text of your prompt at creation time and replace with text dynamically at run time.
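The substitution that a prompt variable performs can be sketched in a few lines. This is a minimal illustration only: the template text and claim below are made up for the example, not the exact contents of the sample asset.

```python
# Minimal sketch of a prompt variable: the {input} placeholder in the static
# prompt text is replaced with real content at run time.
# The template and claim text here are illustrative, not the sample asset.
template = (
    "Summarize the following insurance claim in one sentence.\n\n"
    "Claim: {input}\n\nSummary:"
)

claim_text = "Policyholder reports hail damage to the roof on June 3."

# Substitute the prompt variable before sending the prompt to the model.
prompt = template.replace("{input}", claim_text)
print(prompt)
```

At evaluation time, the service performs this substitution for every row of the test data, filling {input} from the column that you map to the variable.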
Check your progress
The following image shows the Prompt Lab.
Task 3b: Evaluate the prompt template
To preview this task, watch the video beginning at 03:24.
Now you are ready to evaluate the prompt template.
Click the Evaluate icon.
Expand the Generative AI Quality section to see a list of dimensions. The available metrics depend on the task type of the prompt. For example, summarization has different metrics than classification.
Click Next.
Select the test data:
Click Select from project.
Select Project file > Insurance claim summarization test data.csv.
Click Select.
For the Input column, select Insurance_Claim.
For the Reference output column, select Summary.
Click Next.
Click Evaluate. Evaluations can take a few minutes to complete. When the evaluation completes, you see the test results on the Evaluate tab. This page shows detailed information about this evaluation run so you can gain insights about your model performance. The summary provides an overview of metric scores and violations of default score thresholds for your prompt template evaluations.
Click the AI Factsheet tab.
View the information on each of the sections on the tab.
Click Development > Getting started with watsonx.governance > Test results to see the test results again.
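The test data that you selected pairs an input column with a reference output column. As a rough sketch, the CSV has the following shape; the rows below are invented examples, and only the column names (Insurance_Claim, Summary) come from the steps above.

```python
import csv
import io

# Illustrative rows only; the real file is
# "Insurance claim summarization test data.csv" in the sample project.
# The column names match the ones mapped during evaluation:
# Input column -> Insurance_Claim, Reference output column -> Summary.
rows = [
    {"Insurance_Claim": "Hail damaged the roof; repair estimate attached.",
     "Summary": "Roof hail damage claim with repair estimate."},
    {"Insurance_Claim": "Rear-end collision; bumper replacement required.",
     "Summary": "Auto claim for bumper replacement after collision."},
]

buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=["Insurance_Claim", "Summary"])
writer.writeheader()
writer.writerows(rows)
print(buf.getvalue())
```

During the run, each Insurance_Claim value fills the prompt variable, and the generated summary is scored against the corresponding Summary value.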
Check your progress
The following image shows the results of the evaluation. Now you can start tracking the prompt template in an AI use case.
Task 4: Start tracking the prompt template
To preview this task, watch the video beginning at 04:24.
You can track your prompt template in an AI use case to report the development and test process to your peers. Follow these steps to start tracking the prompt template:
On the AI Factsheet tab, click the Governance page.
Click Track in AI use case.
Notice that the associated AI use case is Insurance claims processing AI use case.
Select an approach. An approach is one facet of the solution to the business problem represented by the AI use case. For example, you might create approaches to track several prompt templates in a use case.
Click Next.
For the model version, select Experimental.
Accept the default value for the version number.
Click Next.
Review the information, and then click Track asset.
When model tracking successfully begins, click the View details icon to open the AI use case.
Click the Lifecycle tab to see the prompt template is in the Development phase. As the prompt template moves through the AI lifecycle, it will move through these phases:
Development phase: AI assets that have been developed in a project environment.
Validation phase: AI assets that have been deployed in a space or project for validation.
Operation phase: AI assets deployed in a space for operation.
Check your progress
The following image shows the Lifecycle tab in the AI use case with the prompt template in the Development phase. You are now ready to continue to the Validation phase.
Task 5: Import the tracked assets for validation
As noted in Task 1, typically the prompt engineer evaluates the prompt with test data, and the validation engineer validates the prompt. The validation engineer has access to validation data that prompt engineers might not have, so the validation data resides in a different project. Follow these steps to export the development project and import its assets into the validation project that you created in Task 1, which moves the asset into the Validation phase of the AI lifecycle:
Task 5a: Export the sample project
To preview this task, watch the video beginning at 05:07.
Follow these steps to export the development project:
From the Navigation Menu, choose Projects > View all projects.
Select the Getting started with watsonx.governance project.
Click the Import/Export icon > Export project.
Check the box to select all assets.
Click Export.
Click Continue export to acknowledge that the assets might contain credentials.
Wait to be prompted for the project file name, type validation-project.zip, and then click Save.
When the project export completes, click Back to project.
Check your progress
The following image shows the Export project page.
Task 5b: Import the assets into the validation project
To preview this task, watch the video beginning at 05:28.
Follow these steps to import the assets from the development project into the validation project:
From the Navigation Menu, choose Projects > View all projects.
Open the Validation project.
Click the Import/Export icon > Import project.
Click Browse.
Select the validation-project.zip, and click Open.
Select the option to indicate agreement: I understand that some types of assets overwrite existing assets with the same name and type.
Click Import.
When the assets import successfully, click the Refresh icon to see the imported assets.
Check your progress
The following image shows the validation project Assets tab. You are now ready to evaluate the sample prompt template in the validation project.
Task 6: Validate the prompt template
To preview this task, watch the video beginning at 05:41.
Now you are ready to evaluate the prompt template in this validation project by using the same evaluation process as before. Use the same test data set, and select the same Input and Output columns as before. Follow these steps to validate the prompt template:
Click the Assets tab in the Validation project.
From the Overflow menu for the Insurance claim summarization prompt template, select Evaluate.
Click Evaluate to start the evaluation.
Repeat the steps in Task 3b: Evaluate the prompt template to evaluate the Insurance claim summarization prompt template in the Validation project.
Click the AI Factsheet tab when the evaluation is complete.
View both sets of test results:
Click Development > Getting started with watsonx.governance > Test results.
Click Validation > Validation project > Test results.
Check your progress
The following image shows the validation test results. You are now ready to promote the prompt template to a deployment space, and then deploy the prompt template.
Task 7: Deploy the prompt template
To deploy the prompt template, you need to promote it to the deployment space that you created in Task 1. Then, in the deployment space, you can create a deployment and test the deployed prompt template.
Task 7a: Promote the prompt template to a deployment space
To preview this task, watch the video beginning at 06:14.
You promote the prompt template to a deployment space in preparation for deploying it. Follow these steps to promote the prompt template:
Click Validation project in the navigation trail.
From the Overflow menu for the Insurance claim summarization prompt template, select Promote to space.
For the Target space, select Insurance claims deployment space.
Check the option to Go to the space after promoting the prompt template.
Click Promote.
Check your progress
The following image shows the prompt template in the deployment space. You are now ready to create a deployment.
Task 7b: Deploy the prompt template
To preview this task, watch the video beginning at 06:33.
Now you can create an online deployment of the prompt template from inside the deployment space. Follow these steps to create a deployment:
From the Insurance claim summarization asset page in the deployment space, select New deployment.
For the deployment name, copy and paste the following text: Insurance claims summarization deployment
Click Create.
Check your progress
The following image shows the deployed prompt template.
Task 7c: View the deployed prompt template
To preview this task, watch the video beginning at 06:47.
Follow these steps to view the deployed prompt template in its current phase of the lifecycle:
View the deployment when it is ready. The API reference tab provides information for you to use the prompt template deployment in your application.
Click the Test tab. The Test tab allows you to submit an instruction and Input to test the deployment.
Click Generate. Close the results window.
Click the AI Factsheet tab.
Scroll down to the bottom of the AI Factsheet page, and click the arrow for more details.
Review the information in the Development, Validation, and Operation phases for the AI Factsheet for the deployed prompt template.
Scroll to the top of the page, and click the View details icon to open the AI use case.
In the use case, click the Lifecycle tab. You can see that the prompt template is now in the Operation phase.
Click the Insurance claim summarization prompt template in the Operation phase. When you are done, click Cancel.
Click the Insurance claims summarization deployment prompt template deployment in the Operation phase. When you are done, click Cancel.
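The API reference tab mentioned in the steps above gives the exact endpoint and payload for calling your deployment from an application. As a hedged sketch only, a text-generation request for a deployed prompt template might be assembled as follows; the endpoint path, version query parameter, payload field names, region URL, and deployment ID are assumptions to verify against the API reference tab.

```python
import json

# Hedged sketch of calling a deployed prompt template over REST.
# The endpoint path, ?version= value, and "prompt_variables" payload field
# are assumptions based on typical watsonx.ai deployment APIs; confirm the
# exact request on the deployment's API reference tab.

def build_generation_request(base_url, deployment_id, claim_text, token):
    """Assemble the URL, headers, and JSON body for a generation request."""
    url = (f"{base_url}/ml/v1/deployments/{deployment_id}"
           f"/text/generation?version=2023-05-29")
    headers = {
        "Authorization": f"Bearer {token}",  # IAM bearer token (assumption)
        "Content-Type": "application/json",
    }
    body = {
        # The variable name matches the {input} variable in the template.
        "parameters": {"prompt_variables": {"input": claim_text}},
    }
    return url, headers, json.dumps(body)

url, headers, body = build_generation_request(
    "https://us-south.ml.cloud.ibm.com",  # region endpoint (assumption)
    "your-deployment-id",                 # copy from the API reference tab
    "Policyholder reports hail damage to the roof.",
    "YOUR_IAM_TOKEN",
)
print(url)
# To send the request: requests.post(url, headers=headers, data=body)
```

Separating request assembly from sending makes the payload easy to inspect before you wire in real credentials and an HTTP client.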
Check your progress
The following image shows the prompt template in the Operation phase of the lifecycle.