Quick start: Evaluate and track a prompt template
Last updated: Dec 12, 2024

Take this tutorial to learn how to evaluate and track a prompt template. You can evaluate prompt templates in projects or deployment spaces to measure the performance of foundation model tasks and understand how your model generates responses. Then, you can track the prompt template in an AI use case to capture and share facts about the asset to help you meet governance and compliance goals.

Required services
watsonx.governance

Your basic workflow includes these tasks:

  1. Open a project that contains the prompt template to evaluate. Projects are where you can collaborate with others to work with assets.
  2. Evaluate a prompt template using test data.
  3. Review the results on the AI Factsheet.
  4. Track the evaluated prompt template in an AI use case.
  5. Deploy and test your evaluated prompt template.

Read about prompt templates

With watsonx.governance, you can evaluate prompt templates in projects to measure how effectively your foundation models generate responses for the following task types:

  • Classification
  • Summarization
  • Generation
  • Question answering
  • Entity extraction

Read more about evaluating prompt templates in projects

Read more about evaluating prompt templates in deployment spaces

Watch a video about evaluating and tracking a prompt template

Watch this video to preview the steps in this tutorial. There might be slight differences in the user interface shown in the video. The video is intended to be a companion to the written tutorial.

This video provides a visual method to learn the concepts and tasks in this documentation.


Try a tutorial about evaluating and tracking a prompt template

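In this tutorial, you will complete these tasks:

  • Task 1: Create the workspaces
  • Task 2: Create an inventory and AI use case
  • Task 3: Evaluate the sample prompt template
  • Task 4: Start tracking the prompt template
  • Task 5: Import the tracked assets for validation
  • Task 6: Validate the prompt template
  • Task 7: Deploy the prompt template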





Tips for completing this tutorial
Here are some tips for successfully completing this tutorial.

Use the video picture-in-picture

Tip: Start the video, then as you scroll through the tutorial, the video moves to picture-in-picture mode. Close the video table of contents for the best experience with picture-in-picture. You can use picture-in-picture mode so you can follow the video as you complete the tasks in this tutorial. Click the timestamps for each task to follow along.

The following animated image shows how to use the video picture-in-picture and table of contents features:

How to use picture-in-picture and chapters

Get help in the community

If you need help with this tutorial, you can ask a question or find an answer in the watsonx Community discussion forum.

Set up your browser windows

For the optimal experience completing this tutorial, open Cloud Pak for Data in one browser window, and keep this tutorial page open in another browser window to switch easily between the two applications. Consider arranging the two browser windows side-by-side to make it easier to follow along.

Side-by-side tutorial and UI

Tip: If you encounter a guided tour while completing this tutorial in the user interface, click Maybe later.



Complete the prerequisites

To complete this tutorial, you must set up the following prerequisites.

Assign access to the Platform assets catalog

You must have at least Editor access to the Platform assets catalog where AI use cases and inventories are stored. Refer to the Adding platform connections topic for more information.

Watch the following animated image to see how to create the catalog and assign access.

Create the Platform assets catalog animated image

Set up Watson OpenScale

This tutorial requires Watson OpenScale. Follow these steps to set up Watson OpenScale using the Auto setup option or refer to Setup options for Watson OpenScale to see other setup options:

  1. From the Navigation menu, choose Administration > Services > Service instances.
  2. On the Service instances page, for your Watson OpenScale or watsonx.governance instance, click the Overflow menu, and choose Launch.
  3. On the Service details page, click Launch Watson OpenScale.
  4. When the Model evaluation page displays, click Auto setup.



Task 1: Create the workspaces

To complete this tutorial, you need three workspaces:

  • Develop phase: A development project to store the assets that you develop, evaluate, and track.
  • Validate phase: A validation project to store the assets that are ready to be validated.
  • Operate phase: A production deployment space to store the validated assets and deployments.

Task 1a: Create the development project based on a sample

To preview this task, watch the video beginning at 00:11.

The Resource hub includes a sample project that contains sample prompt templates that you can evaluate and track. If you have already created the sample project, then skip step 1 in this task, and next associate the watsonx.ai Runtime service with the sample project. Otherwise, follow these steps to create the development project based on a sample:

  1. Access the Getting started with watsonx.governance project in the Resource hub.

    1. Click Create project.

    2. Accept the default values for the project name, and click Create.

    3. Click View new project when the project is successfully created.

  2. Associate a watsonx.ai Runtime service with the project. For more information, see watsonx.ai Runtime.

    1. When the project opens, click the Manage tab, and select the Services and integrations page.

    2. On the IBM services tab, click Associate service.

    3. Select your watsonx.ai Runtime instance. If you don't have a watsonx.ai Runtime service instance provisioned yet, follow these steps:

      1. Click New service.

      2. Select watsonx.ai Runtime.

      3. Click Create.

      4. Select the new service instance from the list.

    4. Click Associate service.

    5. If necessary, click Cancel to return to the Services and integrations page.

  3. Click the Assets tab in the project to see the sample assets.

For more information or to watch a video, see Creating a project.

For more information on associated services, see Adding associated services.

Check your progress

The following image shows the development project Assets tab. You are now ready to create the validation project.

Sample project assets

Task 1b: Create a validation project

To preview this task, watch the video beginning at 00:44.

Typically, the prompt engineer evaluates the prompt with test data, and the validation engineer validates the prompt. The validation engineer has access to validation data that prompt engineers might not have. In this case, the validation data resides in a different project. Follow these steps to create an empty project. Later, you import assets from the development project into the validation project.

  1. From the Navigation menu, choose Projects > View all projects.
  2. On the Projects page, click New project.
  3. For the project name, type: Validation project
  4. Click Create.
  5. Follow the same steps as in Task 1a to associate your watsonx.ai Runtime service with the validation project.
  6. Click the Assets tab to see the empty project.

Check your progress

The following image shows the empty validation project.

Validation project

Task 1c: Create a deployment space

To preview this task, watch the video beginning at 01:16.

You need to create a deployment space now, so you can later promote the prompt template to that deployment space. Follow these steps to create the deployment space:

  1. From the Navigation menu, choose Deployments.

  2. Click New deployment space.

  3. For the Space name, copy and paste the following text: Insurance claims deployment space

  4. For the Deployment stage, select Production.

    Important: You must select Production for the Deployment stage if you wish to move the deployment from the Evaluation stage to the Operation stage.
  5. Select your machine learning service from the list.

  6. Click Create.

  7. When the space is created, click View new space.

Check your progress

The following image shows the deployment space.

Deployment space




Task 2: Create an inventory and AI use case

An inventory is for storing and reviewing AI use cases. AI use cases collect governance facts for AI assets that your organization tracks. You can view all the AI use cases in an inventory. You must have a Platform assets catalog to create an inventory. Refer to the Complete the prerequisites section.

Task 2a: Create an inventory

To preview this task, watch the video beginning at 01:45.

Follow these steps to create an inventory:

  1. From the Navigation menu, choose AI governance > AI use cases.

  2. If you have an existing inventory, skip to Task 2b: Create an AI use case to use that inventory.

  3. If prompted, click Complete setup. You will see this option if this is your first time working with AI use cases. Then follow these steps to create an inventory:

    1. Click the Manage inventories icon.

    2. On the Inventories page, click New inventory.

    3. For the name, copy and paste the following text: Golden Bank Insurance Inventory

    4. For the description, copy and paste the following text: Inventory for insurance related claims processing

    5. Clear the Add collaborators after creation option. You can restrict access at the inventory and AI use case level.

    6. Select your Cloud Object Storage instance from the list.

    7. Click Create.

  4. Close the Manage inventories page.

Check your progress

The following image shows the inventory. You are now ready to create an AI use case.

Inventory

Task 2b: Create an AI use case

To preview this task, watch the video beginning at 02:08.

An AI use case is a defined business problem that you can solve with the help of AI. AI use cases are usually defined before any AI asset is developed. Follow these steps to create an AI use case:

  1. Click New AI use case.
  2. For the Name, copy and paste the following text: Insurance claims processing AI use case
  3. Select Golden Bank Insurance Inventory or other existing inventory.
  4. Click Create to accept the default values for the rest of the fields.
  5. If this is your first time using AI use cases, then you are prompted to set up the feature. Click Begin, and wait for the AI use case to display.

Check your progress

The following image shows the AI use case.

AI use case

Task 2c: Associate the workspaces with the use case

To preview this task, watch the video beginning at 02:29.

Follow these steps to associate the workspaces with this use case:

Note: You can create new projects and deployment spaces from within the AI use case.
  1. Scroll to the Associated workspaces section.
  2. Under the Develop phase, click Associate workspace.
    1. Select the Getting started with watsonx.governance project.
    2. Click Save.
  3. Under the Validate phase, click Associate workspace.
    1. Select the Validation project.
    2. Click Save.
  4. Under the Operate phase, click Associate workspace.
    1. Select Insurance claims deployment space.
    2. Click Save.

Check your progress

The following image shows the AI use case with all associated workspaces.

AI use case




Task 3: Evaluate the sample prompt template

The sample project contains a few prompt templates and CSV files used as test data. Complete these tasks to evaluate one of the sample prompt templates.

Task 3a: Edit the sample prompt template in the Prompt Lab

To preview this task, watch the video beginning at 03:02.

Follow these steps to view the prompt template to see how it is structured:

  1. From the Navigation menu, choose Projects > View all projects.

  2. Select the Getting started with watsonx.governance project.

  3. Click the Assets tab.

  4. Click Insurance claim summarization to open the prompt template in Prompt Lab, and then click Edit.

  5. Click the Prompt variables icon.

    Note: To run evaluations, you must create at least one prompt variable.
  6. Scroll to the Try section. Notice the {input} variable in the Input field. You must include the prompt variable as input for testing your prompt. A prompt variable is a placeholder keyword that you include in the static text of your prompt at creation time and replace with text dynamically at run time.
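
The prompt variable behavior described in the last step can be sketched in a few lines of Python. This is only an illustration of placeholder substitution: the template text and the `render_prompt` helper are hypothetical and are not the sample project's actual prompt or a watsonx API.

```python
from typing import Mapping

# Hypothetical template text; the sample prompt template in the project differs.
PROMPT_TEMPLATE = (
    "Summarize the following insurance claim in one sentence.\n\n"
    "Claim: {input}\n"
    "Summary:"
)


def render_prompt(template: str, variables: Mapping[str, str]) -> str:
    """Replace each {name} placeholder in the static text with its run-time value."""
    # str.format raises KeyError if the template names a variable that is not supplied.
    return template.format(**variables)


claim = "A tree fell on the insured's car during a storm."
prompt = render_prompt(PROMPT_TEMPLATE, {"input": claim})
print(prompt)
```

During an evaluation, each row of the test data plays the role of `claim` here: the static text stays the same and only the value of the variable changes.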

Check your progress

The following image shows the Prompt Lab.

Task 3b: Evaluate the prompt template

To preview this task, watch the video beginning at 03:24.

Now you are ready to evaluate the prompt template.

  1. Click the Evaluate icon.
  2. Expand the Generative AI Quality section to see a list of dimensions. The available metrics depend on the task type of the prompt. For example, summarization has different metrics than classification.
  3. Click Next.
  4. Select the test data:
    1. Click Select from project.
    2. Select Project file > Insurance claim summarization test data.csv.
    3. Click Select.
    4. For the Input column, select Insurance_Claim.
    5. For the Reference output column, select Summary.
    6. Click Next.
  5. Click Evaluate. Evaluations can take a few minutes to complete. When the evaluation completes, you see the test results on the Evaluate tab. This page shows detailed information about this evaluation run so you can gain insights about your model performance. The summary provides an overview of metric scores and violations of default score thresholds for your prompt template evaluations.
  6. Click the AI Factsheet tab.
    1. View the information on each of the sections on the tab.
    2. Click Development > Getting started with watsonx.governance > Test results to see the test results again.
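
If you later evaluate with your own test data, it can help to confirm that the file contains the columns you plan to map before you upload it. The following is a minimal sketch using only the Python standard library. The `check_test_data` helper and the two sample rows are hypothetical; only the column names `Insurance_Claim` and `Summary` come from this tutorial.

```python
import csv
import io


def check_test_data(csv_text: str, input_column: str, reference_column: str) -> int:
    """Return the number of usable rows, or raise ValueError if a column is missing."""
    reader = csv.DictReader(io.StringIO(csv_text))
    header = reader.fieldnames or []
    missing = [name for name in (input_column, reference_column) if name not in header]
    if missing:
        raise ValueError("Missing column(s): " + ", ".join(missing))
    # Count rows where both the input and the reference output are non-empty.
    return sum(
        1
        for row in reader
        if (row[input_column] or "").strip() and (row[reference_column] or "").strip()
    )


# Hypothetical two-row sample; the real test data file ships with the sample project.
sample = (
    "Insurance_Claim,Summary\n"
    '"Hail damaged the roof of the insured home.","Roof damaged by hail."\n'
    '"The insured vehicle was rear-ended at a stop light.","Rear-end collision."\n'
)
usable_rows = check_test_data(sample, "Insurance_Claim", "Summary")
print(usable_rows)
```

To check a file on disk, read it with `open(path, newline="", encoding="utf-8").read()` and pass the text to the same function.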

Check your progress

The following image shows the results of the evaluation. Now you can start tracking the prompt template in an AI use case.

Prompt template evaluation test results
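
The evaluation summary reports metric scores and flags violations of score thresholds. As a rough illustration of how a reference-based score and a threshold check relate, the sketch below computes a simplified word-overlap F1 between a generated summary and a reference summary. It is written in the spirit of overlap metrics such as ROUGE-1, but it is not the calculation that watsonx.governance performs, and the example texts and the 0.8 lower limit are assumptions chosen for the demonstration.

```python
from collections import Counter


def unigram_f1(generated: str, reference: str) -> float:
    """Simplified word-overlap F1 between generated text and a reference output."""
    gen_counts = Counter(generated.lower().split())
    ref_counts = Counter(reference.lower().split())
    # Counter intersection keeps the minimum count of each shared word.
    overlap = sum((gen_counts & ref_counts).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(gen_counts.values())
    recall = overlap / sum(ref_counts.values())
    return 2 * precision * recall / (precision + recall)


def violates_lower_limit(score: float, lower_limit: float) -> bool:
    """A score below the lower limit counts as a threshold violation."""
    return score < lower_limit


score = unigram_f1("roof damaged by hail", "hail damaged the roof")
violation = violates_lower_limit(score, lower_limit=0.8)  # 0.8 is a hypothetical threshold
print(round(score, 2), violation)
```

Three of the four words overlap in each direction, so precision and recall are both 0.75 and the score falls below the hypothetical limit.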




Task 4: Start tracking the prompt template

To preview this task, watch the video beginning at 04:24.

You can track your prompt template in an AI use case to report the development and test process to your peers. Follow these steps to start tracking the prompt template:

  1. On the AI Factsheet tab, click the Governance page.
  2. Click Track in AI use case.
  3. Notice that the associated AI use case is Insurance claims processing AI use case.
  4. Select an approach. An approach is one facet of the solution to the business problem represented by the AI use case. For example, you might create approaches to track several prompt templates in a use case.
  5. Click Next.
  6. For the model version, select Experimental.
  7. Accept the default value for the version number.
  8. Click Next.
  9. Review the information, and then click Track asset.
  10. When model tracking successfully begins, click the View details icon to open the AI use case.
  11. Click the Lifecycle tab to see the prompt template is in the Development phase. As the prompt template moves through the AI lifecycle, it will move through these phases:
    • Development phase: AI assets that have been developed in a project environment.
    • Validation phase: AI assets that have been deployed in a space or project for validation.
    • Operation phase: AI assets deployed in a space for operation.
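
One way to keep the three phases straight is to model them as an ordered enumeration paired with the workspaces that you associated in Task 2c. This sketch is only a mental model: the `Phase` enum and `next_phase` helper are hypothetical and do not correspond to any watsonx.governance API.

```python
from enum import Enum
from typing import Optional


class Phase(Enum):
    DEVELOPMENT = "Development"
    VALIDATION = "Validation"
    OPERATION = "Operation"


# Workspaces used in this tutorial, keyed by their associated lifecycle phase.
WORKSPACES = {
    Phase.DEVELOPMENT: "Getting started with watsonx.governance",
    Phase.VALIDATION: "Validation project",
    Phase.OPERATION: "Insurance claims deployment space",
}


def next_phase(phase: Phase) -> Optional[Phase]:
    """Return the phase that follows the given one, or None after Operation."""
    order = list(Phase)  # Enum members iterate in definition order.
    index = order.index(phase)
    return order[index + 1] if index + 1 < len(order) else None


current = Phase.DEVELOPMENT
print(current.value, "->", next_phase(current).value)
```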

Check your progress

The following image shows the Lifecycle tab in the AI use case with the prompt template in the Development phase. You are now ready to continue to the Validation phase.

The Lifecycle tab in the AI use case




Task 5: Import the tracked assets for validation

As noted in Task 1b, the prompt engineer typically evaluates the prompt with test data, and the validation engineer validates the prompt. The validation engineer has access to validation data that prompt engineers might not have. In this case, the validation data resides in a different project. To move the asset into the Validation phase of the AI lifecycle, export the development project, and then import those assets into the validation project that you created in Task 1.

Task 5a: Export the sample project

To preview this task, watch the video beginning at 05:07.

Follow these steps to export the development project:

  1. From the Navigation menu, choose Projects > View all projects.
  2. Select the Getting started with watsonx.governance project.
  3. Click the Import/Export icon > Export project.
  4. Check the box to select all assets.
  5. Click Export.
  6. Click Continue export to acknowledge that the assets might contain credentials.
  7. When prompted for the project file name, type validation-project.zip, and then click Save.
  8. When the project export completes, click Back to project.
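
Because the export warns that assets might contain credentials, you might want to look at what the archive contains before you share or import it. The sketch below lists the entries of a ZIP file with the Python standard library. The `list_archive_entries` helper and the entry names in the demo archive are hypothetical; the actual layout of an exported project is not described in this tutorial.

```python
import io
import zipfile


def list_archive_entries(archive) -> list:
    """Return the sorted entry names in a ZIP archive (file path or file-like object)."""
    with zipfile.ZipFile(archive) as zf:
        return sorted(zf.namelist())


# Self-contained demo: build a tiny in-memory archive with hypothetical entry names.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as demo:
    demo.writestr("assets/example_prompt.json", "{}")
    demo.writestr("assets/example_data.csv", "a,b\n")
buffer.seek(0)

entries = list_archive_entries(buffer)
print(entries)
```

For the file saved in this task, the call would be `list_archive_entries("validation-project.zip")`, run from the folder where you saved the export.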

Check your progress

The following image shows the Export project page.

Export project page

Task 5b: Import the assets into the validation project

To preview this task, watch the video beginning at 05:28.

Follow these steps to import the assets from the development project into the validation project:

  1. From the Navigation menu, choose Projects > View all projects.
  2. Open the Validation project.
  3. Click the Import/Export icon > Import project.
  4. Click Browse.
  5. Select the validation-project.zip, and click Open.
  6. Select the option to indicate agreement: I understand that some types of assets overwrite existing assets with the same name and type.
  7. Click Import.
  8. When the assets import successfully, click the Refresh icon to see the imported assets.

Check your progress

The following image shows the validation project Assets tab. You are now ready to evaluate the sample prompt template in the validation project.

Validation project assets




Task 6: Validate the prompt template

To preview this task, watch the video beginning at 05:41.

Now you are ready to evaluate the prompt template in this validation project using the same evaluation process as before. Use the same test data set, and select the same Input and Reference output columns as before. Follow these steps to validate the prompt template:

  1. Click the Assets tab in the Validation project.
  2. From the Overflow menu for the Insurance claim summarization prompt template, select Evaluate.
  3. Click Evaluate to start the evaluation.
  4. Repeat the steps in Task 3b: Evaluate the prompt template to evaluate the Insurance claim summarization prompt template in the Validation project.
  5. Click the AI Factsheet tab when the evaluation is complete.
  6. View both sets of test results:
    1. Click Development > Getting started with watsonx.governance > Test results.
    2. Click Validation > Validation project > Test results.

Check your progress

The following image shows the validation test results. You are now ready to promote the prompt template to a deployment space, and then deploy the prompt template.

Prompt template evaluation test results




Task 7: Deploy the prompt template

To deploy the prompt template, you need to promote it to the deployment space that you created in Task 1. Then, in the deployment space, you can create a deployment and test the deployed prompt template.

Task 7a: Promote the prompt template to a deployment space

To preview this task, watch the video beginning at 06:14.

You promote the prompt template to a deployment space in preparation for deploying it. Follow these steps to promote the prompt template:

  1. Click Validation project in the navigation trail.

    Validation project navigation trail

  2. From the Overflow menu for the Insurance claim summarization prompt template, select Promote to space.

  3. For the Target space, select Insurance claims deployment space.

  4. Check the option to Go to the space after promoting the prompt template.

  5. Click Promote.

Check your progress

The following image shows the prompt template in the deployment space. You are now ready to create a deployment.

Prompt template in deployment space

Task 7b: Deploy the prompt template

To preview this task, watch the video beginning at 06:33.

Now you can create an online deployment of the prompt template from inside the deployment space. Follow these steps to create a deployment:

  1. From the Insurance claim summarization asset page in the deployment space, select New deployment.

  2. For the deployment name, copy and paste the following text:

    Insurance claims summarization deployment
    
  3. Click Create.

Check your progress

The following image shows the deployed prompt template.

Deployed prompt template

Task 7c: View the deployed prompt template

To preview this task, watch the video beginning at 06:47.

Follow these steps to view the deployed prompt template in its current phase of the lifecycle:

  1. View the deployment when it is ready. The API reference tab provides information for you to use the prompt template deployment in your application.
  2. Click the Test tab. On the Test tab, you can submit an instruction and input to test the deployment.
  3. Click Generate. Close the results window.
  4. Click the AI Factsheet tab.
  5. Scroll down to the bottom of the AI Factsheet page, and click the arrow for more details.
  6. Review the information in the Development, Validation, and Operation phases for the AI Factsheet for the deployed prompt template.
  7. Scroll to the top of the page, and click the View details icon to open the AI use case.
  8. In the use case, click the Lifecycle tab. You can see that the prompt template is now in the Operation phase.
  9. Click the Insurance claim summarization prompt template in the Operation phase. When you are done, click Cancel.
  10. Click the Insurance claims summarization deployment prompt template deployment in the Operation phase. When you are done, click Cancel.
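
The API reference tab mentioned in step 1 shows the exact endpoint and payload for your deployment. As a rough sketch of what calling a deployed prompt template from an application can look like, the code below builds (but does not send) an HTTP request. The URL path, the `version` value, the `prompt_variables` payload shape, and the `build_generation_request` helper are assumptions for illustration; copy the real values from the API reference tab, and obtain a bearer token by the method your platform documents.

```python
import json
import urllib.request


def build_generation_request(base_url, deployment_id, token, claim_text,
                             version="2023-05-29"):
    """Build a POST request that passes the claim text as the `input` prompt variable."""
    # Assumed endpoint shape; verify it against the API reference tab of your deployment.
    url = (
        f"{base_url.rstrip('/')}/ml/v1/deployments/{deployment_id}"
        f"/text/generation?version={version}"
    )
    payload = {"parameters": {"prompt_variables": {"input": claim_text}}}
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
            "Accept": "application/json",
        },
        method="POST",
    )


# Placeholder values only; nothing is sent over the network in this sketch.
request = build_generation_request(
    "https://example.com",
    "my-deployment-id",
    "my-token",
    "A tree fell on the insured's car during a storm.",
)
print(request.get_method(), request.full_url)
```

To send the request, pass it to `urllib.request.urlopen(request)` and parse the JSON body of the response.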

Check your progress

The following image shows the prompt template in the Operation phase of the lifecycle.

Prompt template in the Operation phase




Next steps

Try one of the other tutorials:

Additional resources

  • View more videos.

  • Find sample data sets, projects, models, prompts, and notebooks in the Resource hub to gain hands-on experience:

    Notebooks that you can add to your project to get started analyzing data and building models.

    Projects that you can import containing notebooks, data sets, prompts, and other assets.

    Data sets that you can add to your project to refine, analyze, and build models.

    Prompts that you can use in the Prompt Lab to prompt a foundation model.

    Foundation models that you can use in the Prompt Lab.

