Evaluating detached deployments in spaces
Last updated: Dec 06, 2024

You can create a detached deployment to evaluate prompt templates for foundation models that are not created or hosted by IBM.

To evaluate external foundation models in watsonx.governance, you can create a detached deployment in a deployment space to connect to your external prompt template. When you evaluate a detached deployment, you can measure how effectively your external model generates responses for the following task types:

  • Text summarization
  • Text classification
  • Question answering
  • Entity extraction
  • Content generation
  • Retrieval augmented generation (RAG)

Before you begin

Required permissions
You must have the Admin or Editor role to evaluate detached deployments in a deployment space.

In your project, you can create and evaluate a detached prompt template and then promote it to a deployment space.

If you don't promote a detached prompt template to a deployment space, you must create a detached prompt template that connects your external model to watsonx.governance before you can evaluate detached deployments in spaces. When you create the detached prompt template, you must provide connection details such as the name and URL of your external model. The following example shows a request body for creating a detached prompt template with the API:

{
    "name": "prompt name",
    "description": "prompt description",
    "model_version": {
        "number": "2.0.0-rc.7",
        "tag": "my prompt tag",
        "description": "my description"
    },
    "prompt_variables": {
        "var1": {},
        "var2": {}
    },
    "task_ids": [
        "retrieval_augmented_generation"
    ],
    "input_mode": "detached",
    "prompt": {
        "model_id": "",
        "input": [
            [
                "Some input",
                ""
            ]
        ],
        "data": {},
        "external_information": {
            "external_prompt_id": "external prompt",
            "external_model_id": "external model",
            "external_model_provider": "external provider",
            "external_prompt": {
                "url": "https://asdfasdf.com?asd=a&32=1",
                "additional_information": [
                    {
                        "additional_key": "additional settings"
                    }
                ]
            },
            "external_model": {
                "name": "An external model",
                "url": "https://asdfasdf.com?asd=a&32=1"
            }
        }
    }
}
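If you create detached prompt templates for several external models, it can help to generate the request body in code instead of editing JSON by hand. The following Python sketch assumes a hypothetical `build_detached_prompt_template` helper that returns a dictionary with the same keys as the example above. For brevity it omits the `model_version` block and the `additional_information` list from the example; add them back if your setup needs them. It only builds and serializes the body and does not send it, because the API endpoint is not specified on this page.

```python
import json


def build_detached_prompt_template(name, description, prompt_variables, task_id,
                                   prompt_input, external_prompt_id, external_model_id,
                                   external_model_provider, external_prompt_url,
                                   external_model_name, external_model_url):
    """Hypothetical helper: returns a body shaped like the example above."""
    return {
        "name": name,
        "description": description,
        # Each prompt variable is declared with an empty object, as in the example.
        "prompt_variables": {variable: {} for variable in prompt_variables},
        "task_ids": [task_id],
        "input_mode": "detached",
        "prompt": {
            "model_id": "",  # empty, as in the example
            "input": [[prompt_input, ""]],
            "data": {},
            "external_information": {
                "external_prompt_id": external_prompt_id,
                "external_model_id": external_model_id,
                "external_model_provider": external_model_provider,
                "external_prompt": {"url": external_prompt_url},
                "external_model": {"name": external_model_name,
                                   "url": external_model_url},
            },
        },
    }


# Placeholder values copied from the example request body.
body = build_detached_prompt_template(
    name="prompt name",
    description="prompt description",
    prompt_variables=["var1", "var2"],
    task_id="retrieval_augmented_generation",
    prompt_input="Some input",
    external_prompt_id="external prompt",
    external_model_id="external model",
    external_model_provider="external provider",
    external_prompt_url="https://asdfasdf.com?asd=a&32=1",
    external_model_name="An external model",
    external_model_url="https://asdfasdf.com?asd=a&32=1",
)
print(json.dumps(body, indent=2))
```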

Creating a detached deployment from a space

If you don't promote a detached prompt template to a deployment space from your project, you must create a detached deployment from a space. You can use the following steps to create a detached deployment from a deployment space:

  1. After you create the detached prompt template, save it to a deployment space by specifying the space ID, as in the following example:

    {
        "prompt_template": {
            "id": "<PT ID>"
        },
        "detached": {},
        "base_model_id": "abcabc",
        "description": "Prompt template deployment description",
        "name": "Prompt template deployment name",
        "space_id": "<Space ID>"
    }
    
  2. From the Assets tab of the deployment space, click New deployment for the detached prompt template asset.

  3. Choose Detached as the deployment type.

  4. Provide a name and an optional description for the deployment.
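The request body in step 1 can also be assembled in code. The following Python sketch assumes a hypothetical `build_detached_deployment` helper that mirrors the keys in that example and refuses to build a body while either ID is still empty or still a `<...>` placeholder. The IDs in the usage lines are made up, and the sketch does not send the request because the endpoint is not given on this page.

```python
import json


def build_detached_deployment(prompt_template_id, space_id, name,
                              description="", base_model_id=""):
    """Hypothetical helper that mirrors the deployment body from step 1."""
    # The example uses "<PT ID>" and "<Space ID>" as placeholders; reject
    # empty values and unreplaced placeholders before building the body.
    for label, value in (("prompt_template_id", prompt_template_id),
                         ("space_id", space_id)):
        if not value or value.startswith("<"):
            raise ValueError(f"{label} must be set to a real ID, got {value!r}")
    return {
        "prompt_template": {"id": prompt_template_id},
        "detached": {},  # empty object, as in the example
        "base_model_id": base_model_id,
        "description": description,
        "name": name,
        "space_id": space_id,
    }


body = build_detached_deployment(
    prompt_template_id="1234-abcd",  # hypothetical ID
    space_id="5678-efgh",            # hypothetical ID
    name="Prompt template deployment name",
    description="Prompt template deployment description",
)
print(json.dumps(body, indent=2))
```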

If you track the detached prompt template in an AI use case, the detached deployment is added to the use case.

Evaluating a detached deployment in a space

The following sections describe how to evaluate detached deployments in spaces and review your evaluation results:

Evaluating detached deployments in pre-production spaces

Run evaluation

To run prompt template evaluations, open your deployment and click Evaluate on the Evaluations tab to open the Evaluate prompt template wizard. You can run evaluations only if you are assigned the Admin or Editor role for your deployment space.

Select dimensions

The Evaluate prompt template wizard displays the dimensions that are available to evaluate for the task type that is associated with your prompt. You can expand the dimensions to view the list of metrics that are used to evaluate the dimensions that you select.

Watsonx.governance automatically configures evaluations for each dimension with default settings. To configure evaluations with different settings, you can select Advanced settings to set minimum sample sizes and threshold values for each metric.
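To illustrate how minimum sample sizes and threshold values interact, the following Python sketch flags metric scores that violate their thresholds. It is not the watsonx.governance implementation: the `MetricConfig` class, the `find_violations` function, the metric names, the threshold values, and the rule that a metric is skipped when there are fewer records than its minimum sample size are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class MetricConfig:
    """Hypothetical per-metric settings, similar to what Advanced settings exposes."""
    threshold: float
    min_sample_size: int
    # Assumption: most quality metrics are "higher is better"; set False for
    # metrics where a score above the threshold is the problem.
    higher_is_better: bool = True


def find_violations(scores, sample_size, configs):
    """Return a status string for each configured metric."""
    status = {}
    for metric, cfg in configs.items():
        if sample_size < cfg.min_sample_size:
            # Assumption: too few records means the metric is not scored at all.
            status[metric] = "skipped"
        elif metric not in scores:
            status[metric] = "missing"
        elif cfg.higher_is_better and scores[metric] < cfg.threshold:
            status[metric] = "violation"
        elif not cfg.higher_is_better and scores[metric] > cfg.threshold:
            status[metric] = "violation"
        else:
            status[metric] = "ok"
    return status


# Illustrative values only; these are not watsonx.governance defaults.
configs = {
    "rouge_l": MetricConfig(threshold=0.8, min_sample_size=10),
    "hap_score": MetricConfig(threshold=0.1, min_sample_size=10, higher_is_better=False),
}
print(find_violations({"rouge_l": 0.75, "hap_score": 0.05}, sample_size=20, configs=configs))
# → {'rouge_l': 'violation', 'hap_score': 'ok'}
```

Keeping the direction of each threshold in the configuration, rather than hard-coding it in the comparison, lets the same check cover metrics where lower scores are better.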

Select test data

You must upload a CSV file that contains test data with reference columns that include the input and the expected model output. The test data must also contain the output of your external model to enable detached deployment evaluations. When the upload completes, you must also map prompt variables to the associated columns in your test data.
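Before you upload, you might want to check that the file has everything the wizard asks for. The following Python sketch uses only the standard `csv` module. The `check_test_data` helper, the column names (`context`, `question`, `reference_answer`, `generated_text`), and the mapping of `var1` and `var2` to columns are hypothetical; substitute the prompt variables and columns from your own template and data.

```python
import csv
import io


def check_test_data(csv_text, variable_to_column, reference_column, output_column):
    """Return a list of problems found in the test data; empty means none found."""
    reader = csv.DictReader(io.StringIO(csv_text))
    header = set(reader.fieldnames or [])
    problems = []
    # Every prompt variable must map to a column that exists in the file.
    for variable, column in variable_to_column.items():
        if column not in header:
            problems.append(f"prompt variable {variable!r} maps to missing column {column!r}")
    # Both the reference (expected) output and the model output columns are needed.
    for role, column in (("reference output", reference_column),
                         ("model output", output_column)):
        if column not in header:
            problems.append(f"{role} column {column!r} is missing")
    # Detached evaluations need the model output in the test data, so check every row.
    if output_column in header:
        for row_number, row in enumerate(reader, start=1):
            if not (row.get(output_column) or "").strip():
                problems.append(f"data row {row_number} has no model output")
    return problems


sample = (
    "context,question,reference_answer,generated_text\n"
    "Doc A,What is A?,A is a letter.,A is the first letter.\n"
    "Doc B,What is B?,B is a letter.,\n"
)
mapping = {"var1": "context", "var2": "question"}
print(check_test_data(sample, mapping, "reference_answer", "generated_text"))
# → ['data row 2 has no model output']
```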

Review and evaluate

You can review the selections for the prompt task type, the uploaded test data, and the type of evaluation that runs. You must select Evaluate to run the evaluation.

Reviewing evaluation results

When your evaluation finishes, you can review a summary of your evaluation results on the Evaluations tab in watsonx.governance to gain insights about your model performance. The summary provides an overview of metric scores and violations of default score thresholds for your prompt template evaluations.

To analyze results, you can click the navigation arrow next to your prompt template evaluation to view data visualizations of your results over time. You can also analyze results from the model health evaluation, which runs by default during prompt template evaluations, to understand how efficiently your model processes your data.

The Actions menu also provides the following options to help you analyze your results:

  • Evaluate now: Run an evaluation with a different test data set.
  • All evaluations: Display a history of your evaluations to understand how your results change over time.
  • Configure monitors: Configure evaluation thresholds and sample sizes.
  • View model information: View details about your model to understand how your deployment environment is set up.

Evaluating detached deployments in production spaces

Activate evaluation

To run prompt template evaluations, open your deployment and click Activate on the Evaluations tab to open the Evaluate prompt template wizard. You can run evaluations only if you are assigned the Admin or Editor role for your deployment space.

If no watsonx.governance instance is associated with your deployment space, you must associate one before you can run evaluations. Select Associate a service instance in the Associate a service instance dialog box. Then, in the Associate instance for evaluation window, choose the watsonx.governance instance that you want to use and select Associate a service instance. You must be assigned the Admin role for your deployment space to associate instances.

If no database is associated with your watsonx.governance instance, you must also associate a database before you can run evaluations. To connect to a database, click Associate database in the Database required dialog box. You must be assigned the Admin role for both your deployment space and your watsonx.governance instance to associate databases.

Select dimensions

The Evaluate prompt template wizard displays the dimensions that are available to evaluate for the task type that is associated with your prompt. You can provide a label column name for the reference output that you specify in your feedback data. You can also expand the dimensions to view the list of metrics that are used to evaluate the dimensions that you select.

Watsonx.governance automatically configures evaluations for each dimension with default settings. To configure evaluations with different settings, you can select Advanced settings to set minimum sample sizes and threshold values for each metric.

Review and evaluate

You can review the selections for the prompt task type and the type of evaluation that runs. You can also select View payload schema or View feedback schema to validate that your column names match the prompt variable names in the prompt template. You must select Activate to run the evaluation.

When the evaluation summary page displays, select Evaluate now in the Actions menu to open the Import test data window and generate evaluation results.

Import test data

In the Import test data window, you can select Upload payload data or Upload feedback data to upload a CSV file that contains labeled columns that match the columns in your payload and feedback schemas.
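If your evaluation data is in a single table, one way to prepare the two uploads is to write a payload file and a feedback file from the same rows. The following Python sketch assumes that the payload schema contains the prompt variables plus the generated output and that the feedback schema adds the label column for the reference output. The `split_payload_and_feedback` helper and every column name are hypothetical, so confirm the real names with View payload schema and View feedback schema before you upload.

```python
import csv
import io


def split_payload_and_feedback(rows, prompt_variables, output_column, label_column):
    """Return (payload_csv, feedback_csv) strings built from the same rows."""
    payload_columns = list(prompt_variables) + [output_column]
    feedback_columns = payload_columns + [label_column]  # assumed: feedback adds the label

    # Fail early instead of silently writing blank cells for missing columns.
    for index, row in enumerate(rows, start=1):
        missing = [column for column in feedback_columns if column not in row]
        if missing:
            raise ValueError(f"row {index} is missing columns: {missing}")

    def to_csv(columns):
        buffer = io.StringIO()
        # extrasaction="ignore" drops keys that are not part of this schema.
        writer = csv.DictWriter(buffer, fieldnames=columns, extrasaction="ignore")
        writer.writeheader()
        writer.writerows(rows)
        return buffer.getvalue()

    return to_csv(payload_columns), to_csv(feedback_columns)


rows = [
    {"var1": "Doc A", "var2": "What is A?",
     "generated_text": "A is the first letter.", "reference_output": "A is a letter."},
]
payload_csv, feedback_csv = split_payload_and_feedback(
    rows, ["var1", "var2"], "generated_text", "reference_output")
print(payload_csv)
print(feedback_csv)
```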

Reviewing evaluation results

When your evaluation finishes, you can review a summary of your evaluation results on the Evaluations tab in watsonx.governance to gain insights about your model performance. The summary provides an overview of metric scores and violations of default score thresholds for your prompt template evaluations.

To analyze results, you can click the navigation arrow next to your prompt template evaluation to view data visualizations of your results over time. You can also analyze results from the model health evaluation, which runs by default during prompt template evaluations, to understand how efficiently your model processes your data.

The Actions menu also provides the following options to help you analyze your results:

  • Evaluate now: Run an evaluation with a different test data set.
  • Configure monitors: Configure evaluation thresholds and sample sizes.
  • View model information: View details about your model to understand how your deployment environment is set up.

If you are tracking the detached deployment in an AI use case, details about the model and evaluation results are recorded in a factsheet that you can view.
