If you are evaluating machine learning models or generative AI assets, you must send model transactions from your deployment to enable model evaluations.
To generate accurate results for your model evaluations continuously, you must continue to send new data from your deployment.
The following sections describe different methods that you can use to send transactions for model evaluations:
- Sending model transactions for machine learning model evaluations
- Sending model transactions for generative AI asset evaluations
Sending model transactions
Importing data
When you review evaluation results on the Insights dashboard, you can use the Actions menu to import payload and feedback data for your model evaluations.
For pre-production models, you can import data by uploading CSV files or connecting to data that is stored in Cloud Object Storage or a Db2 database.
If you want to upload data that is already scored, you can select the Test data includes model output checkbox so that the test data is not rescored. When you select this option, the data that you import can also include record_id/transaction_id and record_timestamp columns, which are added to the payload logging and feedback tables.
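For example, the following minimal sketch builds a scored test CSV for a hypothetical tabular classifier. The feature and output column names are illustrative, not required names; replace them with the columns from your own model schema:
import csv

# Illustrative columns only: two hypothetical features, the model output
# columns, and the optional record_id and record_timestamp columns that
# are added to the payload logging and feedback tables.
rows = [
    ["age", "income", "prediction", "probability", "record_id", "record_timestamp"],
    [42, 52000, "approved", 0.87, "rec-001", "2024-01-15T10:00:00Z"],
    [29, 31000, "denied", 0.91, "rec-002", "2024-01-15T10:01:00Z"],
]
with open("scored_test_data.csv", "w", newline="") as f:
    csv.writer(f).writerows(rows)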
For production models, you can import data by uploading CSV files or using endpoints to send your model transactions.
Using endpoints
For production models, you can use supported endpoints to provide data in formats that enable evaluations. You can use the payload logging endpoint to send scoring requests for fairness and drift evaluations, and the feedback logging endpoint to provide feedback data for quality evaluations. You can also upload CSV files to provide data for model evaluations. For more information about the data formats, see Managing data for model evaluations.
A debiased transactions endpoint is also supported, which you can use to review the results of fairness evaluations. The debiased transactions endpoint applies active debiasing to your payload data to detect any bias in your model. For more information about active debiasing, see Reviewing debiased transactions.
You can use the following steps to send model transactions for your model evaluations with endpoints:
- On the monitor configuration page, select the Endpoints tab.
- If you want to upload payload data with a CSV file, click Upload payload data.
- If you want to upload feedback data with a CSV file, click Upload feedback data.
- In the Model information panel, click Endpoints.
- From the Endpoint menu, select the type of endpoint that you want to use.
- From the Code language menu, choose the type of code snippet that you want to use.
- Click Copy to clipboard to copy the code snippet and run the code in your notebook or application.
Logging payload data with Python
When you select the payload data endpoint from the Endpoints menu in Watson OpenScale, the following code snippet shows you how to log your payload data:
from ibm_cloud_sdk_core.authenticators import IAMAuthenticator
from ibm_watson_openscale import APIClient
from ibm_watson_openscale.supporting_classes.enums import DataSetTypes, TargetTypes
from ibm_watson_openscale.supporting_classes.payload_record import PayloadRecord

service_credentials = {
    "apikey": "$API_KEY",
    "url": "{$WOS_URL}"
}
authenticator = IAMAuthenticator(
    apikey=service_credentials["apikey"],
    url="https://iam.cloud.ibm.com/identity/token"
)
SERVICE_INSTANCE_ID = "{$WOS_SERVICE_INSTANCE_ID}"
wos_client = APIClient(authenticator=authenticator, service_instance_id=SERVICE_INSTANCE_ID, service_url=service_credentials["url"])

# Put your subscription ID here
SUBSCRIPTION_ID = "{$SUBSCRIPTION_ID}"
payload_logging_data_set_id = wos_client.data_sets.list(
    type=DataSetTypes.PAYLOAD_LOGGING,
    target_target_id=SUBSCRIPTION_ID,
    target_target_type=TargetTypes.SUBSCRIPTION
).result.data_sets[0].metadata.id

# Put your data here
REQUEST_DATA = {
    "parameters": {
        "template_variables": {
            "{$TEMPLATE_VARIABLE_1}": "$TEMPLATE_VARIABLE_1_VALUE",
            "{$TEMPLATE_VARIABLE_2}": "$TEMPLATE_VARIABLE_2_VALUE"
        }
    },
    "project_id": "$PROJECT_ID"
}
RESPONSE_DATA = {
    "results": [
        {
            "generated_text": "$GENERATED_TEXT"
        }
    ]
}
# Time taken by the scoring request, in milliseconds
RESPONSE_TIME = $RESPONSE_TIME
wos_client.data_sets.store_records(
    data_set_id=payload_logging_data_set_id,
    request_body=[PayloadRecord(request=REQUEST_DATA, response=RESPONSE_DATA, response_time=RESPONSE_TIME)]
)
The "project_id": "$PROJECT_ID"
value specifies that you want to log payload data for evaluations in projects. To log payload data for evaluations in spaces, you can specify the "space_id": "$SPACE_ID"
value instead. You can use the Manage tab in projects and spaces to identify the project or space ID for your model.
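For example, a minimal sketch of the same request payload scoped to a space instead of a project (the space ID is a placeholder for your own value):
# Same payload record as above, but logged against a deployment space
REQUEST_DATA = {
    "parameters": {
        "template_variables": {
            "{$TEMPLATE_VARIABLE_1}": "$TEMPLATE_VARIABLE_1_VALUE",
            "{$TEMPLATE_VARIABLE_2}": "$TEMPLATE_VARIABLE_2_VALUE"
        }
    },
    "space_id": "$SPACE_ID"
}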
Logging feedback data with Python
When you select the feedback data endpoint from the Endpoints menu in Watson OpenScale, the following code snippet shows you how to log your feedback data:
from ibm_cloud_sdk_core.authenticators import IAMAuthenticator
from ibm_watson_openscale import APIClient
from ibm_watson_openscale.supporting_classes.enums import DataSetTypes, TargetTypes
service_credentials = {
    "apikey": "$API_KEY",
    "url": "{$WOS_URL}"
}
authenticator = IAMAuthenticator(
    apikey=service_credentials["apikey"],
    url="https://iam.cloud.ibm.com/identity/token"
)
SERVICE_INSTANCE_ID = "{$WOS_SERVICE_INSTANCE_ID}"
wos_client = APIClient(authenticator=authenticator, service_instance_id=SERVICE_INSTANCE_ID, service_url=service_credentials["url"])

subscription_id = "{$SUBSCRIPTION_ID}"
feedback_dataset_id = wos_client.data_sets.list(
    type=DataSetTypes.FEEDBACK,
    target_target_id=subscription_id,
    target_target_type=TargetTypes.SUBSCRIPTION
).result.data_sets[0].metadata.id

fields = [
    "{$TEMPLATE_VARIABLE_1}",
    "{$TEMPLATE_VARIABLE_2}",
    "{$LABEL_COLUMN}",
    "_original_prediction"
]
values = [
    [
        "$TEMPLATE_VARIABLE_1_VALUE",
        "$TEMPLATE_VARIABLE_2_VALUE",
        "$LABEL_COLUMN_VALUE",
        "$GENERATED_TEXT_VALUE"
    ]
]
wos_client.data_sets.store_records(
    data_set_id=feedback_dataset_id,
    request_body=[{"fields": fields, "values": values}],
    background_mode=False
)
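After the call completes, you can optionally confirm that the records arrived. A minimal sketch, assuming your version of the ibm-watson-openscale SDK provides the get_records_count method on data_sets:
# Assumption: get_records_count is available in your SDK version
records_count = wos_client.data_sets.get_records_count(feedback_dataset_id)
print(f"Feedback records stored: {records_count}")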
Sending model transactions in watsonx.governance
Importing data in watsonx.governance
When you review evaluation results in watsonx.governance, you can select Evaluate now in the Actions menu to import payload and feedback data for your model evaluations.
For pre-production models, you must upload a CSV file that contains examples of input and output data. To run evaluations with imported data, you must map prompt variables to the associated columns in your CSV file and then select Upload and evaluate.
For production models, you can select Upload payload data or Upload feedback data in the Import test data window to upload a CSV file.
The CSV file must contain labeled columns that match the columns in your payload and feedback schemas. When your upload completes successfully, you can select Evaluate now to run your evaluations with your imported data.
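For example, assuming a prompt template with two prompt variables, the header row of a feedback CSV might contain the {$TEMPLATE_VARIABLE_1}, {$TEMPLATE_VARIABLE_2}, {$LABEL_COLUMN}, and _original_prediction columns, which match the fields that the code snippets in the following sections send programmatically.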
Using endpoints in watsonx.governance
For production models, you can use supported endpoints to provide data in formats that enable evaluations. You can use the payload logging endpoint to send scoring requests for drift evaluations and the feedback logging endpoint to provide feedback data for quality evaluations.
You can use code snippets to log scoring requests with APIs or in your notebooks.
Logging feedback data with cURL
The following example shows you how to log feedback data with cURL code snippets:
# Generate an IAM access token by passing an API key as $APIKEY in the request below
# See: https://cloud.ibm.com/docs/account?topic=account-iamtoken_from_apikey
curl -k -X POST \
--header "Content-Type: application/x-www-form-urlencoded" \
--header "Accept: application/json" \
--data-urlencode "grant_type=urn:ibm:params:oauth:grant-type:apikey" \
--data-urlencode "apikey=$APIKEY" \
"https://iam.cloud.ibm.com/identity/token"
# the above cURL request returns an auth token that you use as $IAM_TOKEN in the requests below.
# retrieve the ID of the data set that stores the feedback records
curl --location --request GET "${WOS_URL}/openscale/${WOS_SERVICE_INSTANCE_ID}/v2/data_sets?target.target_id=${SUBSCRIPTION_ID}&target.target_type=subscription&type=feedback" \
--header "Authorization: bearer $IAM_TOKEN" \
--header "Accept: application/json"
# the above request returns the ID of the feedback records data set, which you use as $DATA_SET_ID in the request below. The ID is found at data_sets[0]['metadata']['id']
# TODO: manually define fields and list of values for feedback data
FEEDBACK_PAYLOAD='[{
    "fields": [
        "${TEMPLATE_VARIABLE_1}",
        "${TEMPLATE_VARIABLE_2}",
        "${LABEL_COLUMN}",
        "_original_prediction"
    ],
    "values": [
        [
            "$TEMPLATE_VARIABLE_1_VALUE",
            "$TEMPLATE_VARIABLE_2_VALUE",
            "$LABEL_COLUMN_VALUE",
            "$GENERATED_TEXT_VALUE"
        ]
    ]
}]'
curl --location --request POST "${WOS_URL}/openscale/${WOS_SERVICE_INSTANCE_ID}/v2/data_sets/$DATA_SET_ID/records" \
-d "$FEEDBACK_PAYLOAD" \
--header "Authorization: bearer $IAM_TOKEN" \
--header "Accept: application/json" \
--header "Content-Type: application/json"
Logging feedback data with Python
The following example shows you how to log feedback data with Python code snippets:
from ibm_cloud_sdk_core.authenticators import IAMAuthenticator
from ibm_watson_openscale import APIClient
from ibm_watson_openscale.supporting_classes.enums import DataSetTypes, TargetTypes
service_credentials = {
    "apikey": "$API_KEY",
    "url": "{$WOS_URL}"
}
authenticator = IAMAuthenticator(
    apikey=service_credentials["apikey"],
    url="https://iam.cloud.ibm.com/identity/token"
)
SERVICE_INSTANCE_ID = "{$WOS_SERVICE_INSTANCE_ID}"
wos_client = APIClient(authenticator=authenticator, service_instance_id=SERVICE_INSTANCE_ID, service_url=service_credentials["url"])

subscription_id = "{$SUBSCRIPTION_ID}"
PROJECT_ID = "{$PROJECT_ID}"
feedback_dataset_id = wos_client.data_sets.list(
    type=DataSetTypes.FEEDBACK,
    target_target_id=subscription_id,
    target_target_type=TargetTypes.SUBSCRIPTION,
    project_id=PROJECT_ID
).result.data_sets[0].metadata.id

fields = [
    "{$TEMPLATE_VARIABLE_1}",
    "{$TEMPLATE_VARIABLE_2}",
    "{$LABEL_COLUMN}",
    "_original_prediction"
]
values = [
    [
        "$TEMPLATE_VARIABLE_1_VALUE",
        "$TEMPLATE_VARIABLE_2_VALUE",
        "$LABEL_COLUMN_VALUE",
        "$GENERATED_TEXT_VALUE"
    ]
]
wos_client.data_sets.store_records(
    data_set_id=feedback_dataset_id,
    request_body=[{"fields": fields, "values": values}],
    project_id=PROJECT_ID,
    background_mode=False
)
The project_id=PROJECT_ID value specifies that you want to log feedback data for evaluations in projects. To log feedback data for evaluations in spaces, you can specify the space_id=SPACE_ID value instead. You can use the Manage tab in projects and spaces to identify the project or space ID for your model.
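For example, a minimal sketch of the same store_records call scoped to a space instead of a project:
# Same feedback records as above, but scoped to a deployment space
wos_client.data_sets.store_records(
    data_set_id=feedback_dataset_id,
    request_body=[{"fields": fields, "values": values}],
    space_id="{$SPACE_ID}",
    background_mode=False
)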
Logging payload data with cURL
The following example shows you how to log payload data with cURL code snippets:
# Generate an IAM access token by passing an API key as $APIKEY in the request below
# See: https://cloud.ibm.com/docs/account?topic=account-iamtoken_from_apikey
curl -k -X POST \
--header "Content-Type: application/x-www-form-urlencoded" \
--header "Accept: application/json" \
--data-urlencode "grant_type=urn:ibm:params:oauth:grant-type:apikey" \
--data-urlencode "apikey=$APIKEY" \
"https://iam.cloud.ibm.com/identity/token"
# the above cURL request returns an auth token that you use as $IAM_TOKEN in the requests below.
# retrieve the ID of the data set to store the payload records
curl --location --request GET "${WOS_URL}/openscale/${WOS_SERVICE_INSTANCE_ID}/v2/data_sets?target.target_id=${SUBSCRIPTION_ID}&target.target_type=subscription&type=payload_logging" \
--header "Authorization: bearer $IAM_TOKEN" \
--header "Accept: application/json"
# the above request returns the ID of the payload records data set, which you use as $DATA_SET_ID in the request below.
# TODO: manually define and pass:
# request - input to scoring endpoint in format supported by Watson OpenScale - replace sample fields and values with proper ones
# response - output from scored model in format supported by Watson OpenScale - replace sample fields and values with proper ones
# - $SCORING_TIME - Time (ms) taken to make prediction (for performance monitoring)
SCORING_PAYLOAD='[{
    "response_time": "$SCORING_TIME",
    "request": {
        "parameters": {
            "template_variables": {
                "${TEMPLATE_VARIABLE_1}": "$TEMPLATE_VARIABLE_1_VALUE",
                "${TEMPLATE_VARIABLE_2}": "$TEMPLATE_VARIABLE_2_VALUE"
            }
        },
        "project_id": "$PROJECT_ID"
    },
    "response": {
        "results": [
            {
                "generated_text": "$GENERATED_TEXT"
            }
        ]
    },
    "user_id": "$USER_ID"
}]'
curl --location --request POST "${WOS_URL}/openscale/${WOS_SERVICE_INSTANCE_ID}/v2/data_sets/$DATA_SET_ID/records" \
-d "$SCORING_PAYLOAD" \
--header "Authorization: bearer $IAM_TOKEN" \
--header "Accept: application/json" \
--header "Content-Type: application/json"
The "project_id": "$PROJECT_ID"
value specifies that you want to log payload data for evaluations in projects. To log payload data for evaluations in spaces, you can specify the "space_id": "$SPACE_ID"
value instead. You can use the Manage tab in projects and spaces to identify the project or space ID for your model.
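If you compute $SCORING_TIME yourself, it is the elapsed scoring time in milliseconds, as noted in the snippet comments. A minimal sketch in Python, assuming you wrap your own scoring call:
import time

start = time.perf_counter()
# ... call your model's scoring endpoint here ...
scoring_time_ms = int((time.perf_counter() - start) * 1000)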
Logging payload data with Python
The following example shows you how to log payload data with Python code snippets:
from ibm_cloud_sdk_core.authenticators import IAMAuthenticator
from ibm_watson_openscale import APIClient
from ibm_watson_openscale.supporting_classes.enums import DataSetTypes, TargetTypes
from ibm_watson_openscale.supporting_classes.payload_record import PayloadRecord

service_credentials = {
    "apikey": "$API_KEY",
    "url": "{$WOS_URL}"
}
authenticator = IAMAuthenticator(
    apikey=service_credentials["apikey"],
    url="https://iam.cloud.ibm.com/identity/token"
)
SERVICE_INSTANCE_ID = "{$WOS_SERVICE_INSTANCE_ID}"
wos_client = APIClient(authenticator=authenticator, service_instance_id=SERVICE_INSTANCE_ID, service_url=service_credentials["url"])

# Put your subscription ID here
SUBSCRIPTION_ID = "{$SUBSCRIPTION_ID}"
PROJECT_ID = "{$PROJECT_ID}"
payload_logging_data_set_id = wos_client.data_sets.list(
    type=DataSetTypes.PAYLOAD_LOGGING,
    target_target_id=SUBSCRIPTION_ID,
    target_target_type=TargetTypes.SUBSCRIPTION,
    project_id=PROJECT_ID
).result.data_sets[0].metadata.id

# Put your data here
REQUEST_DATA = {
    "parameters": {
        "template_variables": {
            "{$TEMPLATE_VARIABLE_1}": "$TEMPLATE_VARIABLE_1_VALUE",
            "{$TEMPLATE_VARIABLE_2}": "$TEMPLATE_VARIABLE_2_VALUE"
        }
    },
    "project_id": PROJECT_ID
}
RESPONSE_DATA = {
    "results": [
        {
            "generated_text": "$GENERATED_TEXT"
        }
    ]
}
# Time taken by the scoring request, in milliseconds
RESPONSE_TIME = $RESPONSE_TIME
wos_client.data_sets.store_records(
    data_set_id=payload_logging_data_set_id,
    request_body=[PayloadRecord(request=REQUEST_DATA, response=RESPONSE_DATA, response_time=RESPONSE_TIME)],
    project_id=PROJECT_ID
)
The PROJECT_ID = "{$PROJECT_ID}" value specifies that you want to log payload data for evaluations in projects. To log payload data for evaluations in spaces, you can specify the SPACE_ID = "{$SPACE_ID}" value instead. You can use the Manage tab in projects and spaces to identify the project or space ID for your model.
For more information about the data formats, see Managing data for model evaluations.
Parent topic: Managing data for model evaluations