Output token count evaluation metric
Last updated: Mar 04, 2025

The output token count metric calculates the total, average, minimum, maximum, and median output token count across scoring requests during evaluations.

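For illustration, the following Python sketch shows how those summary statistics relate to a set of per-request generated_token_count values. The use of the statistics module here is only an example of the arithmetic; it is not how the service computes the metric.

# Illustrative only: how total, average, minimum, maximum, and median relate
# to a set of per-request output token counts (generated_token_count values).
import statistics

output_token_counts = [2, 3, 5, 4]  # example values, one per scoring request

summary = {
    "total": sum(output_token_counts),
    "average": statistics.mean(output_token_counts),
    "minimum": min(output_token_counts),
    "maximum": max(output_token_counts),
    "median": statistics.median(output_token_counts),
}
print(summary)
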
Metric details

Output token count is a token count metric for model health monitor evaluations that calculates the number of output tokens that are generated across scoring requests.

Scope

The output token count metric evaluates generative AI assets only.

  • Generative AI tasks:
    • Text summarization
    • Text classification
    • Content generation
    • Entity extraction
    • Question answering
    • Retrieval Augmented Generation (RAG)
  • Supported languages: English

Evaluation process

To calculate the output token count metric, you must specify the generated_token_count field when you send scoring requests with the Python SDK. The following example shows a scoring request and response that include the fields that are used to calculate the input and output token count metrics:

# Scoring request payload
request = {
    "fields": [
        "comment"
    ],
    "values": [
        [
            "Customer service was friendly and helpful."
        ]
    ]
}

# Scoring response payload; each row in "values" aligns positionally with "fields",
# so the second entry is generated_token_count and the third is input_token_count
response = {
    "fields": [
        "generated_text",
        "generated_token_count",
        "input_token_count",
        "stop_reason",
        "scoring_id",
        "response_time"
    ],
    "values": [
        [
            "1",
            2,
            73,
            "eos_token",
            "MRM_7610fb52-b11d-4e20-b1fe-f2b971cae4af-50",
            3558
        ],
        [
            "0",
            3,
            62,
            "eos_token",
            "MRM_7610fb52-b11d-4e20-b1fe-f2b971cae4af-51",
            3778
        ]
    ]
}

from ibm_watson_openscale.supporting_classes.payload_record import PayloadRecord

client.data_sets.store_records(
    data_set_id=payload_data_set_id,
    request_body=[
        PayloadRecord(
            scoring_id=<uuid>,              # value to be supplied by user
            request=request,
            response=response,
            response_time=<response_time>,  # value to be supplied by user
            user_id=<user_id>               # value to be supplied by user
        )
    ]
)

Parent topic: Evaluation metrics