API throughput evaluation metric

Last updated: Mar 14, 2025

The API throughput metric measures the number of scoring requests processed by your model deployment per second.

Metric details

API throughput is a throughput and latency metric for model health monitor evaluations. It is calculated by tracking the number of scoring requests and transaction records that your model deployment processes per second.

Scope

The API throughput metric evaluates generative AI assets and machine learning models.

  • Generative AI tasks:
    • Text summarization
    • Text classification
    • Content generation
    • Entity extraction
    • Question answering
    • Retrieval Augmented Generation (RAG)
  • Machine learning problem type:
    • Binary classification
    • Multiclass classification
    • Regression
  • Supported languages: English

Evaluation process

The average, maximum, median, and minimum API throughput for scoring requests and transaction records are calculated during model health monitor evaluations.
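For illustration only, and not as the monitor's internal implementation, the following sketch shows how these statistics could be derived from recorded response_time values; the sample records list and the assumption that response_time is expressed in milliseconds are hypothetical:

    import statistics

    # Hypothetical payload records: (response_time in milliseconds, number of scored rows)
    records = [(120, 10), (95, 8), (210, 25), (60, 5)]

    # Per-record throughput in records per second, assuming response_time is in milliseconds
    throughputs = [rows / (ms / 1000.0) for ms, rows in records]

    print("average:", statistics.mean(throughputs))
    print("median:", statistics.median(throughputs))
    print("maximum:", max(throughputs))
    print("minimum:", min(throughputs))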

To calculate the API throughput metric, the response_time value from your scoring requests is used to track the time that your model deployment takes to process them.

For watsonx.ai Runtime deployments, the response_time value is automatically detected when you configure evaluations.

For external and custom deployments, you must specify the response_time value when you send scoring requests so that throughput and latency can be calculated, as shown in the following Python SDK example:

    from ibm_watson_openscale.supporting_classes.payload_record import PayloadRecord

    # Store the scoring payload, including the response_time value that is used
    # for throughput and latency calculations.
    client.data_sets.store_records(
        data_set_id=payload_data_set_id,
        request_body=[
            PayloadRecord(
                scoring_id=<uuid>,
                request=openscale_input,
                response=openscale_output,
                response_time=<response_time>,
                user_id=<user_id>)
        ]
    )
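As a rough illustration of how the response_time value might be captured for an external or custom deployment, the following sketch times a scoring request before the payload record is stored; the scoring URL and the use of milliseconds are assumptions for this example, not part of the SDK:

    import time
    import requests

    # Hypothetical external scoring endpoint; replace with your deployment's URL.
    scoring_url = "https://example.com/v1/score"

    start = time.time()
    response = requests.post(scoring_url, json=openscale_input)
    openscale_output = response.json()

    # Elapsed wall-clock time for the scoring request; pass this as the
    # response_time value when you store the payload record.
    response_time = int((time.time() - start) * 1000)  # assumed milliseconds

The resulting response_time value can then be supplied to PayloadRecord as shown in the preceding example.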

Parent topic: Evaluation metrics