API throughput evaluation metric
The API throughput metric measures the number of scoring requests processed by your model deployment per second.
Metric details
API throughput is a throughput and latency metric for model health monitor evaluations. It tracks the number of scoring requests and transaction records that your model deployment processes per second.
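For illustration only, the following minimal sketch shows how a single throughput value could be derived from a request's record count and processing time. The record count, timing value, and records-per-second interpretation are assumptions for the example, not the monitor's exact implementation.

# Minimal sketch: throughput for one scoring request, assuming throughput is
# the number of transaction records divided by the processing time in seconds.
records_in_request = 500        # hypothetical number of transaction records
response_time_seconds = 2.5     # hypothetical time to process the request
throughput = records_in_request / response_time_seconds
print(throughput)               # 200.0 records per second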
Scope
The API throughput metric evaluates generative AI assets and machine learning models.
- Generative AI tasks:
- Text summarization
- Text classification
- Content generation
- Entity extraction
- Question answering
- Retrieval Augmented Generation (RAG)
- Machine learning problem types:
- Binary classification
- Multiclass classification
- Regression
- Supported languages: English
Evaluation process
During model health monitor evaluations, the average, maximum, median, and minimum API throughput values are calculated for scoring requests and transaction records.
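To illustrate how these summary statistics relate to the collected measurements, the following hedged sketch aggregates hypothetical per-request throughput values. The sample values and the use of Python's statistics module are assumptions for the example, not the monitor's implementation.

import statistics

# Hypothetical per-request throughput values (records per second) collected
# during one evaluation window.
throughput_values = [180.0, 220.0, 205.5, 198.0, 240.0]

summary = {
    "average": statistics.mean(throughput_values),
    "maximum": max(throughput_values),
    "median": statistics.median(throughput_values),
    "minimum": min(throughput_values),
}
print(summary)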
To calculate the API throughput metric, the response_time value from your scoring requests is used to track the time that your model deployment takes to process scoring requests.
For watsonx.ai Runtime deployments, the response_time value is automatically detected when you configure evaluations.
For external and custom deployments, you must specify the response_time value when you send scoring requests to calculate throughput and latency, as shown in the following example from the Python SDK:
from ibm_watson_openscale.supporting_classes.payload_record import PayloadRecord

client.data_sets.store_records(
    data_set_id=payload_data_set_id,
    request_body=[
        PayloadRecord(
            scoring_id=<uuid>,
            request=openscale_input,
            response=openscale_output,
            response_time=<response_time>,
            user_id=<user_id>)
    ]
)
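For external and custom deployments, one way to obtain the response_time value is to time the scoring call yourself before storing the payload record. The following sketch reuses the names from the example above (client, payload_data_set_id, openscale_input, user_id) and assumes a hypothetical score_external_model function that calls your deployment; verify against your configuration whether response_time is expected in seconds or milliseconds.

import time
import uuid

# Hedged sketch: measure the elapsed scoring time for a custom deployment.
# score_external_model is a hypothetical function that calls your deployment.
start = time.perf_counter()
openscale_output = score_external_model(openscale_input)
elapsed_seconds = time.perf_counter() - start

client.data_sets.store_records(
    data_set_id=payload_data_set_id,
    request_body=[
        PayloadRecord(
            scoring_id=str(uuid.uuid4()),
            request=openscale_input,
            response=openscale_output,
            response_time=elapsed_seconds,  # convert if your setup expects milliseconds
            user_id=user_id)
    ]
)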
Parent topic: Evaluation metrics