Billing details for generative AI assets
Learn how usage of generative AI assets is measured by using resource units (RU), hourly rates, or a flat rate.
Working with generative AI assets with watsonx.ai Runtime requires watsonx.ai. For more information, see Overview of IBM watsonx.ai.
Review the details for how resources are measured using:
- Resource units to measure inferencing activities for foundation models provided by watsonx.ai.
- Hourly rates for custom foundation models you import and deploy with watsonx.ai.
- Flat rates by page for document text extraction.
Resource unit metering for foundation models
For the list of supported foundation models and their prices, see Supported foundation models. For the list of supported encoder models and their prices, see Supported encoder models.
A Resource Unit (RU) is equal to 1000 tokens from the input and output of foundation model inferencing. A token is a basic unit of text (typically 4 characters or 0.75 words) used in the input or output for a foundation model prompt or for input to an embeddings model.
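The typical token sizes given above can be turned into a rough estimator for planning purposes. The following is a minimal sketch, not a tokenizer: the function names are hypothetical, and actual token counts depend on the model and will differ from these estimates.

```python
import math

# Typical ratios quoted in the text; real tokenizers vary by model.
CHARS_PER_TOKEN = 4
WORDS_PER_TOKEN = 0.75


def estimate_tokens_from_chars(text: str) -> int:
    """Estimate tokens as character count / 4, rounded up."""
    return math.ceil(len(text) / CHARS_PER_TOKEN)


def estimate_tokens_from_words(text: str) -> int:
    """Estimate tokens as whitespace-separated word count / 0.75, rounded up."""
    return math.ceil(len(text.split()) / WORDS_PER_TOKEN)


if __name__ == "__main__":
    prompt = "Summarize the quarterly sales report in three bullet points."
    print(estimate_tokens_from_chars(prompt))
    print(estimate_tokens_from_words(prompt))
```

Use estimates like these only for budgeting; the billed amount is based on the tokens that the model actually processes.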
Each foundation model provided by IBM watsonx.ai is assigned an inference price for input and output. The price is derived as a multiple of the base price for an RU ($0.0001). For example, a model with a price of $0.0006 has a multiplier of 6 times the base rate.
A prompt-tuned foundation model is assigned the same price as the underlying foundation model. For information about tuned foundation models, see Tuning Studio. Tuning a model in the Tuning Studio consumes capacity unit hours (CUH). For more information, see Billing details for machine learning assets.
Calculating the resource unit rate per model
To calculate charges for foundation model inference, round the total number of tokens consumed during the month up to the nearest 1000, then divide by 1000 to obtain the total number of RUs. Multiply the total number of RUs by the model price to obtain the total usage charge. The model price varies by model and can also differ between input and output tokens for a given model.
The basic formula is as follows:
Total tokens used/1000 = Resource Units (RU) consumed
RU consumed x model price = Total usage charge
The base price for an RU is $0.0001. The price for each foundation model is a multiple of the base price.
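The formula can be sketched in a few lines of Python. This is an illustrative calculation, not an official billing tool: the function names are hypothetical, and it assumes that the monthly token total is rounded up to a whole number of RUs before the price is applied. `Decimal` is used to avoid floating-point error on very small prices.

```python
from decimal import Decimal

BASE_RU_PRICE = Decimal("0.0001")  # base price per RU in USD, from the text
TOKENS_PER_RU = 1000               # 1 RU = 1000 tokens, from the text


def resource_units(total_tokens: int) -> int:
    """Convert a monthly token total to RUs, rounding up (assumed behavior)."""
    if total_tokens < 0:
        raise ValueError("total_tokens must be non-negative")
    return -(-total_tokens // TOKENS_PER_RU)  # integer ceiling division


def usage_charge(total_tokens: int, multiplier: Decimal) -> Decimal:
    """RU consumed x model price, where model price = multiplier x base price."""
    model_price = BASE_RU_PRICE * multiplier
    return resource_units(total_tokens) * model_price


if __name__ == "__main__":
    # 2,500,300 tokens on a model priced at 6 times the base rate
    print(resource_units(2_500_300))              # 2501
    print(usage_charge(2_500_300, Decimal("6")))  # 1.5006
```

If a model has different prices for input and output tokens, run the calculation separately for each token total and add the results.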
Billing classes by multiplier
If you are monitoring model usage with the watsonx.ai API, model prices are listed by pricing tier, as follows:
Model pricing tier | Price per RU in USD | Multiplier x base rate |
---|---|---|
Class 1 | $0.0006 | 6 |
Class 2 | $0.0018 | 18 |
Class 3 | $0.0050 | 50 |
Class C1 | $0.0001 | 1 |
Class 5 | $0.00025 | 2.5 |
Class 7 | $0.016 | 160 |
Class 8 | $0.00015 | 1.5 |
Class 9 | $0.00035 | 3.5 |
Class 10 | $0.0020 | 20 |
Class 11 | $0.000005 | 0.05 |
Class 12 | $0.0002 | 2 |
Certain models, such as Mistral Large, have special pricing that is not assigned by a multiplier. The pricing is listed in Supported models.
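When you process usage data yourself, the pricing tiers can be kept in a lookup table. The sketch below copies the prices from the table above and derives each multiplier from the base rate; the dictionary and function names are hypothetical, and models with special pricing are not covered.

```python
from decimal import Decimal

BASE_RU_PRICE = Decimal("0.0001")  # base price per RU in USD

# Price per RU in USD, copied from the pricing tier table.
PRICE_PER_RU = {
    "Class 1": Decimal("0.0006"),
    "Class 2": Decimal("0.0018"),
    "Class 3": Decimal("0.0050"),
    "Class C1": Decimal("0.0001"),
    "Class 5": Decimal("0.00025"),
    "Class 7": Decimal("0.016"),
    "Class 8": Decimal("0.00015"),
    "Class 9": Decimal("0.00035"),
    "Class 10": Decimal("0.0020"),
    "Class 11": Decimal("0.000005"),
    "Class 12": Decimal("0.0002"),
}


def multiplier(tier: str) -> Decimal:
    """Return the tier price expressed as a multiple of the base RU price."""
    return PRICE_PER_RU[tier] / BASE_RU_PRICE


def charge_for_tier(tier: str, ru_consumed: int) -> Decimal:
    """Return the charge in USD for a number of RUs at the given tier."""
    return PRICE_PER_RU[tier] * ru_consumed


if __name__ == "__main__":
    for tier in PRICE_PER_RU:
        print(tier, multiplier(tier), charge_for_tier(tier, 1000))
```

An unknown tier name raises `KeyError`, which is a reasonable signal to check whether the model uses special pricing.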
Hourly billing rates for custom foundation models
Deploying custom foundation models requires the Standard plan.
Billing rates depend on the hardware configuration of the model and cover both hosting and inferencing. Charges begin when the model is successfully deployed and continue until the model is deleted.
Configuration size | Billing rate per hour in USD |
---|---|
Small | $5.22 |
Medium | $10.40 |
Large | $20.85 |
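Because charges accrue for as long as the deployment exists, the cost of a custom foundation model is the hourly rate multiplied by the time deployed. The following sketch uses the rates from the table; the function name is hypothetical, and it assumes that partial hours are billed proportionally, which the source does not state.

```python
from decimal import Decimal

# Billing rate per hour in USD, copied from the configuration table.
HOURLY_RATE_USD = {
    "Small": Decimal("5.22"),
    "Medium": Decimal("10.40"),
    "Large": Decimal("20.85"),
}


def hosting_charge(size: str, hours_deployed: Decimal) -> Decimal:
    """Return the charge for keeping a custom model deployed.

    Assumes proportional billing for fractional hours (not confirmed
    by the source).
    """
    if hours_deployed < 0:
        raise ValueError("hours_deployed must be non-negative")
    return HOURLY_RATE_USD[size] * hours_deployed


if __name__ == "__main__":
    # A Medium deployment left running for three days (72 hours)
    print(hosting_charge("Medium", Decimal("72")))  # 748.80
```

The example shows why it helps to delete deployments that are no longer needed: an idle model costs the same per hour as one that is serving requests.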
For details on choosing a configuration for a custom foundation model, see Planning to deploy a custom foundation model.
For the foundation models that IBM provides, see Supported foundation models available with watsonx.ai.
Rates per page for document text extraction
Use the document text extraction method of the watsonx.ai REST API to convert highly structured PDF files, which use diagrams and tables to convey information, into a JSON file format that AI models can process easily.
You are billed at a flat rate for each page that is processed. A page can be a page of text (up to 1800 characters), an image, or a .tiff frame. The billing rate depends on your plan type.
Plan type | Price per page in USD |
---|---|
Essential | $0.038 |
Standard | $0.030 |
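The per-page charge is a straightforward multiplication. The sketch below uses the rates from the table; the function name is hypothetical, and the page count is taken as an input because how a given document is split into billable pages is determined by the service.

```python
from decimal import Decimal

# Price per page in USD, copied from the plan table.
PRICE_PER_PAGE_USD = {
    "Essential": Decimal("0.038"),
    "Standard": Decimal("0.030"),
}


def extraction_charge(plan: str, pages: int) -> Decimal:
    """Return the flat-rate charge for a number of processed pages."""
    if pages < 0:
        raise ValueError("pages must be non-negative")
    return PRICE_PER_PAGE_USD[plan] * pages


if __name__ == "__main__":
    # Compare the cost of processing 1200 pages on each plan
    for plan in PRICE_PER_PAGE_USD:
        print(plan, extraction_charge(plan, 1200))
```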
Learn more
- For more information on tracking computing resource allocation and consumption, see Runtime usage.
- For more information about each model, see Supported foundation models.
- For information about regional support for each model, see Regional availability for foundation models.
Parent topic: watsonx.ai Runtime plans