You use watsonx.ai Runtime resources, which are measured in capacity unit hours (CUH), when you train AutoAI models, run machine learning models, or score deployed models. You use watsonx.ai Runtime resources, measured by tokens consumed or at
an hourly rate, when you run inferencing services with foundation models. This topic describes the various plans you can choose, what services are included, and how computing resources are calculated.
Note: The watsonx.ai Runtime service was formerly known as the Watson Machine Learning service.
watsonx.ai Runtime in Cloud Pak for Data as a Service and watsonx
Important:
The watsonx.ai Runtime plan includes details for watsonx.ai. Watsonx.ai is a studio of integrated tools for working with machine learning models and with generative AI that is powered by foundation models. If you are using Cloud Pak for Data as a Service,
the details for working with foundation models and for metering prompt inferencing with Resource Units do not apply to your plan.
If you are enabled for both watsonx and Cloud Pak for Data as a Service, you can switch between the two platforms.
Choosing a watsonx.ai Runtime plan
watsonx.ai Runtime plans govern how you are billed for models you train and deploy with watsonx.ai Runtime and for prompts you use with foundation models. Choose a plan based on your needs:
Lite is a free plan with limited capacity. Choose this plan if you are evaluating watsonx.ai Runtime and want to try out the capabilities. The Lite plan does not support running a foundation model tuning experiment on watsonx.
Essentials is a pay-as-you-go plan that gives you the flexibility to build, deploy, and manage models to match your needs.
Standard is a high-capacity enterprise plan that is designed to support all of an organization's AI needs. This plan incurs a monthly instance fee that includes a block of 2500 capacity unit hours (CUH). Any CUH usage above
this amount is charged at the plan rate. All other usage is metered on a pay-as-you-go basis. Important: The instance
fee for the watsonx.ai Runtime Standard plan (for example, $1050/month USD) is billed regardless of CUH usage. For example, if you consume only resource units, you are still charged the instance fee. The fee is prorated if the plan is canceled.
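The Standard plan arithmetic described above can be sketched as follows. This is a minimal illustration, not an official billing formula: the function name and its parameters are hypothetical, the per-CUH rate is not stated in this topic and must come from your plan's current pricing, and proration on cancellation is not modeled. Only the CUH portion of the bill is covered; Resource Unit, hourly, and per-page charges are metered separately.

```python
# Block of CUH included in the Standard plan's monthly instance fee (from this topic).
STANDARD_INCLUDED_CUH = 2500


def standard_plan_cuh_charge(cuh_used, instance_fee_usd, cuh_rate_usd):
    """Return (overage_cuh, total_usd) for one month of CUH usage on Standard.

    instance_fee_usd and cuh_rate_usd are caller-supplied assumptions;
    neither is defined authoritatively in this topic.
    """
    if cuh_used < 0:
        raise ValueError("cuh_used must be non-negative")
    # Only usage above the included block is charged at the plan rate.
    overage_cuh = max(0.0, cuh_used - STANDARD_INCLUDED_CUH)
    # The instance fee is billed even when no CUH is consumed.
    total_usd = instance_fee_usd + overage_cuh * cuh_rate_usd
    return overage_cuh, total_usd
```

For example, with the $1050 example fee and a placeholder rate of $0.50 per CUH, `standard_plan_cuh_charge(3000, 1050.0, 0.50)` reports 500 CUH of overage on top of the instance fee, while any usage at or below 2500 CUH leaves only the fee.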
For metering and billing purposes, machine learning models and deployments or foundation models are measured with these charge metrics:
Capacity Unit Hour (CUH) measures compute resource consumption per unit hour for usage and billing purposes. CUH measures all watsonx.ai Runtime activity except for foundation model inferencing.
Resource Unit (RU) measures foundation model inference consumption. Inferencing is the process of calling the foundation model to generate output in response to a prompt. Each RU equals 1,000 tokens. A token is a basic
unit of text (typically 4 characters or 0.75 words) used in the input or output for a foundation model prompt. For details on tokens, see Tokens and tokenization.
Hour rate is used to calculate charges for custom foundation models that you import into watsonx.ai and deploy. The rate is based on configuration size and is charged for the duration of the model deployment.
Page rate is used to calculate charges for document text extraction. The page rate is set by plan.
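To make the Resource Unit metric concrete, the following sketch converts a token count to RUs and roughly estimates tokens from text length. The function names are hypothetical. The 4-characters-per-token figure is only the typical value quoted above, so actual counts depend on the model's tokenizer, and whether billing rounds partial RUs is not stated in this topic, so the conversion is left unrounded.

```python
TOKENS_PER_RU = 1000   # 1 RU equals 1,000 tokens (from this topic)
CHARS_PER_TOKEN = 4    # typical value only; real tokenizers vary by model


def tokens_to_resource_units(tokens):
    """Convert a token count (input plus output) to Resource Units, unrounded."""
    return tokens / TOKENS_PER_RU


def estimate_tokens(text):
    """Rough token estimate from character count, rounded up.

    An approximation for planning purposes, not a tokenizer.
    """
    return -(-len(text) // CHARS_PER_TOKEN)  # ceiling division
```

For instance, a prompt and response that together use 2,500 tokens correspond to 2.5 RU under this conversion.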
What is measured for resource consumption?
Resources, whether measured in capacity unit hours (CUH) or resource units (RU), are consumed for running assets, not for working in tools. That is, there is no consumption charge for defining an experiment in AutoAI, but there is a charge
for running the experiment to train the experiment pipelines. Similarly, there is no charge for creating a deployment space or defining a deployment job, but there is a charge for running a deployment job or inferencing against a deployed
asset. Assets that run continuously, such as Jupyter notebooks, RStudio assets, Bash scripts, and custom model deployments, consume resources for as long as they are active.
Note: You do not consume tokens when you use the generative AI search and answer app for this documentation site.
watsonx.ai Runtime plan details
The Lite plan provides enough free resources for you to evaluate the capabilities of watsonx.ai. You can then choose a paid plan that matches the needs of your organization, based on plan features and capacity.
Table 1. Plan details
| Plan features | Lite | Essentials | Standard |
|---|---|---|---|
| watsonx.ai Runtime usage in CUH | 20 CUH per month | CUH billing based on CUH rate multiplied by hours of consumption | 2500 CUH per month |
| Foundation model inferencing in tokens or Resource Units (RU) | 50,000 tokens per month | Billed for usage (1000 tokens = 1 RU) | Billed for usage (1000 tokens = 1 RU) |
| Max parallel Decision Optimization batch jobs per deployment | 2 | 5 | 100 |
| Deployment jobs retained per space | 100 | 1000 | 3000 |
| Deployment time to idle | 1 day | 3 days | 3 days |
| HIPAA support | NA | NA | Available only for legacy Watson Studio and Watson Machine Learning plans on Cloud Pak for Data as a Service in the Dallas region; must be enabled in your IBM Cloud account; not available for watsonx plans |
| Rate limit per plan ID | 2 inference requests per second | 8 inference requests per second | 8 inference requests per second |
| Support for custom foundation models | Not available | Not available | Billed hourly by configuration |
| Document text extraction | Not available | Billed per page | Billed per page |
| Foundation model tuning | Not available | Tuning billed at 43 CUH per hour; inferencing billed for token usage | Tuning billed at 43 CUH per hour; inferencing billed for token usage |
Note: If you upgrade from Essentials to Standard, you cannot revert to an Essentials plan. You must create a new plan.
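As a rough planning aid, the figures in Table 1 can be turned into small helpers. The sketch below is illustrative only: the function names are hypothetical, and it assumes that tuning CUH accrues linearly with run time at the stated 43 CUH per hour and that the Lite allowances are simple monthly caps.

```python
TUNING_CUH_PER_HOUR = 43       # Essentials and Standard; tuning is not available on Lite
LITE_MONTHLY_CUH = 20          # Lite plan CUH allowance per month
LITE_MONTHLY_TOKENS = 50_000   # Lite plan token allowance per month


def tuning_cuh(hours):
    """CUH consumed by a foundation model tuning run of the given duration."""
    if hours < 0:
        raise ValueError("hours must be non-negative")
    return TUNING_CUH_PER_HOUR * hours


def lite_allowance_remaining(cuh_used, tokens_used):
    """Remaining Lite plan allowance for the month, floored at zero."""
    return {
        "cuh": max(0, LITE_MONTHLY_CUH - cuh_used),
        "tokens": max(0, LITE_MONTHLY_TOKENS - tokens_used),
    }
```

For example, a two-hour tuning experiment would consume 86 CUH under these assumptions; inferencing against the tuned model is billed separately by token usage.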