You use watsonx.ai Runtime resources, measured in capacity unit hours (CUH), when you train AutoAI models, run machine learning models, or score deployed models. You also use watsonx.ai Runtime resources, measured by tokens consumed or at an hourly rate, when you run inferencing services with foundation models. This topic describes the plans you can choose, the services they include, and how computing resources are calculated.
watsonx.ai Runtime in Cloud Pak for Data as a Service and watsonx
The watsonx.ai Runtime plans include details for watsonx.ai, a studio of integrated tools for working with generative AI (powered by foundation models) and with machine learning models. If you are using Cloud Pak for Data as a Service, the details for working with foundation models and for metering prompt inferencing in Resource Units do not apply to your plan.
If you are enabled for both watsonx and Cloud Pak for Data as a Service, you can switch between the two platforms.
Choosing a watsonx.ai Runtime plan
watsonx.ai Runtime plans govern how you are billed for models you train and deploy with watsonx.ai Runtime and for prompts you use with foundation models. Choose a plan based on your needs:
- Lite is a free plan with limited capacity. Choose this plan if you are evaluating watsonx.ai Runtime and want to try out the capabilities. The Lite plan does not support running a foundation model tuning experiment on watsonx.
- Essentials is a pay-as-you-go plan that gives you the flexibility to build, deploy, and manage models to match your needs.
- Standard is a high-capacity enterprise plan that is designed to support all of an organization's machine learning needs. Capacity unit hours are provided at a flat rate, while resource unit consumption is pay-as-you-go.
For plan details and pricing, see .
How resource consumption is tracked
For metering and billing purposes, machine learning models, deployments, and foundation models are measured with these charge metrics:
- Capacity Unit Hour (CUH) measures compute resource consumption per unit hour for usage and billing purposes. CUH measures all watsonx.ai Runtime activity except for foundation model inferencing.
- Resource Unit (RU) measures foundation model inferencing consumption. Inferencing is the process of calling the foundation model to generate output in response to a prompt. Each RU equals 1,000 tokens. A token is a basic unit of text (typically 4 characters or 0.75 words) used in the input or output for a foundation model prompt.
- Hourly rate is used to calculate charges for custom foundation models that you import into watsonx.ai and deploy. The rate is based on the configuration size and is charged for the duration of the model deployment.
- Page rate is used to calculate charges for document text extraction. The page rate is set by plan.
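The token-to-RU conversion above can be sketched as a rough estimator. This is an illustrative sketch only: the 4-characters-per-token figure is the approximation stated above, not an exact tokenizer, and the function names are hypothetical.

```python
# Rough Resource Unit (RU) estimator for foundation model inferencing,
# based on the metering rules above: 1 RU = 1,000 tokens, and a token is
# roughly 4 characters (or 0.75 words). Real token counts come from the
# model's tokenizer; this heuristic is only for ballpark estimates.

def estimate_tokens(text: str) -> int:
    """Approximate token count using the ~4 characters per token rule of thumb."""
    return max(1, round(len(text) / 4))

def estimate_resource_units(prompt: str, output: str) -> float:
    """Both input and output tokens are metered; 1,000 tokens = 1 RU."""
    total_tokens = estimate_tokens(prompt) + estimate_tokens(output)
    return total_tokens / 1000

# Example: a 2,000-character prompt plus a 6,000-character response is
# roughly 500 + 1,500 = 2,000 tokens, or about 2 RU.
print(estimate_resource_units("x" * 2000, "y" * 6000))  # 2.0
```

Because output tokens count toward consumption, limiting the maximum generated tokens in a prompt request is the most direct way to bound RU usage per call.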
What is measured for resource consumption?
Resources, whether measured in capacity unit hours (CUH) or resource units (RU), are consumed for running assets, not for working in tools. That is, there is no consumption charge for defining an experiment in AutoAI, but there is a charge for running the experiment to train the experiment pipelines. Similarly, there is no charge for creating a deployment space or defining a deployment job, but there is a charge for running a deployment job or inferencing against a deployed asset. Assets that run continuously, such as Jupyter notebooks, RStudio assets, Bash scripts, and custom model deployments, consume resources for as long as they are active.
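For continuously running assets, CUH consumption is the capacity rate of the selected hardware configuration multiplied by the time the asset is active. A minimal sketch, where the 1.5 CUH-per-hour rate is a hypothetical example rather than a published rate:

```python
# Illustrative CUH calculation for a long-running asset, following the
# rule that assets consume capacity for as long as they are active.
# The rate depends on the hardware configuration you select; 1.5 is a
# made-up example value, not an actual watsonx.ai Runtime rate.

def cuh_consumed(rate_cuh_per_hour: float, hours_active: float) -> float:
    """CUH consumed = configuration rate x hours the asset was active."""
    return rate_cuh_per_hour * hours_active

# A notebook left running for 8 hours on a configuration rated at
# 1.5 CUH/hour consumes 12 CUH -- stopping idle runtimes saves capacity.
print(cuh_consumed(1.5, 8))  # 12.0
```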
watsonx.ai Runtime plan details
The Lite plan provides enough free resources for you to evaluate the capabilities of watsonx.ai. You can then choose a paid plan that matches the needs of your organization, based on plan features and capacity.
Plan features | Lite | Essentials | Standard |
---|---|---|---|
watsonx.ai Runtime usage in CUH | 20 CUH per month | CUH billing based on CUH rate multiplied by hours of consumption | 2,500 CUH per month |
Foundation model inferencing in tokens or Resource Units (RU) | 50,000 tokens per month | Billed for usage (1,000 tokens = 1 RU) | Billed for usage (1,000 tokens = 1 RU) |
Max parallel Decision Optimization batch jobs per deployment | 2 | 5 | 100 |
Deployment jobs retained per space | 100 | 1000 | 3000 |
Deployment time to idle | 1 day | 3 days | 3 days |
HIPAA support | Not available | Not available | Dallas region only. Must be enabled in your IBM Cloud account |
Rate limit per plan ID | 2 inference requests per second | 8 inference requests per second | 8 inference requests per second |
Support for custom foundation models | Not available | Not available | Billed hourly by configuration |
Document text extraction | Not available | Billed per page | Billed per page |
watsonx.ai Runtime pricing details
For more information on billing rates and how resource consumption is calculated, see the topics in the Learn more section.
Learn more
- Billing details for generative AI assets
- Billing details for machine learning assets
- For more information on tracking computing resource allocation and consumption, see Runtime usage.
Parent topic: watsonx.ai Runtime