Watson Machine Learning plans and compute usage

You use Watson Machine Learning resources, which are measured in capacity unit hours (CUH), when you train AutoAI models, run machine learning models, or score deployed models. You use Watson Machine Learning resources, measured in resource units (RU), when you run inferencing services with foundation models. This topic describes the various plans you can choose, what services are included, and how computing resources are calculated.

Watson Machine Learning in Cloud Pak for Data as a Service and watsonx

Important:

The Watson Machine Learning plan includes details for watsonx.ai. Watsonx.ai is a studio of integrated tools for working with generative AI, powered by foundation models, and machine learning models. If you are using Cloud Pak for Data as a Service, then the details for working with foundation models and metering prompt inferencing using Resource Units do not apply to your plan.

If you are enabled for both watsonx and Cloud Pak for Data as a Service, you can switch between the two platforms.

Choosing a Watson Machine Learning plan

View a comparison of plans and consider the details to choose a plan that fits your needs.

Watson Machine Learning plans
Capacity Unit Hours (CUH), tokens, and Resource Units (RU)
Watson Machine Learning plan details
Capacity Unit Hours metering
Monitoring CUH and RU usage

Watson Machine Learning plans

Watson Machine Learning plans govern how you are billed for models you train and deploy with Watson Machine Learning and for prompts you use with foundation models. Choose a plan based on your needs:

Lite is a free plan with limited capacity. Choose this plan if you are evaluating Watson Machine Learning and want to try out the capabilities. The Lite plan does not support running a foundation model tuning experiment on watsonx.
Essentials is a pay-as-you-go plan that gives you the flexibility to build, deploy, and manage models to match your needs.
Standard is a high-capacity enterprise plan that is designed to support all of an organization's machine learning needs. Capacity unit hours are provided at a flat rate, while resource unit consumption is pay-as-you-go.

For plan details and pricing, see IBM Cloud Machine Learning.

Capacity Unit Hours (CUH), tokens, and Resource Units (RU)

For metering and billing purposes, machine learning models and deployments or foundation models are measured with these units:

Capacity Unit Hours (CUH) measure compute resource consumption per unit hour for usage and billing purposes. CUH measures all Watson Machine Learning activity except for foundation model inferencing.
Resource Units (RU) measure foundation model inferencing consumption. Inferencing is the process of calling the foundation model to generate output in response to a prompt. Each RU equals 1,000 tokens. A token is a basic unit of text (typically 4 characters or 0.75 words) used in the input or output for a foundation model prompt. Choose a plan that corresponds to your usage requirements.
A rate limit monitors and restricts the number of inferencing requests per second processed for foundation models for a given Watson Machine Learning plan instance. The rate limit is higher for paid plans than for the free Lite plan.
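The token and RU relationship above can be sketched in a few lines of Python. This is only an illustration: the helper names are hypothetical, and the character-based estimate uses the "typically 4 characters" rule of thumb from the text, so actual token counts reported by a model will differ.

```python
import math

TOKENS_PER_RU = 1000   # from the text: each RU equals 1,000 tokens
CHARS_PER_TOKEN = 4    # from the text: a token is typically 4 characters


def estimate_tokens(text: str) -> int:
    """Rough token estimate from character count (hypothetical helper)."""
    return math.ceil(len(text) / CHARS_PER_TOKEN)


def tokens_to_ru(input_tokens: int, output_tokens: int) -> float:
    """Convert the combined input and output tokens to Resource Units."""
    return (input_tokens + output_tokens) / TOKENS_PER_RU


prompt = "Summarize the quarterly sales report in three bullet points."
print(estimate_tokens(prompt))                             # approximate only
print(tokens_to_ru(input_tokens=1200, output_tokens=300))  # 1.5
```

Both input and output tokens are counted, so a short prompt that produces a long response can still consume more RU than expected.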

What is measured for CUH or RU consumption?

Resources, whether measured with capacity unit hours (CUH) or resource units (RU), are consumed for running assets, not for working in tools. That is, there is no consumption charge for defining an experiment in AutoAI, but there is a charge for running the experiment to train the experiment pipelines. Similarly, there is no charge for creating a deployment space or defining a deployment job, but there is a charge for running a deployment job or inferencing against a deployed asset. Assets that run continuously, such as Jupyter notebooks, RStudio assets, and Bash scripts, consume resources for as long as they are active.

Watson Machine Learning plan details

The Lite plan provides enough free resources for you to evaluate the capabilities of watsonx.ai. You can then choose a paid plan that matches the needs of your organization, based on plan features and capacity.

Table 1. Plan details
Plan features	Lite	Essentials	Standard
Machine Learning usage in CUH	20 CUH per month	CUH billing based on CUH rate multiplied by hours of consumption	2500 CUH per month
Foundation model inferencing in tokens or Resource Units (RU)	50,000 tokens per month	Billed for usage (1000 tokens = 1 RU)	Billed for usage (1000 tokens = 1 RU)
Max parallel Decision Optimization batch jobs per deployment	2	5	100
Deployment jobs retained per space	100	1000	3000
Deployment time to idle	1 day	3 days	3 days
HIPAA support	NA	NA	Dallas region only; must be enabled in your IBM Cloud account
Rate limit per plan ID	2 inference requests per second	8 inference requests per second	8 inference requests per second
Support for custom foundation models	Not available	Not available	Billed by configuration

Note: If you upgrade from Essentials to Standard, you cannot revert to an Essentials plan. You must create a new plan.

For all plans:

Foundation model inferencing Resource Units (RU) can be used for Prompt Lab inferencing, including input and output. That is, the prompt you enter for input is counted in addition to the generated output. (watsonx only)
Foundation model inferencing is available from the Dallas, Frankfurt, London, and Tokyo data centers. (watsonx only)
Foundation model tuning in the Tuning Studio is available from the Dallas, Frankfurt, London, and Tokyo data centers. (watsonx only)
Model classes determine the RU rate. The price per RU differs according to model class. (watsonx only)
Capacity-unit-hour (CUH) rate consumption for training is based on training tool, hardware specification, and runtime environment.
Capacity-unit-hour (CUH) rate consumption for deployment is based on deployment type, hardware specification, and software specification.
Watson Machine Learning limits the number of deployment jobs retained for each deployment space. If you exceed your limit, you cannot create new deployment jobs until you delete existing jobs or upgrade your plan. By default, job metadata is automatically deleted after 30 days. You can override this value when creating a job. See Managing jobs.
Time to idle is how long a deployment is considered active after its most recent scoring request. If a deployment receives no scoring requests for that duration, it is treated as inactive, or idle, and billing stops for all frameworks other than SPSS.
A plan allows for at least the stated rate limit, and the actual rate limit can be higher than the stated limit. For example, the Lite plan might process more than 2 requests per second without issuing an error. If you have a paid plan and believe you are reaching the rate limit in error, contact IBM Support for assistance.
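To stay within the stated rate limit from the client side, requests can be spaced out before they are sent. The following is a minimal sketch, not part of any Watson Machine Learning SDK: the `Throttle` class and its parameters are hypothetical, and the value of 2 requests per second is the Lite plan limit from Table 1.

```python
import time


class Throttle:
    """Space calls so that at most `rate_per_second` start each second."""

    def __init__(self, rate_per_second, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = 1.0 / rate_per_second
        self._clock = clock
        self._sleep = sleep
        self._next_allowed = float("-inf")  # first call never waits

    def wait(self):
        """Block until the next request may start; return seconds waited."""
        now = self._clock()
        delay = self._next_allowed - now
        if delay > 0:
            self._sleep(delay)
            now = self._next_allowed
        self._next_allowed = now + self.min_interval
        return max(delay, 0.0)


throttle = Throttle(rate_per_second=2)  # Lite plan: 2 inference requests per second
for prompt in ["first prompt", "second prompt", "third prompt"]:
    throttle.wait()
    # A call to your own inferencing function would go here (not shown).
    print("sending:", prompt)
```

Because the actual limit can be higher than the stated one, a throttle like this is conservative. It reduces the chance of rate-limit errors but does not guarantee maximum throughput.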

For plan details and pricing, see IBM Cloud Machine Learning.

Resource unit metering (watsonx)

Resource Unit billing is based on the rate of the billing class for the foundation model multiplied by the number of Resource Units (RU). A Resource Unit is equal to 1000 tokens from the input and output of foundation model inferencing. Each foundation model billing class has its own RU rate. Embedding models that vectorize text strings are billed at a different rate.

Resource unit billing rates by model class

Model billing class	Price per RU
Class 1	$0.0006
Class 2	$0.0018
Class 3	$0.0050
Class C1	$0.0001
Class 5	$0.00025
Class 7	$0.035
Mistral Large	$0.01
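As a rough illustration of the billing rule (class rate multiplied by RU), the following Python sketch estimates a charge from token counts. The prices are copied from the billing class table above. The function name is hypothetical, and the sketch assumes that partial RUs are billed proportionally, which the text does not confirm.

```python
from decimal import Decimal

TOKENS_PER_RU = 1000

# Price per RU in USD, copied from the billing class table above.
PRICE_PER_RU = {
    "Class 1": Decimal("0.0006"),
    "Class 2": Decimal("0.0018"),
    "Class 3": Decimal("0.0050"),
    "Class C1": Decimal("0.0001"),
    "Class 5": Decimal("0.00025"),
    "Class 7": Decimal("0.035"),
    "Mistral Large": Decimal("0.01"),
}


def inference_cost(billing_class: str, input_tokens: int, output_tokens: int) -> Decimal:
    """Estimate the charge when input and output share one rate."""
    resource_units = Decimal(input_tokens + output_tokens) / TOKENS_PER_RU
    return resource_units * PRICE_PER_RU[billing_class]


# 2,000,000 tokens = 2,000 RU; at the Class 2 rate that is 2000 * 0.0018 USD.
print(inference_cost("Class 2", input_tokens=1_500_000, output_tokens=500_000))
```

`Decimal` is used instead of `float` so that small per-RU prices do not accumulate binary rounding error.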

Resource unit billing rates for foundation models

For the following models, the billing rate is the same for input and output tokens.

Table 2. Foundation model billing details
Model	Origin	Billing class	Price per RU
granite-13b-instruct-v2	IBM	Class 1	$0.0006 per RU
granite-13b-chat-v2	IBM	Class 1	$0.0006 per RU
granite-7b-lab	IBM	Class 1	$0.0006 per RU
granite-8b-japanese	IBM	Class 1	$0.0006 per RU
granite-20b-multilingual	IBM	Class 1	$0.0006 per RU
granite-3b-code-instruct	IBM	Class 1	$0.0006 per RU
granite-8b-code-instruct	IBM	Class 1	$0.0006 per RU
granite-20b-code-instruct	IBM	Class 1	$0.0006 per RU
granite-34b-code-instruct	IBM	Class 1	$0.0006 per RU
allam-1-13b-instruct	Third party	Class 2	$0.0018 per RU
codellama-34b-instruct-hf	Third party	Class 2	$0.0018 per RU
elyza-japanese-llama-2-7b-instruct	Third party	Class 2	$0.0018 per RU
flan-t5-xl-3b	Open source	Class 1	$0.0006 per RU
flan-t5-xxl-11b	Open source	Class 2	$0.0018 per RU
flan-ul2-20b	Open source	Class 3	$0.0050 per RU
jais-13b-chat	Open source	Class 2	$0.0018 per RU
llama-3-8b-instruct	Third party	Class 1	$0.0006 per RU
llama-3-70b-instruct	Third party	Class 2	$0.0018 per RU
llama-2-13b-chat	Third party	Class 1	$0.0006 per RU
llama-2-70b-chat	Third party	Class 2	$0.0018 per RU
llama2-13b-dpo-v7	Third party	Class 2	$0.0018 per RU
merlinite-7b	Open source	Class 1	$0.0006 per RU
mistral-large	Third party	Mistral Large	$0.01 per RU
mixtral-8x7b-instruct-v01	Open source	Class 1	$0.0006 per RU
mixtral-8x7b-instruct-v01-q	Open source	Class 1	$0.0006 per RU
mt0-xxl-13b	Open source	Class 2	$0.0018 per RU

For the following models, the billing rate is different for input and output tokens.

Table 2. Foundation model billing details when input and output rates differ
Model	Origin	Input tokens	Output tokens
llama-3-405b-instruct	Meta	Class 3: $0.0050 per RU	Class 7: $0.035 per RU
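When input and output tokens are billed at different rates, the two parts are priced separately and then summed. The sketch below uses the llama-3-405b-instruct rates from the table above. The function name is hypothetical, and proportional billing of partial RUs is an assumption.

```python
from decimal import Decimal

TOKENS_PER_RU = 1000
INPUT_PRICE_PER_RU = Decimal("0.0050")   # Class 3 rate for input tokens
OUTPUT_PRICE_PER_RU = Decimal("0.035")   # Class 7 rate for output tokens


def split_rate_cost(input_tokens: int, output_tokens: int) -> Decimal:
    """Estimate the charge when input and output use different RU rates."""
    input_ru = Decimal(input_tokens) / TOKENS_PER_RU
    output_ru = Decimal(output_tokens) / TOKENS_PER_RU
    return input_ru * INPUT_PRICE_PER_RU + output_ru * OUTPUT_PRICE_PER_RU


# 10 RU of input (0.05 USD) plus 2 RU of output (0.07 USD).
print(split_rate_cost(input_tokens=10_000, output_tokens=2_000))
```

In this example the output costs more than the input even though it is one fifth the size, because the output rate is seven times the input rate.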

Resource unit billing rates for embedding models

Embedding models transform sentences into vectors to more accurately compare and retrieve similar text.

Table 3. Embedding model billing details
Model	Origin	Billing class	Price per RU
slate.125m.english.rtrvr	IBM	Class C1	$0.0001 per RU
slate.30m.english.rtrvr	IBM	Class C1	$0.0001 per RU
all-MiniLM-L12-v2	Open source	Class C1	$0.0001 per RU
multilingual-e5-large	Open source	Class C1	$0.0001 per RU

Capacity Unit Hours metering (watsonx and Watson Machine Learning)

CUH consumption is affected by the computational hardware resources you apply for a task as well as other factors such as the software specification and model type.

CUH consumption rates by asset type

Table 3. CUH consumption rates by asset type
Asset type	Capacity type	Capacity units per hour
AutoAI experiment	8 vCPU and 32 GB RAM	20
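Decision Optimization training	2 vCPU and 8 GB RAM	6
Decision Optimization training	4 vCPU and 16 GB RAM	7
Decision Optimization training	8 vCPU and 32 GB RAM	9
Decision Optimization training	16 vCPU and 64 GB RAM	13
Decision Optimization deployments	2 vCPU and 8 GB RAM	30
Decision Optimization deployments	4 vCPU and 16 GB RAM	40
Decision Optimization deployments	8 vCPU and 32 GB RAM	50
Decision Optimization deployments	16 vCPU and 64 GB RAM	60
Machine Learning models (training, evaluating, or scoring)	1 vCPU and 4 GB RAM	0.5
Machine Learning models (training, evaluating, or scoring)	2 vCPU and 8 GB RAM	1
Machine Learning models (training, evaluating, or scoring)	4 vCPU and 16 GB RAM	2
Machine Learning models (training, evaluating, or scoring)	8 vCPU and 32 GB RAM	4
Machine Learning models (training, evaluating, or scoring)	16 vCPU and 64 GB RAM	8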
Foundation model tuning experiment (watsonx only)	NVIDIA A100 80GB GPU	43

CUH consumption by deployment and framework type

CUH consumption is calculated using these formulas:

Table 4. CUH consumption by deployment and framework type
Deployment type	Framework	CUH calculation
Online	AutoAI, AI function, SPSS, Scikit-Learn custom libraries, TensorFlow, RShiny	deployment_active_duration_in_hours * no_of_nodes * CUH_rate_for_capacity_type_framework
Online	Spark, PMML, Scikit-Learn, PyTorch, XGBoost	score_duration_in_hours * no_of_nodes * CUH_rate_for_capacity_type_framework
Batch	all frameworks	job_duration_in_hours * no_of_nodes * CUH_rate_for_capacity_type_framework

For example, consider a Decision Optimization batch deployment job that runs for 15 minutes. Resource consumption is calculated this way: 15 minutes = 0.25 hours, on 2 nodes, and with 2 vCPU and 8 GB RAM. This combination results in a CUH rate of 30, so every time the job runs it consumes 0.25 * 2 * 30, which equals 15 CUH.
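The same calculation can be expressed as a short Python sketch. The rates are the Decision Optimization deployment values from the CUH consumption rates table. The function and dictionary names are hypothetical.

```python
# CUH rates for Decision Optimization deployments, keyed by (vCPU, GB RAM),
# copied from the CUH consumption rates by asset type table.
DO_DEPLOYMENT_CUH_RATE = {
    (2, 8): 30,
    (4, 16): 40,
    (8, 32): 50,
    (16, 64): 60,
}


def batch_job_cuh(duration_minutes: float, nodes: int, vcpu: int, ram_gb: int) -> float:
    """Apply job_duration_in_hours * no_of_nodes * CUH_rate."""
    duration_hours = duration_minutes / 60
    return duration_hours * nodes * DO_DEPLOYMENT_CUH_RATE[(vcpu, ram_gb)]


# 15 minutes on 2 nodes with 2 vCPU and 8 GB RAM: 0.25 * 2 * 30
print(batch_job_cuh(duration_minutes=15, nodes=2, vcpu=2, ram_gb=8))  # 15.0
```

At this rate, two runs of the job would exceed the 20 CUH monthly allowance of the Lite plan.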

Monitoring resource usage

You can track resource usage for assets you own or collaborate on in a project or space. If you are an account owner or administrator, you can track CUH or RU usage, or hourly billing charges, for an entire account.

Tracking resource usage in a project

To monitor CUH or RU consumption or hourly usage in a project:

Navigate to the Manage tab for a project.
Click Resources to review a summary of resource consumption for assets in the project or space, or to review resource consumption details for particular assets.

Tracking resource usage for an account

You can track the runtime usage for an account on the Environment Runtimes page if you are the IBM Cloud account owner or administrator or the Watson Machine Learning service owner. For details, see Monitoring resources.

Tracking CUH consumption for machine learning in a notebook

To calculate capacity unit hours in a notebook, use:

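# Assumes `client` is an already authenticated Watson Machine Learning API client
details = client.service_instance.get_details()
# Convert the raw capacity unit value to capacity unit hours
cuh = details["entity"]["usage"]["capacity_units"]["current"] / (3600 * 1000)
print(cuh)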

For example:

'capacity_units': {'current': 19773430}

19773430/(3600*1000)

returns 5.49 CUH
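To compare the converted value with a plan allowance, a small helper can be layered on the same arithmetic. This is a sketch only: the function names are hypothetical, the 20 CUH figure is the Lite plan allowance from Table 1, and the raw value is the one from the example above rather than a live API response.

```python
RAW_UNITS_PER_CUH = 3600 * 1000   # divisor used in the conversion above
LITE_MONTHLY_CUH = 20             # Lite plan allowance from Table 1


def capacity_units_to_cuh(raw_capacity_units: int) -> float:
    """Convert the raw `capacity_units.current` value to CUH."""
    return raw_capacity_units / RAW_UNITS_PER_CUH


def remaining_cuh(raw_capacity_units: int, allowance: float = LITE_MONTHLY_CUH) -> float:
    """CUH left in the monthly allowance (negative if exceeded)."""
    return allowance - capacity_units_to_cuh(raw_capacity_units)


raw = 19773430  # sample value from the example above
print(round(capacity_units_to_cuh(raw), 2))  # 5.49
print(round(remaining_cuh(raw), 2))          # 14.51
```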

For details, see the Service Instances section of the IBM Watson Machine Learning API documentation.
