Watson Machine Learning plans and compute usage
You use Watson Machine Learning resources, which are measured in capacity unit hours (CUH), when you train AutoAI models, run machine learning models, or score deployed models. You use Watson Machine Learning resources, measured in resource units (RU), when you run inferencing services with foundation models. This topic describes the various plans you can choose, what services are included, and how computing resources are calculated.
Watson Machine Learning in Cloud Pak for Data as a Service and watsonx
The Watson Machine Learning plan includes details for watsonx.ai. Watsonx.ai is a studio of integrated tools for working with generative AI, powered by foundation models, and machine learning models. If you are using Cloud Pak for Data as a Service, then the details for working with foundation models and metering prompt inferencing using Resource Units do not apply to your plan.
For more information on watsonx.ai, see:
- Overview of IBM watsonx.ai
- Comparison of IBM watsonx and Cloud Pak for Data as a Service
- Signing up for IBM watsonx.ai
If you are enabled for both watsonx and Cloud Pak for Data as a Service, you can switch between the two platforms.
Choosing a Watson Machine Learning plan
View a comparison of plans and consider the details to choose a plan that fits your needs.
- Watson Machine Learning plans
- Capacity Unit Hours (CUH), tokens, and Resource Units (RU)
- Watson Machine Learning plan details
- Capacity Unit Hours metering
- Monitoring CUH and RU usage
Watson Machine Learning plans
Watson Machine Learning plans govern how you are billed for models you train and deploy with Watson Machine Learning and for prompts you use with foundation models. Choose a plan based on your needs:
- Lite is a free plan with limited capacity. Choose this plan if you are evaluating Watson Machine Learning and want to try out the capabilities. The Lite plan does not support running a foundation model tuning experiment on watsonx.
- Essentials is a pay-as-you-go plan that gives you the flexibility to build, deploy, and manage models to match your needs.
- Standard is a high-capacity enterprise plan that is designed to support all of an organization's machine learning needs. Capacity unit hours are provided at a flat rate, while resource unit consumption is pay-as-you-go.
For plan details and pricing, see IBM Cloud Machine Learning.
Capacity Unit Hours (CUH), tokens, and Resource Units (RU)
For metering and billing purposes, machine learning models and deployments or foundation models are measured with these units:
- Capacity Unit Hours (CUH) measure compute resource consumption per unit hour for usage and billing purposes. CUH measures all Watson Machine Learning activity except for foundation model inferencing.
- Resource Units (RU) measure foundation model inferencing consumption. Inferencing is the process of calling the foundation model to generate output in response to a prompt. Each RU equals 1,000 tokens. A token is a basic unit of text (typically 4 characters or 0.75 words) used in the input or output for a foundation model prompt. Choose a plan that corresponds to your usage requirements. For details on tokens, see Tokens and tokenization.
- A rate limit monitors and restricts the number of inferencing requests per second processed for foundation models for a given Watson Machine Learning plan instance. The rate limit is higher for paid plans than for the free Lite plan.
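The relationship between tokens and Resource Units can be sketched in a few lines of Python. This is a minimal illustration, not an official calculator: the function names are hypothetical, the character-based estimate uses the "typically 4 characters" rule of thumb stated above, and the actual token count depends on the tokenizer of the model you call.

```python
TOKENS_PER_RU = 1000  # from the definition above: 1 RU = 1,000 tokens


def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Rough token estimate (assumption: about 4 characters per token)."""
    return round(len(text) / chars_per_token)


def tokens_to_resource_units(input_tokens: int, output_tokens: int) -> float:
    """Input and output tokens both count toward RU consumption."""
    return (input_tokens + output_tokens) / TOKENS_PER_RU


prompt = "a" * 400  # placeholder text, 400 characters long
print(estimate_tokens(prompt))              # 100
print(tokens_to_resource_units(1200, 300))  # 1.5
```

How fractional Resource Units are rounded on an invoice is not covered here, so treat the result as an estimate.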
What is measured for CUH or RU consumption?
Resources, whether measured with capacity unit hours (CUH) or resource units (RU), are consumed for running assets, not for working in tools. That is, there is no consumption charge for defining an experiment in AutoAI, but there is a charge for running the experiment to train the experiment pipelines. Similarly, there is no charge for creating a deployment space or defining a deployment job, but there is a charge for running a deployment job or inferencing against a deployed asset. Assets that run continuously, such as Jupyter notebooks, RStudio assets, and Bash scripts, consume resources for as long as they are active.
Watson Machine Learning plan details
The Lite plan provides enough free resources for you to evaluate the capabilities of watsonx.ai. You can then choose a paid plan that matches the needs of your organization, based on plan features and capacity.
Plan features | Lite | Essentials | Standard |
---|---|---|---|
Machine Learning usage in CUH | 20 CUH per month | CUH billing based on CUH rate multiplied by hours of consumption | 2500 CUH per month |
Foundation model inferencing in tokens or Resource Units (RU) | 50,000 tokens per month | Billed for usage (1000 tokens = 1 RU) | Billed for usage (1000 tokens = 1 RU) |
Max parallel Decision Optimization batch jobs per deployment | 2 | 5 | 100 |
Deployment jobs retained per space | 100 | 1000 | 3000 |
Deployment time to idle | 1 day | 3 days | 3 days |
HIPAA support | NA | NA | Dallas region only. Must be enabled in your IBM Cloud account. |
Rate limit per plan ID | 2 inference requests per second | 8 inference requests per second | 8 inference requests per second |
Support for custom foundation models | Not available | Not available | Billed by configuration |
For all plans:
- Foundation model inferencing Resource Units (RU) can be used for Prompt Lab inferencing, including input and output. That is, the prompt you enter for input is counted in addition to the generated output. (watsonx only)
- Foundation model inferencing is available from the Dallas, Frankfurt, London, and Tokyo data centers. (watsonx only)
- Foundation model tuning in the Tuning Studio is available from the Dallas, Frankfurt, London, and Tokyo data centers. (watsonx only)
- Model classes determine the RU rate. The price per RU differs according to model class. (watsonx only)
- Capacity-unit-hour (CUH) rate consumption for training is based on training tool, hardware specification, and runtime environment.
- Capacity-unit-hour (CUH) rate consumption for deployment is based on deployment type, hardware specification, and software specification.
- Watson Machine Learning places limits on the number of deployment jobs retained for each deployment space. If you exceed your limit, you cannot create new deployment jobs until you delete existing jobs or upgrade your plan. By default, job metadata is automatically deleted after 30 days. You can override this value when creating a job. See Managing jobs.
- Time to idle refers to the amount of time to consider a deployment active between scoring requests. If a deployment does not receive scoring requests for a given duration, it is treated as inactive, or idle, and billing stops for all frameworks other than SPSS.
- A plan allows for at least the stated rate limit, and the actual rate limit can be higher than the stated limit. For example, the Lite plan might process more than 2 requests per second without issuing an error. If you have a paid plan and believe you are reaching the rate limit in error, contact IBM Support for assistance.
For plan details and pricing, see IBM Cloud Machine Learning.
Resource unit metering (watsonx)
Resource Unit billing is based on the rate of the billing class for the foundation model multiplied by the number of Resource Units (RU). A Resource Unit is equal to 1000 tokens from the input and output of foundation model inferencing. The three foundation model billing classes have different RU rates. Embedding models that vectorize text strings are billed at a different rate.
Resource unit billing rates by model class
Model billing class | Price per RU in USD |
---|---|
Class 1 | $0.0006 |
Class 2 | $0.0018 |
Class 3 | $0.0050 |
Class C1 | $0.0001 |
Class 5 | $0.00025 |
Class 7 | $0.016 |
Mistral Large | $0.01 |
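As a rough sketch of how the class rates above translate into a charge, the following Python snippet multiplies the number of Resource Units by the price for a billing class. The dictionary and function names are hypothetical, the prices are copied from the table above, and the result is an estimate rather than an invoice amount.

```python
# Price per RU in USD, copied from the table above.
RU_PRICE_USD = {
    "Class 1": 0.0006,
    "Class 2": 0.0018,
    "Class 3": 0.0050,
    "Class C1": 0.0001,
    "Class 5": 0.00025,
    "Class 7": 0.016,
    "Mistral Large": 0.01,
}


def inferencing_cost_usd(total_tokens: int, billing_class: str) -> float:
    """Estimate cost as (tokens / 1000) * price per RU for the class."""
    resource_units = total_tokens / 1000
    return resource_units * RU_PRICE_USD[billing_class]


# 2,000,000 input and output tokens against a Class 2 model
cost = inferencing_cost_usd(2_000_000, "Class 2")
print(f"${cost:.2f}")  # prints $3.60
```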
Resource unit billing rates for foundation models
For the following models, the billing rate is the same for input and output tokens.
Model | Origin | Billing class | Price per RU in USD |
---|---|---|---|
granite-13b-instruct-v2 | IBM | Class 1 | $0.0006 per RU |
granite-13b-chat-v2 | IBM | Class 1 | $0.0006 per RU |
granite-7b-lab | IBM | Class 1 | $0.0006 per RU |
granite-8b-japanese | IBM | Class 1 | $0.0006 per RU |
granite-20b-multilingual | IBM | Class 1 | $0.0006 per RU |
granite-3b-code-instruct | IBM | Class 1 | $0.0006 per RU |
granite-8b-code-instruct | IBM | Class 1 | $0.0006 per RU |
granite-20b-code-instruct | IBM | Class 1 | $0.0006 per RU |
granite-34b-code-instruct | IBM | Class 1 | $0.0006 per RU |
allam-1-13b-instruct | Third party | Class 2 | $0.0018 per RU |
codellama-34b-instruct-hf | Third party | Class 2 | $0.0018 per RU |
elyza-japanese-llama-2-7b-instruct | Third party | Class 2 | $0.0018 per RU |
flan-t5-xl-3b | Open source | Class 1 | $0.0006 per RU |
flan-t5-xxl-11b | Open source | Class 2 | $0.0018 per RU |
flan-ul2-20b | Open source | Class 3 | $0.0050 per RU |
jais-13b-chat | Open source | Class 2 | $0.0018 per RU |
llama-3-1-8b-instruct | Third party | Class 1 | $0.0006 per RU |
llama-3-1-70b-instruct | Third party | Class 2 | $0.0018 per RU |
llama-3-8b-instruct | Third party | Class 1 | $0.0006 per RU |
llama-3-70b-instruct | Third party | Class 2 | $0.0018 per RU |
llama-2-13b-chat | Third party | Class 1 | $0.0006 per RU |
llama-2-70b-chat | Third party | Class 2 | $0.0018 per RU |
llama2-13b-dpo-v7 | Third party | Class 2 | $0.0018 per RU |
mistral-large | Third party | Mistral Large | $0.01 per RU |
mixtral-8x7b-instruct-v01 | Open source | Class 1 | $0.0006 per RU |
mt0-xxl-13b | Open source | Class 2 | $0.0018 per RU |
For the following models, the billing rate is different for input and output tokens. Prices are shown in USD.
Model | Origin | Input tokens | Output tokens |
---|---|---|---|
llama-3-405b-instruct | Meta | Class 3: $0.0050 per RU | Class 7: $0.016 per RU |
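When input and output tokens are billed at different rates, the two parts are priced separately and then added. The sketch below uses the llama-3-405b-instruct rates from the table above. The function and constant names are hypothetical, and the token counts are made-up values for illustration.

```python
# Rates for llama-3-405b-instruct, copied from the table above (USD per RU).
INPUT_PRICE_PER_RU = 0.0050   # Class 3
OUTPUT_PRICE_PER_RU = 0.016   # Class 7


def split_rate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Price input and output Resource Units separately, then sum them."""
    input_cost = (input_tokens / 1000) * INPUT_PRICE_PER_RU
    output_cost = (output_tokens / 1000) * OUTPUT_PRICE_PER_RU
    return input_cost + output_cost


# Illustrative request volume: 10,000 input tokens and 2,000 output tokens
cost = split_rate_cost_usd(10_000, 2_000)
print(f"${cost:.3f}")  # prints $0.082
```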
Resource unit billing rates for embedding models
Embedding models transform sentences into vectors to more accurately compare and retrieve similar text.
Model | Origin | Billing class | Price per RU in USD |
---|---|---|---|
slate.125m.english.rtrvr-v2 | IBM | Class C1 | $0.0001 per RU |
slate.125m.english.rtrvr | IBM | Class C1 | $0.0001 per RU |
slate.30m.english.rtrvr-v2 | IBM | Class C1 | $0.0001 per RU |
slate.30m.english.rtrvr | IBM | Class C1 | $0.0001 per RU |
all-MiniLM-L12-v2 | Open source | Class C1 | $0.0001 per RU |
multilingual-e5-large | Open source | Class C1 | $0.0001 per RU |
Hourly billing rates for custom foundation models
Deploying custom foundation models requires the Standard plan. Billing rates depend on the model hardware configuration and cover both hosting and inferencing the model. Charges begin when the model is successfully deployed and continue until the model is deleted.
Configuration size | Billing rate per hour in USD |
---|---|
Small | $5.22 |
Medium | $10.40 |
Large | $20.85 |
For details on choosing a configuration for a custom foundation model, see Planning to deploy a custom foundation model.
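Because a custom foundation model is billed for as long as it stays deployed, the charge can be approximated as the hourly rate multiplied by the number of hours deployed. The following sketch uses the rates from the table above. The names are hypothetical, and the granularity at which partial hours are billed is not stated here, so fractional hours are an assumption.

```python
# Hourly rates in USD, copied from the table above.
HOURLY_RATE_USD = {"Small": 5.22, "Medium": 10.40, "Large": 20.85}


def custom_model_cost_usd(configuration: str, hours_deployed: float) -> float:
    """Estimate hosting cost from deployment time to deletion time."""
    return HOURLY_RATE_USD[configuration] * hours_deployed


# A Small configuration left deployed for one day
cost = custom_model_cost_usd("Small", 24)
print(f"${cost:.2f}")  # prints $125.28
```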
Billing rates for document text extraction
Use the document text extraction method of the watsonx.ai REST API to convert highly structured PDF files, which use diagrams and tables to convey information, into a JSON file format that AI models can work with. For more information, see Extracting text from documents.
Billing is based on the number of pages processed as well as the plan type.
Plan type | Price per page in USD |
---|---|
Essentials | $0.038 |
Standard | $0.030 |
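The per-page pricing above amounts to a simple multiplication. A minimal sketch, with hypothetical names and the prices copied from the table:

```python
# Price per processed page in USD, by plan type (from the table above).
PRICE_PER_PAGE_USD = {"Essentials": 0.038, "Standard": 0.030}


def text_extraction_cost_usd(plan: str, pages: int) -> float:
    """Estimate the charge for processing a number of pages."""
    return PRICE_PER_PAGE_USD[plan] * pages


print(f"${text_extraction_cost_usd('Essentials', 1000):.2f}")  # prints $38.00
print(f"${text_extraction_cost_usd('Standard', 1000):.2f}")    # prints $30.00
```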
Notes on generative AI models
- A prompt-tuned foundation model is assigned to the same billing class as the underlying foundation model. For example, if you prompt tune a class 1 foundation model, the cost for inferencing the tuned model is measured at the class 1 billing rate. For information about tuned foundation models, see Tuning Studio.
- For more information about each model, see Supported foundation models.
- For information about regional support for each model, see Regional availability for foundation models.
Capacity Unit Hours metering (watsonx and Watson Machine Learning)
CUH consumption is affected by the computational hardware resources you apply for a task as well as other factors such as the software specification and model type.
CUH consumption rates by asset type
Asset type | Capacity type | Capacity units per hour |
---|---|---|
AutoAI experiment | 8 vCPU and 32 GB RAM | 20 |
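Decision Optimization training | 2 vCPU and 8 GB RAM | 6 |
Decision Optimization training | 4 vCPU and 16 GB RAM | 7 |
Decision Optimization training | 8 vCPU and 32 GB RAM | 9 |
Decision Optimization training | 16 vCPU and 64 GB RAM | 13 |
Decision Optimization deployments | 2 vCPU and 8 GB RAM | 30 |
Decision Optimization deployments | 4 vCPU and 16 GB RAM | 40 |
Decision Optimization deployments | 8 vCPU and 32 GB RAM | 50 |
Decision Optimization deployments | 16 vCPU and 64 GB RAM | 60 |
Machine Learning models (training, evaluating, or scoring) | 1 vCPU and 4 GB RAM | 0.5 |
Machine Learning models (training, evaluating, or scoring) | 2 vCPU and 8 GB RAM | 1 |
Machine Learning models (training, evaluating, or scoring) | 4 vCPU and 16 GB RAM | 2 |
Machine Learning models (training, evaluating, or scoring) | 8 vCPU and 32 GB RAM | 4 |
Machine Learning models (training, evaluating, or scoring) | 16 vCPU and 64 GB RAM | 8 |
Foundation model tuning experiment (watsonx only) | NVIDIA A100 80GB GPU | 43 |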
CUH consumption by deployment and framework type
CUH consumption is calculated using these formulas:
Deployment type | Framework | CUH calculation |
---|---|---|
Online | AutoAI, AI function, SPSS, Scikit-Learn custom libraries, TensorFlow, RShiny | deployment_active_duration_in_hours * no_of_nodes * CUH_rate_for_capacity_type_framework |
Online | Spark, PMML, Scikit-Learn, PyTorch, XGBoost | score_duration_in_hours * no_of_nodes * CUH_rate_for_capacity_type_framework |
Batch | all frameworks | job_duration_in_hours * no_of_nodes * CUH_rate_for_capacity_type_framework |
For example, consider a Decision Optimization batch deployment job that runs for 15 minutes. Resource consumption is calculated this way: 15 minutes = 0.25 hours, on 2 nodes, and with 2 vCPU and 8 GB RAM. This combination results in a CUH rate of 30, so every time the job runs it consumes 0.25 * 2 * 30, which equals 15 CUH.
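The formula and the Decision Optimization example above can be expressed as a small helper. The function name `cuh_consumed` and its parameter names are illustrative and are not part of any Watson Machine Learning API. The second calculation assumes the 2500 CUH monthly allowance listed for the Standard plan.

```python
def cuh_consumed(duration_hours: float, nodes: int, cuh_rate: float) -> float:
    """CUH = duration in hours * number of nodes * CUH rate for the capacity type."""
    return duration_hours * nodes * cuh_rate


# Decision Optimization batch job: 15 minutes, 2 nodes, 2 vCPU and 8 GB RAM (rate 30)
per_run = cuh_consumed(15 / 60, 2, 30)
print(per_run)  # 15.0

# Number of complete runs that fit in a 2500 CUH monthly allowance
runs_per_month = int(2500 // per_run)
print(runs_per_month)  # 166
```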
Monitoring resource usage
You can track resource usage for assets you own or collaborate on in a project or space. If you are an account owner or administrator, you can track CUH usage, RU usage, or hourly billing charges for an entire account.
Tracking resource usage in a project
To monitor CUH or RU consumption or hourly usage in a project:
- Navigate to the Manage tab for a project.
- Click Resources to review a summary of resource consumption for assets in the project or space, or to review resource consumption details for particular assets.
Tracking resource usage for an account
You can track the runtime usage for an account on the Environment Runtimes page if you are the IBM Cloud account owner or administrator or the Watson Machine Learning service owner. For details, see Monitoring resources.
Tracking CUH consumption for machine learning in a notebook
To calculate capacity unit hours in a notebook, use:
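# Assumes `client` is an authenticated Watson Machine Learning API client.
CP = client.service_instance.get_details()
# Convert the raw usage counter to capacity unit hours.
CUH = CP["entity"]["usage"]["capacity_units"]["current"]/(3600*1000)
print(CUH)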
For example:
'capacity_units': {'current': 19773430}
19773430/(3600*1000)
returns 5.49 CUH
For details, see the Service Instances section of the IBM Watson Machine Learning API documentation.
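The same conversion can be wrapped in a helper that takes the details dictionary, which makes it easy to compare usage against a plan allowance. This is a sketch under stated assumptions: the function name is hypothetical, the sample dictionary contains only the keys used in the snippet above, and the 20 CUH allowance is the Lite plan value from the plan details table.

```python
def capacity_units_to_cuh(details: dict) -> float:
    """Convert the raw 'current' capacity units counter to capacity unit hours."""
    raw = details["entity"]["usage"]["capacity_units"]["current"]
    return raw / (3600 * 1000)


# Sample shaped like the example above; a real response contains more fields.
sample_details = {"entity": {"usage": {"capacity_units": {"current": 19773430}}}}

used_cuh = capacity_units_to_cuh(sample_details)
remaining_cuh = 20 - used_cuh  # assumption: Lite plan allowance of 20 CUH per month

print(round(used_cuh, 2))       # 5.49
print(round(remaining_cuh, 2))  # 14.51
```

In a notebook, you would pass the result of `client.service_instance.get_details()` instead of the sample dictionary.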