Watson Machine Learning plans and compute usage

You use Watson Machine Learning resources, which are measured in capacity unit hours (CUH), when you train AutoAI models, run machine learning models, or score deployed models. You use Watson Machine Learning resources, measured in resource units (RU), when you run inferencing services with foundation models. This topic describes the various plans you can choose, what services are included, and how computing resources are calculated.

Watson Machine Learning in Cloud Pak for Data as a Service and watsonx

Important:

The Watson Machine Learning plan includes details for watsonx.ai. Watsonx.ai is a studio of integrated tools for working with generative AI powered by foundation models, and with machine learning models. If you are using Cloud Pak for Data as a Service, the details for working with foundation models and for metering prompt inferencing with Resource Units do not apply to your plan.

If you are enabled for both watsonx and Cloud Pak for Data as a Service, you can switch between the two platforms.

Choosing a Watson Machine Learning plan

View a comparison of plans and consider the details to choose a plan that fits your needs.

Watson Machine Learning plans

Watson Machine Learning plans govern how you are billed for models you train and deploy with Watson Machine Learning and for prompts you use with foundation models. Choose a plan based on your needs:

  • Lite is a free plan with limited capacity. Choose this plan if you are evaluating Watson Machine Learning and want to try out the capabilities. The Lite plan does not support running a foundation model tuning experiment on watsonx.
  • Essentials is a pay-as-you-go plan that gives you the flexibility to build, deploy, and manage models to match your needs.
  • Standard is a high-capacity enterprise plan that is designed to support all of an organization's machine learning needs. Capacity unit hours are provided at a flat rate, while resource unit consumption is pay-as-you-go.

For plan details and pricing, see IBM Cloud Machine Learning.

Capacity Unit Hours (CUH), tokens, and Resource Units (RU)

For metering and billing purposes, usage of machine learning models, deployments, and foundation models is measured with these units:

  • Capacity Unit Hours (CUH) measure compute resource consumption per unit hour for usage and billing purposes. CUH measure all Watson Machine Learning activity except foundation model inferencing.

  • Resource Units (RU) measure foundation model inferencing consumption. Inferencing is the process of calling the foundation model to generate output in response to a prompt. Each RU equals 1,000 tokens. A token is a basic unit of text (typically 4 characters or 0.75 words) used in the input or output for a foundation model prompt. Choose a plan that corresponds to your usage requirements.

  • A rate limit monitors and restricts the number of inferencing requests per second processed for foundation models for a given Watson Machine Learning plan instance. The rate limit is higher for paid plans than for the free Lite plan.
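To illustrate how tokens map to Resource Units, the following Python sketch adds prompt (input) tokens and generated (output) tokens and divides the total by 1,000. The function names are hypothetical. The token estimate uses the approximate 4-characters-per-token figure above, but real token counts depend on each model's tokenizer. The sketch also does not model any rounding that billing might apply.

```python
def estimate_tokens(text: str) -> float:
    """Rough token estimate using the ~4 characters per token rule of thumb."""
    return len(text) / 4


def resource_units(input_tokens: int, output_tokens: int) -> float:
    """Resource Units consumed: 1 RU = 1,000 tokens of input plus output."""
    return (input_tokens + output_tokens) / 1000


# A 1,200-token prompt that generates 300 tokens consumes 1.5 RU.
print(resource_units(1200, 300))  # 1.5
```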

What is measured for CUH or RU consumption?

Resources, whether measured in capacity unit hours (CUH) or resource units (RU), are consumed for running assets, not for working in tools. That is, there is no consumption charge for defining an experiment in AutoAI, but there is a charge for running the experiment to train the experiment pipelines. Similarly, there is no charge for creating a deployment space or defining a deployment job, but there is a charge for running a deployment job or inferencing against a deployed asset. Assets that run continuously, such as Jupyter notebooks, RStudio assets, and Bash scripts, consume resources for as long as they are active.

Watson Machine Learning plan details

The Lite plan provides enough free resources for you to evaluate the capabilities of watsonx.ai. You can then choose a paid plan that matches the needs of your organization, based on plan features and capacity.

Table 1. Plan details
| Plan features | Lite | Essentials | Standard |
|---|---|---|---|
| Machine Learning usage in CUH | 20 CUH per month | CUH billing based on CUH rate multiplied by hours of consumption | 2500 CUH per month |
| Foundation model inferencing in tokens or Resource Units (RU) | 50,000 tokens per month | Billed for usage (1000 tokens = 1 RU) | Billed for usage (1000 tokens = 1 RU) |
| Max parallel Decision Optimization batch jobs per deployment | 2 | 5 | 100 |
| Deployment jobs retained per space | 100 | 1000 | 3000 |
| Deployment time to idle | 1 day | 3 days | 3 days |
| HIPAA support | NA | NA | Dallas region only; must be enabled in your IBM Cloud account |
| Rate limit per plan ID | 2 inference requests per second | 8 inference requests per second | 8 inference requests per second |
| Support for custom foundation models | Not available | Not available | Billed by configuration |

Note: If you upgrade from Essentials to Standard, you cannot revert to an Essentials plan. You must create a new plan.

For all plans:

  • Foundation model inferencing Resource Units (RU) can be used for Prompt Lab inferencing, and both input and output count toward consumption. That is, the tokens in the prompt you enter are counted in addition to the tokens in the generated output. (watsonx only)
  • Foundation model inferencing is available from the Dallas, Frankfurt, London, and Tokyo data centers. (watsonx only)
  • Foundation model tuning in the Tuning Studio is available from the Dallas, Frankfurt, London, and Tokyo data centers. (watsonx only)
  • Model classes determine the RU rate. The price per RU differs according to model class. (watsonx only)
  • Capacity-unit-hour (CUH) rate consumption for training is based on training tool, hardware specification, and runtime environment.
  • Capacity-unit-hour (CUH) rate consumption for deployment is based on deployment type, hardware specification, and software specification.
  • Watson Machine Learning limits the number of deployment jobs retained in each deployment space. If you exceed your limit, you cannot create new deployment jobs until you delete existing jobs or upgrade your plan. By default, job metadata is automatically deleted after 30 days. You can override this value when you create a job. See Managing jobs.
  • Time to idle refers to the amount of time to consider a deployment active between scoring requests. If a deployment does not receive scoring requests for a given duration, it is treated as inactive, or idle, and billing stops for all frameworks other than SPSS.
  • A plan allows for at least the stated rate limit, and the actual rate limit can be higher than the stated limit. For example, the Lite plan might process more than 2 requests per second without issuing an error. If you have a paid plan and believe you are reaching the rate limit in error, contact IBM Support for assistance.
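One way for a client application to stay within a plan's stated rate limit is to space out requests on the client side. The following sketch uses a hypothetical MinIntervalLimiter class, which is not part of any IBM SDK. It allows at most a given number of requests per second by waiting between calls. The actual limit can be higher than the stated one, so this approach is deliberately conservative. The clock and sleep functions are injectable so that the limiter can be tested without real delays.

```python
import time
from typing import Callable


class MinIntervalLimiter:
    """Hypothetical client-side limiter: allows at most `rate_per_second` calls
    per second by enforcing a minimum interval between consecutive calls."""

    def __init__(self, rate_per_second: float,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep):
        self.interval = 1.0 / rate_per_second
        self.clock = clock
        self.sleep = sleep
        self.next_allowed = float("-inf")  # the first call never waits

    def wait(self) -> float:
        """Block until the next call is allowed; return how long we waited."""
        now = self.clock()
        delay = max(0.0, self.next_allowed - now)
        if delay > 0:
            self.sleep(delay)
        # Earliest start time for the following call.
        self.next_allowed = max(now, self.next_allowed) + self.interval
        return delay


# Lite plan: stated limit of 2 inference requests per second.
limiter = MinIntervalLimiter(rate_per_second=2)
for _ in range(3):
    limiter.wait()
    # send_inference_request()  # hypothetical call to your inferencing endpoint
```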

For plan details and pricing, see IBM Cloud Machine Learning.

Resource unit metering (watsonx)

Resource Unit billing is based on the rate of the foundation model's billing class multiplied by the number of Resource Units (RU). A Resource Unit is equal to 1,000 tokens from the input and output of foundation model inferencing. Each foundation model billing class has its own RU rate. Embedding models that vectorize text strings are billed at a different rate.

Resource unit billing rates by model class

| Model billing class | Price per RU in USD |
|---|---|
| Class 1 | $0.0006 |
| Class 2 | $0.0018 |
| Class 3 | $0.0050 |
| Class C1 | $0.0001 |
| Class 5 | $0.00025 |
| Class 7 | $0.016 |
| Mistral Large | $0.01 |
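The following Python sketch shows the billing calculation. It assumes the class rates listed in the table above and ignores any rounding, minimums, or plan-specific terms that actual billing might apply. The dictionary and function names are illustrative only.

```python
# USD price per Resource Unit, by model billing class (from the table above).
RU_PRICE_USD = {
    "Class 1": 0.0006,
    "Class 2": 0.0018,
    "Class 3": 0.0050,
    "Class C1": 0.0001,
    "Class 5": 0.00025,
    "Class 7": 0.016,
    "Mistral Large": 0.01,
}


def inference_cost_usd(tokens: int, billing_class: str) -> float:
    """Cost = (tokens / 1000) RU * price per RU for the model's billing class."""
    return tokens / 1000 * RU_PRICE_USD[billing_class]


# 2,000,000 tokens on a Class 2 model = 2,000 RU * $0.0018 = $3.60
print(round(inference_cost_usd(2_000_000, "Class 2"), 2))
```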

Resource unit billing rates for foundation models

For the following models, the billing rate is the same for input and output tokens.

Table 2a. Foundation model billing details
| Model | Origin | Billing class | Price per RU in USD |
|---|---|---|---|
| granite-13b-instruct-v2 | IBM | Class 1 | $0.0006 per RU |
| granite-13b-chat-v2 | IBM | Class 1 | $0.0006 per RU |
| granite-7b-lab | IBM | Class 1 | $0.0006 per RU |
| granite-8b-japanese | IBM | Class 1 | $0.0006 per RU |
| granite-20b-multilingual | IBM | Class 1 | $0.0006 per RU |
| granite-3b-code-instruct | IBM | Class 1 | $0.0006 per RU |
| granite-8b-code-instruct | IBM | Class 1 | $0.0006 per RU |
| granite-20b-code-instruct | IBM | Class 1 | $0.0006 per RU |
| granite-34b-code-instruct | IBM | Class 1 | $0.0006 per RU |
| allam-1-13b-instruct | Third party | Class 2 | $0.0018 per RU |
| codellama-34b-instruct-hf | Third party | Class 2 | $0.0018 per RU |
| elyza-japanese-llama-2-7b-instruct | Third party | Class 2 | $0.0018 per RU |
| flan-t5-xl-3b | Open source | Class 1 | $0.0006 per RU |
| flan-t5-xxl-11b | Open source | Class 2 | $0.0018 per RU |
| flan-ul2-20b | Open source | Class 3 | $0.0050 per RU |
| jais-13b-chat | Open source | Class 2 | $0.0018 per RU |
| llama-3-1-8b-instruct | Third party | Class 1 | $0.0006 per RU |
| llama-3-1-70b-instruct | Third party | Class 2 | $0.0018 per RU |
| llama-3-8b-instruct | Third party | Class 1 | $0.0006 per RU |
| llama-3-70b-instruct | Third party | Class 2 | $0.0018 per RU |
| llama-2-13b-chat | Third party | Class 1 | $0.0006 per RU |
| llama-2-70b-chat | Third party | Class 2 | $0.0018 per RU |
| llama2-13b-dpo-v7 | Third party | Class 2 | $0.0018 per RU |
| mistral-large | Third party | Mistral Large | $0.01 per RU |
| mixtral-8x7b-instruct-v01 | Open source | Class 1 | $0.0006 per RU |
| mt0-xxl-13b | Open source | Class 2 | $0.0018 per RU |

For the following models, the billing rate is different for input and output tokens. Prices are shown in USD.

Table 2b. Foundation model billing details when input and output are billed at different rates

| Model | Origin | Input tokens | Output tokens |
|---|---|---|---|
| llama-3-405b-instruct | Meta | Class 3: $0.0050 per RU | Class 7: $0.016 per RU |
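When input and output tokens are billed at different rates, each side is converted to Resource Units and priced separately. The following minimal Python sketch applies the llama-3-405b-instruct rates from Table 2b. The constant and function names are hypothetical, and rounding is not modeled.

```python
INPUT_PRICE_PER_RU = 0.0050   # Class 3 rate, applied to input (prompt) tokens
OUTPUT_PRICE_PER_RU = 0.016   # Class 7 rate, applied to output (generated) tokens


def llama_405b_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Price input and output RU separately, then add them."""
    input_ru = input_tokens / 1000
    output_ru = output_tokens / 1000
    return input_ru * INPUT_PRICE_PER_RU + output_ru * OUTPUT_PRICE_PER_RU


# 10,000 input tokens and 2,000 output tokens:
# 10 RU * $0.0050 + 2 RU * $0.016 = $0.05 + $0.032 = $0.082
print(round(llama_405b_cost_usd(10_000, 2_000), 4))
```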

Resource unit billing rates for embedding models

Embedding models transform sentences into vectors to more accurately compare and retrieve similar text.

Table 3. Embedding model billing details
| Model | Origin | Billing class | Price per RU in USD |
|---|---|---|---|
| slate.125m.english.rtrvr-v2 | IBM | Class C1 | $0.0001 per RU |
| slate.125m.english.rtrvr | IBM | Class C1 | $0.0001 per RU |
| slate.30m.english.rtrvr-v2 | IBM | Class C1 | $0.0001 per RU |
| slate.30m.english.rtrvr | IBM | Class C1 | $0.0001 per RU |
| all-MiniLM-L12-v2 | Open source | Class C1 | $0.0001 per RU |
| multilingual-e5-large | Open source | Class C1 | $0.0001 per RU |

Hourly billing rates for custom foundation models

Deploying custom foundation models requires the Standard plan. Billing rates depend on the model's hardware configuration and apply to both hosting and inferencing the model. Charges begin when the model is successfully deployed and continue until the model is deleted.

| Configuration size | Billing rate per hour in USD |
|---|---|
| Small | $5.22 |
| Medium | $10.40 |
| Large | $20.85 |
Important: You can deploy a maximum of four small custom foundation models, two medium models, or one large model per account.

For details on choosing a configuration for a custom foundation model, see Planning to deploy a custom foundation model.
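Custom foundation model charges accrue per hour for as long as the model stays deployed. The following Python sketch estimates that cost from the hourly rates above. It is a hypothetical helper that treats hours as continuous, because billing granularity for partial hours is not described here. It also does not enforce the per-account limit on the number of deployed models.

```python
# USD billing rate per hour, by custom foundation model configuration size.
HOURLY_RATE_USD = {"Small": 5.22, "Medium": 10.40, "Large": 20.85}


def custom_model_cost_usd(size: str, hours_deployed: float) -> float:
    """Hosting and inferencing charge from successful deployment until deletion."""
    return HOURLY_RATE_USD[size] * hours_deployed


# A Medium model kept deployed for 30 days (720 hours): 10.40 * 720 = $7,488.00
print(f"${custom_model_cost_usd('Medium', 30 * 24):,.2f}")
```

Because the charge continues until the model is deleted, deleting a custom model that is no longer in use is the only way to stop this cost.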

Billing rates for document text extraction

Use the document text extraction method of the watsonx.ai REST API to convert highly structured PDF files that use diagrams and tables to convey information into an AI model-friendly JSON file format. For more information, see Extracting text from documents.

Billing is based on the number of pages processed as well as the plan type.

| Plan type | Price per page in USD |
|---|---|
| Essentials | $0.038 |
| Standard | $0.030 |
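Document text extraction cost is the number of processed pages multiplied by the plan's per-page price. The following Python sketch uses the rates from the table above. The names are illustrative, and the sketch assumes that every page of the document counts as processed.

```python
# USD price per processed page, by plan type.
PRICE_PER_PAGE_USD = {"Essentials": 0.038, "Standard": 0.030}


def extraction_cost_usd(pages: int, plan: str) -> float:
    """Document text extraction is billed per processed page at the plan's rate."""
    return pages * PRICE_PER_PAGE_USD[plan]


# 500 pages: Essentials = $19.00, Standard = $15.00
for plan in ("Essentials", "Standard"):
    print(plan, f"${extraction_cost_usd(500, plan):.2f}")
```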

Capacity Unit Hours metering (watsonx and Watson Machine Learning)

CUH consumption is affected by the computational hardware resources you apply for a task as well as other factors such as the software specification and model type.

CUH consumption rates by asset type

Table 3. CUH consumption rates by asset type
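| Asset type | Capacity type | Capacity units per hour |
|---|---|---|
| AutoAI experiment | 8 vCPU and 32 GB RAM | 20 |
| Decision Optimization training | 2 vCPU and 8 GB RAM | 6 |
| Decision Optimization training | 4 vCPU and 16 GB RAM | 7 |
| Decision Optimization training | 8 vCPU and 32 GB RAM | 9 |
| Decision Optimization training | 16 vCPU and 64 GB RAM | 13 |
| Decision Optimization deployments | 2 vCPU and 8 GB RAM | 30 |
| Decision Optimization deployments | 4 vCPU and 16 GB RAM | 40 |
| Decision Optimization deployments | 8 vCPU and 32 GB RAM | 50 |
| Decision Optimization deployments | 16 vCPU and 64 GB RAM | 60 |
| Machine Learning models (training, evaluating, or scoring) | 1 vCPU and 4 GB RAM | 0.5 |
| Machine Learning models (training, evaluating, or scoring) | 2 vCPU and 8 GB RAM | 1 |
| Machine Learning models (training, evaluating, or scoring) | 4 vCPU and 16 GB RAM | 2 |
| Machine Learning models (training, evaluating, or scoring) | 8 vCPU and 32 GB RAM | 4 |
| Machine Learning models (training, evaluating, or scoring) | 16 vCPU and 64 GB RAM | 8 |
| Foundation model tuning experiment (watsonx only) | NVIDIA A100 80GB GPU | 43 |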

CUH consumption by deployment and framework type

CUH consumption is calculated using these formulas:

Table 4. CUH consumption by deployment and framework type
| Deployment type | Framework | CUH calculation |
|---|---|---|
| Online | AutoAI, AI function, SPSS, Scikit-Learn custom libraries, TensorFlow, RShiny | deployment_active_duration_in_hours * no_of_nodes * CUH_rate_for_capacity_type_framework |
| Online | Spark, PMML, Scikit-Learn, PyTorch, XGBoost | score_duration_in_hours * no_of_nodes * CUH_rate_for_capacity_type_framework |
| Batch | All frameworks | job_duration_in_hours * no_of_nodes * CUH_rate_for_capacity_type_framework |

For example, consider a Decision Optimization batch deployment job that runs for 15 minutes (0.25 hours) on 2 nodes with 2 vCPU and 8 GB RAM. That capacity type has a CUH rate of 30, so every time the job runs it consumes 0.25 * 2 * 30 = 15 CUH.
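All the formulas in Table 4 have the same shape, so they can be sketched as one Python function. The caller supplies the duration that applies to the deployment type and framework. This is an illustrative sketch, not an IBM API. The rate dictionary covers only the Decision Optimization deployment rows, and the function does not model rounding or minimum charges.

```python
# CUH rates for Decision Optimization deployments, keyed by (vCPU, GB RAM),
# copied from the CUH consumption rates table.
DO_DEPLOYMENT_CUH_RATE = {(2, 8): 30, (4, 16): 40, (8, 32): 50, (16, 64): 60}


def cuh_consumed(duration_hours: float, nodes: int, cuh_rate: float) -> float:
    """CUH = duration in hours * number of nodes * CUH rate for the capacity type.

    Which duration applies depends on the deployment type and framework:
    active time for some online deployments, scoring time for others,
    and job run time for batch deployments.
    """
    return duration_hours * nodes * cuh_rate


# Decision Optimization batch job: 15 minutes on 2 nodes of 2 vCPU and 8 GB RAM.
rate = DO_DEPLOYMENT_CUH_RATE[(2, 8)]
print(cuh_consumed(15 / 60, 2, rate))  # 15.0
```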

Monitoring resource usage

You can track resource usage for assets you own or collaborate on in a project or space. If you are an account owner or administrator, you can track CUH, RU usage or hourly billing charges for an entire account.

Tracking resource usage in a project

To monitor CUH or RU consumption or hourly usage in a project:

  1. Navigate to the Manage tab for a project.

  2. Click Resources to review a summary of resource consumption for assets in the project or space, or to review resource consumption details for particular assets.


Tracking resource usage for an account

You can track the runtime usage for an account on the Environment Runtimes page if you are the IBM Cloud account owner or administrator or the Watson Machine Learning service owner. For details, see Monitoring resources.

Tracking CUH consumption for machine learning in a notebook

To calculate capacity unit hours in a notebook, use:

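# client is an authenticated Watson Machine Learning Python client (APIClient)
CP = client.service_instance.get_details()
# Divide by 3600 * 1000 to convert capacity-unit milliseconds to hours
CUH = CP["entity"]["usage"]["capacity_units"]["current"] / (3600 * 1000)
print(CUH)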

For example, if the details contain 'capacity_units': {'current': 19773430}, then 19773430/(3600*1000) returns approximately 5.49 CUH.
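The usage value is nested inside the dictionary that get_details() returns. It can be convenient to wrap the conversion in a small function that you can test against a sample response without connecting to the service. This helper is hypothetical. Like the calculation above, it assumes that the value is in capacity-unit milliseconds. The sample dictionary includes only the fields used here, and a real response contains many more.

```python
def cuh_from_instance_details(details: dict) -> float:
    """Convert current capacity-unit usage from service instance details to CUH."""
    current = details["entity"]["usage"]["capacity_units"]["current"]
    # Milliseconds -> hours: 1 hour = 3600 seconds * 1000 ms
    return current / (3600 * 1000)


sample = {"entity": {"usage": {"capacity_units": {"current": 19773430}}}}
print(round(cuh_from_instance_details(sample), 2))  # 5.49
```

In a notebook, you can pass the live response directly, for example cuh_from_instance_details(client.service_instance.get_details()).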

For details, see the Service Instances section of the IBM Watson Machine Learning API documentation.

