Watson Machine Learning plans and compute usage

You use Watson Machine Learning resources, which are measured in capacity unit hours (CUH), when you train AutoAI models, run machine learning models, or score deployed models. You use Watson Machine Learning resources, measured in resource units (RU), when you run inferencing services with foundation models. This topic describes the various plans you can choose, what services are included, and how computing resources are calculated.

Watson Machine Learning in Cloud Pak for Data as a Service and watsonx

Important:

The Watson Machine Learning plan includes details for watsonx.ai. Watsonx.ai is a studio of integrated tools for working with machine learning models and with generative AI that is powered by foundation models. If you are using Cloud Pak for Data as a Service, the details about working with foundation models and metering prompt inferencing in Resource Units do not apply to your plan.

If you are enabled for both watsonx and Cloud Pak for Data as a Service, you can switch between the two platforms.

Choosing a Watson Machine Learning plan

View a comparison of plans and consider the details to choose a plan that fits your needs.

Watson Machine Learning plans

Watson Machine Learning plans govern how you are billed for models you train and deploy with Watson Machine Learning and for prompts you use with foundation models. Choose a plan based on your needs:

  • Lite is a free plan with limited capacity. Choose this plan if you are evaluating Watson Machine Learning and want to try out the capabilities. The Lite plan does not support running a foundation model tuning experiment on watsonx.
  • Essentials is a pay-as-you-go plan that gives you the flexibility to build, deploy, and manage models to match your needs.
  • Standard is a high-capacity enterprise plan that is designed to support all of an organization's machine learning needs. Capacity unit hours are provided at a flat rate, while resource unit consumption is pay-as-you-go.

For plan details and pricing, see IBM Cloud Machine Learning.

Capacity Unit Hours (CUH), tokens, and Resource Units (RU)

For metering and billing purposes, machine learning models and deployments or foundation models are measured with these units:

  • Capacity Unit Hours (CUH) measure compute resource consumption per hour for usage and billing purposes. CUH measure all Watson Machine Learning activity except foundation model inferencing.

  • Resource Units (RU) measure foundation model inferencing consumption. Inferencing is the process of calling the foundation model to generate output in response to a prompt. Each RU equals 1,000 tokens. A token is a basic unit of text (typically 4 characters or 0.75 words) used in the input or output for a foundation model prompt. Choose a plan that corresponds to your usage requirements.

  • A rate limit monitors and restricts the number of inferencing requests per second processed for foundation models for a given Watson Machine Learning plan instance. The rate limit is higher for paid plans than for the free Lite plan.
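
The token-to-RU relationship described above can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, the character-based estimate uses the rough figure of 4 characters per token from the text, and whether partial Resource Units are rounded for billing is not stated here, so the sketch leaves the value fractional.

```python
TOKENS_PER_RU = 1000      # from the text: each RU equals 1,000 tokens
CHARS_PER_TOKEN = 4       # rough figure from the text; real tokenizers vary


def tokens_to_ru(input_tokens: int, output_tokens: int) -> float:
    """Convert the combined input and output token count to Resource Units."""
    return (input_tokens + output_tokens) / TOKENS_PER_RU


def estimate_tokens(text: str) -> int:
    """Very rough token estimate based on character count (an assumption)."""
    return round(len(text) / CHARS_PER_TOKEN)


if __name__ == "__main__":
    # 1,500 prompt tokens plus 500 generated tokens
    print(tokens_to_ru(1500, 500))
```

Both the prompt and the generated output are counted, so a short prompt that produces a long answer can still consume a noticeable number of RU.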

Watson Machine Learning plan details

The Lite plan provides enough free resources for you to evaluate the capabilities of watsonx.ai. You can then choose a paid plan that matches the needs of your organization, based on plan features and capacity.

Table 1. Plan details
Plan features | Lite | Essentials | Standard
Machine Learning usage in CUH | 20 CUH per month | CUH billing based on CUH rate multiplied by hours of consumption | 2500 CUH per month
Foundation model inferencing in tokens or Resource Units (RU) | 50,000 tokens per month | Billed for usage (1000 tokens = 1 RU) | Billed for usage (1000 tokens = 1 RU)
Max parallel Decision Optimization batch jobs per deployment | 2 | 5 | 100
Deployment jobs retained per space | 100 | 1000 | 3000
Deployment time to idle | 1 day | 3 days | 3 days
HIPAA support | NA | NA | Dallas region only (must be enabled in your IBM Cloud account)
Rate limit per plan ID | 2 inference requests per second | 8 inference requests per second | 8 inference requests per second

Note: If you upgrade from Essentials to Standard, you cannot revert to an Essentials plan. You must create a new plan.
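
As a quick way to read Table 1, the following sketch checks whether an expected monthly workload stays within the Lite plan allowances (20 CUH and 50,000 tokens per month). The function name and the idea of a single combined check are illustrative assumptions, not part of the service.

```python
# Lite plan monthly allowances, taken from Table 1
LITE_CUH_PER_MONTH = 20
LITE_TOKENS_PER_MONTH = 50_000


def fits_lite_plan(cuh_used: float, tokens_used: int) -> bool:
    """Return True if both monthly figures are within the Lite allowances."""
    return (cuh_used <= LITE_CUH_PER_MONTH
            and tokens_used <= LITE_TOKENS_PER_MONTH)


if __name__ == "__main__":
    print(fits_lite_plan(5.49, 30_000))   # within both allowances
    print(fits_lite_plan(25, 1_000))      # exceeds the CUH allowance
```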

For all plans:

  • Foundation model inferencing Resource Units (RU) can be used for Prompt Lab inferencing, including input and output. That is, the prompt you enter as input is counted in addition to the generated output. (watsonx only)
  • Foundation model inferencing is available only for the Dallas, Frankfurt, and Tokyo data centers. (watsonx only)
  • Foundation model tuning in the Tuning Studio is available only for the Dallas, Frankfurt, and Tokyo data centers. (watsonx only)
  • Three model classes determine the RU rate. The price per RU differs according to model class. (watsonx only)
  • Capacity-unit-hour (CUH) rate consumption for training is based on training tool, hardware specification, and runtime environment.
  • Capacity-unit-hour (CUH) rate consumption for deployment is based on deployment type, hardware specification, and software specification.
  • Watson Machine Learning limits the number of deployment jobs retained for each deployment space. If you exceed your limit, you cannot create new deployment jobs until you delete existing jobs or upgrade your plan. By default, job metadata is automatically deleted after 30 days. You can override this value when creating a job. See Managing jobs.
  • Time to idle is the period after the last scoring request during which a deployment is still considered active. If a deployment receives no scoring requests for that period, it is treated as inactive, or idle, and billing stops for all frameworks other than SPSS.
  • A plan allows for at least the stated rate limit, and the actual rate limit can be higher than the stated limit. For example, the Lite plan might process more than 2 requests per second without issuing an error. If you have a paid plan and believe you are reaching the rate limit in error, contact IBM Support for assistance.

For plan details and pricing, see IBM Cloud Machine Learning.

Resource unit metering (watsonx)

Resource Unit billing is based on the rate of the billing class for the foundation model multiplied by the number of Resource Units (RU). A Resource Unit is equal to 1000 tokens from the input and output of foundation model inferencing. The three foundation model billing classes have different RU rates.

Table 2. Foundation model billing details
Model | Origin | Billing class | Price per RU
granite-13b-instruct-v2 | IBM | Class 1 | $0.0006 per RU
granite-13b-chat-v2 | IBM | Class 1 | $0.0006 per RU
granite-8b-japanese | IBM | Class 1 | $0.0006 per RU
granite-20b-multilingual | IBM | Class 1 | $0.0006 per RU
codellama-34b-instruct-hf | Open source | Class 2 | $0.0018 per RU
elyza-japanese-llama-2-7b-instruct | Open source | Class 2 | $0.0018 per RU
flan-t5-xl-3b | Open source | Class 1 | $0.0006 per RU
flan-t5-xxl-11b | Open source | Class 2 | $0.0018 per RU
flan-ul2-20b | Open source | Class 3 | $0.0050 per RU
jais-13b-chat | Open source | Class 2 | $0.0018 per RU
llama-2-13b-chat | Open source | Class 1 | $0.0006 per RU
llama-2-70b-chat | Open source | Class 2 | $0.0018 per RU
mixtral-8x7b-instruct-v01 | Open source | Class 1 | $0.0006 per RU
mixtral-8x7b-instruct-v01-q | Open source | Class 1 | $0.0006 per RU
mt0-xxl-13b | Open source | Class 2 | $0.0018 per RU
starcoder-15.5b | Open source | Class 3 is not listed for this model; Class 2 | $0.0018 per RU
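
The billing rule for Resource Units (class rate multiplied by RU) can be worked through with a small sketch. The per-class prices are copied from Table 2; the function name is hypothetical, and the sketch assumes fractional RU are billed proportionally, which the text does not confirm.

```python
# Price per RU for each billing class, as listed in Table 2
RU_PRICE_BY_CLASS = {1: 0.0006, 2: 0.0018, 3: 0.0050}
TOKENS_PER_RU = 1000


def inferencing_cost(tokens: int, billing_class: int) -> float:
    """Estimate the inferencing charge for a token count (input plus output)."""
    resource_units = tokens / TOKENS_PER_RU
    return resource_units * RU_PRICE_BY_CLASS[billing_class]


if __name__ == "__main__":
    # 2,000,000 tokens on a Class 2 model such as llama-2-70b-chat:
    # 2,000 RU at $0.0018 per RU
    print(f"${inferencing_cost(2_000_000, 2):.2f}")
```

The same token volume on a Class 3 model would cost several times more, so the billing class is worth checking before running large batches of prompts.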

Capacity Unit Hours metering (watsonx and Watson Machine Learning)

CUH consumption is affected by the computational hardware resources you apply for a task as well as other factors such as the software specification and model type.

CUH consumption rates by asset type

Table 3. CUH consumption rates by asset type
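Asset type | Capacity type | Capacity units per hour
AutoAI experiment | 8 vCPU and 32 GB RAM | 20
Decision Optimization training | 2 vCPU and 8 GB RAM | 6
Decision Optimization training | 4 vCPU and 16 GB RAM | 7
Decision Optimization training | 8 vCPU and 32 GB RAM | 9
Decision Optimization training | 16 vCPU and 64 GB RAM | 13
Decision Optimization deployments | 2 vCPU and 8 GB RAM | 30
Decision Optimization deployments | 4 vCPU and 16 GB RAM | 40
Decision Optimization deployments | 8 vCPU and 32 GB RAM | 50
Decision Optimization deployments | 16 vCPU and 64 GB RAM | 60
Machine Learning models (training, evaluating, or scoring) | 1 vCPU and 4 GB RAM | 0.5
Machine Learning models (training, evaluating, or scoring) | 2 vCPU and 8 GB RAM | 1
Machine Learning models (training, evaluating, or scoring) | 4 vCPU and 16 GB RAM | 2
Machine Learning models (training, evaluating, or scoring) | 8 vCPU and 32 GB RAM | 4
Machine Learning models (training, evaluating, or scoring) | 16 vCPU and 64 GB RAM | 8
Foundation model tuning experiment (watsonx only) | NVIDIA A100 80GB GPU | 43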
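
To illustrate how the rates in Table 3 turn into consumption, the sketch below multiplies a capacity-units-per-hour rate by the hours of runtime. Only a few rows are included, the dictionary layout and function name are hypothetical, and the pairing of each capacity type with its rate assumes the values are listed in the same order as the capacity types.

```python
# A subset of Table 3: (asset type, capacity type) -> capacity units per hour
CUH_RATE = {
    ("AutoAI experiment", "8 vCPU and 32 GB RAM"): 20,
    ("Machine Learning models", "1 vCPU and 4 GB RAM"): 0.5,
    ("Machine Learning models", "2 vCPU and 8 GB RAM"): 1,
    ("Machine Learning models", "4 vCPU and 16 GB RAM"): 2,
    ("Foundation model tuning experiment", "NVIDIA A100 80GB GPU"): 43,
}


def cuh_consumed(asset_type: str, capacity_type: str, hours: float) -> float:
    """CUH consumed = capacity units per hour multiplied by hours of runtime."""
    return CUH_RATE[(asset_type, capacity_type)] * hours


if __name__ == "__main__":
    # A two-hour AutoAI experiment on 8 vCPU and 32 GB RAM
    print(cuh_consumed("AutoAI experiment", "8 vCPU and 32 GB RAM", 2))
```

At the AutoAI rate of 20 capacity units per hour, a single one-hour experiment would use the entire 20 CUH monthly allowance of the Lite plan.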

CUH consumption by deployment and framework type

CUH consumption for deployments is calculated using these formulas:

Table 4. CUH consumption by deployment and framework type
Deployment type | Framework | CUH calculation
Online | AutoAI, Python functions and scripts, SPSS, Scikit-Learn custom libraries, Tensorflow, RShiny | deployment_active_duration * no_of_nodes * CUH_rate_for_capacity_type_framework
Online | Spark, PMML, Scikit-Learn, Pytorch, XGBoost | score_duration_in_seconds * no_of_nodes * CUH_rate_for_capacity_type_framework
Batch | all frameworks | job_duration_in_seconds * no_of_nodes * CUH_rate_for_capacity_type_framework
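
The formulas in Table 4 can be sketched as two small functions. The table does not state the unit conversions, so this sketch makes two explicit assumptions: durations given in seconds are divided by 3600 because the CUH rate is expressed per hour, and deployment_active_duration is supplied in hours. The function names are hypothetical.

```python
SECONDS_PER_HOUR = 3600


def cuh_for_active_duration(active_hours: float, nodes: int,
                            cuh_rate: float) -> float:
    """Online deployments billed on active duration (assumed to be in hours)."""
    return active_hours * nodes * cuh_rate


def cuh_for_seconds(duration_seconds: float, nodes: int,
                    cuh_rate: float) -> float:
    """Online scoring or batch jobs billed on duration in seconds.

    Assumption: seconds are converted to hours before applying the hourly rate.
    """
    return duration_seconds / SECONDS_PER_HOUR * nodes * cuh_rate


if __name__ == "__main__":
    # A 30-minute batch job on 2 nodes at a rate of 4 capacity units per hour
    print(cuh_for_seconds(1800, 2, 4))
```

The practical difference between the two online rows is what drives the duration: the first group accrues CUH for as long as the deployment is active, while the second accrues CUH only while scoring.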

Monitoring resource usage

You can track CUH or RU usage for assets you own or collaborate on in a project or space. If you are an account owner or administrator, you can track CUH or RU usage for an entire account.

Tracking CUH or RU usage in a project

To monitor CUH or RU consumption in a project:

  1. Navigate to the Manage tab for a project.

  2. Click Resources to review a summary of resource consumption for assets in the project or space, or to review resource consumption details for particular assets.


Tracking CUH usage for an account

You can track the runtime usage for an account on the Environment Runtimes page if you are the IBM Cloud account owner or administrator or the Watson Machine Learning service owner. For details, see Monitoring resources.

Tracking CUH consumption for machine learning in a notebook

To calculate capacity unit hours in a notebook, use:

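# client is assumed to be an authenticated Watson Machine Learning API client
CP = client.service_instance.get_details()
# "current" is divided by 3600 * 1000 to express the usage in capacity unit hours
CUH = CP["entity"]["usage"]["capacity_units"]["current"] / (3600 * 1000)
print(CUH)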

For example:

'capacity_units': {'current': 19773430}

19773430/(3600*1000)

returns 5.49 CUH

For details, see the Service Instances section of the IBM Watson Machine Learning API documentation.
