You use Watson Machine Learning resources, which are measured in capacity unit hours (CUH), when you train AutoAI models, run machine learning models, or score deployed models. You use Watson Machine Learning resources, measured in resource units (RU), when you run inferencing services with foundation models. This topic describes the various plans you can choose, what services are included, and how computing resources are calculated.
Watson Machine Learning in Cloud Pak for Data as a Service and watsonx
The Watson Machine Learning plan includes details for watsonx.ai. Watsonx.ai is a studio of integrated tools for working with generative AI, powered by foundation models, and machine learning models. If you are using Cloud Pak for Data as a Service, then the details for working with foundation models and metering prompt inferencing using Resource Units do not apply to your plan.
For more information on watsonx.ai, see:
- Overview of IBM watsonx.ai
- Comparison of IBM watsonx and Cloud Pak for Data as a Service
- Signing up for IBM watsonx.ai
If you are enabled for both watsonx and Cloud Pak for Data as a Service, you can switch between the two platforms.
Choosing a Watson Machine Learning plan
View a comparison of plans and consider the details to choose a plan that fits your needs.
- Watson Machine Learning plans
- Capacity Unit Hours (CUH), tokens, and Resource Units (RU)
- Watson Machine Learning plan details
- Capacity Unit Hours metering
- Monitoring CUH and RU usage
Watson Machine Learning plans
Watson Machine Learning plans govern how you are billed for models you train and deploy with Watson Machine Learning and for prompts you use with foundation models. Choose a plan based on your needs:
- Lite is a free plan with limited capacity. Choose this plan if you are evaluating Watson Machine Learning and want to try out the capabilities.
- Essentials is a pay-as-you-go plan that gives you the flexibility to build, deploy, and manage models to match your needs.
- Standard is a high-capacity enterprise plan that is designed to support all of an organization's machine learning needs. Capacity unit hours are provided at a flat rate, while resource unit consumption is pay-as-you-go.
For plan details and pricing, see IBM Cloud Machine Learning.
Capacity Unit Hours (CUH), tokens, and Resource Units (RU)
For metering and billing purposes, machine learning models and deployments or foundation models are measured with these units:
- Capacity Unit Hours (CUH) measure compute resource consumption per unit hour for usage and billing purposes. CUH measures all Watson Machine Learning activity except for foundation model inferencing.
- Resource Units (RU) measure foundation model inferencing consumption. Inferencing is the process of calling the foundation model to generate output in response to a prompt. Each RU equals 1,000 tokens. A token is a basic unit of text (typically 4 characters or 0.75 words) used in the input or output for a foundation model prompt. Choose a plan that corresponds to your usage requirements. For details on tokens, see Tokens and tokenization.
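The token-to-RU relationship described above can be sketched in a few lines of Python. This is an illustration only: the function names are hypothetical, the 4-characters-per-token figure is the rough average quoted above rather than what a real tokenizer produces, and the sketch returns fractional RU because the text does not say how partial units are rounded for billing.

```python
import math

TOKENS_PER_RU = 1000  # from the text: each RU equals 1,000 tokens


def tokens_to_ru(input_tokens: int, output_tokens: int) -> float:
    """Convert prompt input plus generated output tokens into Resource Units.

    Assumption: fractional RU are returned as-is; billing rounding is not modeled.
    """
    return (input_tokens + output_tokens) / TOKENS_PER_RU


def estimate_tokens(text: str) -> int:
    """Rough estimate only: about 4 characters per token (real tokenizers differ)."""
    return math.ceil(len(text) / 4)


# A prompt of 600 tokens that generates 400 tokens of output consumes 1 RU.
ru_used = tokens_to_ru(600, 400)
```

By the same arithmetic, the 50,000 tokens per month in the Lite plan correspond to 50 RU.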
Watson Machine Learning plan details
The Lite plan provides enough free resources for you to evaluate the capabilities of watsonx.ai. You can then choose a paid plan that matches the needs of your organization, based on plan features and capacity.
Plan features | Lite | Essentials | Standard |
---|---|---|---|
Machine Learning usage in CUH | 20 CUH per month | CUH billing based on CUH rate multiplied by hours of consumption | 2500 CUH per month |
Foundation model inferencing in tokens or Resource Units (RU) | 50,000 tokens per month | Billed for usage (1000 tokens = 1 RU) | Billed for usage (1000 tokens = 1 RU) |
Max parallel Decision Optimization batch jobs per deployment | 2 | 5 | 100 |
Deployment jobs retained per space | 100 | 1000 | 3000 |
Deployment time to idle | 1 day | 3 days | 3 days |
HIPAA support | NA | NA | Dallas region only. Must be enabled in your IBM Cloud account. |
For all plans:
- Foundation model inferencing Resource Units (RU) can be used for Prompt Lab inferencing, including input and output. That is, the prompt you enter for input is counted in addition to the generated output.
- Foundation model inferencing is available only for the Dallas data center.
- Three model classes determine the RU rate. The price per RU differs according to model class.
- Capacity-unit-hour (CUH) rate consumption for training is based on training tool, hardware specification, and runtime environment.
- Capacity-unit-hour (CUH) rate consumption for deployment is based on deployment type, hardware specification, and software specification.
- Watson Machine Learning limits the number of deployment jobs retained for each deployment space. If you exceed your limit, you cannot create new deployment jobs until you delete existing jobs or upgrade your plan. By default, job metadata is automatically deleted after 30 days. You can override this value when creating a job. See Managing jobs.
- Time to idle is the length of time a deployment is considered active between scoring requests. If a deployment does not receive scoring requests for that duration, it is treated as inactive, or idle, and billing stops for all frameworks other than SPSS.
For plan details and pricing, see IBM Cloud Machine Learning.
Resource unit metering
Resource Unit billing is based on the rate of the billing class for the foundation model multiplied by the number of Resource Units (RU). A Resource Unit is equal to 1000 tokens from the input and output of foundation model inferencing. The three foundation model billing classes have different RU rates.
Model | Origin | Billing class | Price per RU | Region available |
---|---|---|---|---|
flan-t5-xxl-11b | Open source | Class 2 | $0.0018 per RU | Dallas, Frankfurt |
flan-ul2-20b | Open source | Class 3 | $0.0050 per RU | Dallas, Frankfurt |
gpt-neox-20b | Open source | Class 3 | $0.0050 per RU | Dallas |
mpt-7b-instruct2 | Open source | Class 1 | $0.0006 per RU | Dallas, Frankfurt |
mt0-xxl-13b | Open source | Class 2 | $0.0018 per RU | Dallas |
starcoder-15.5b | Open source | Class 2 | $0.0018 per RU | Dallas |
llama-2-70b-cha | Open source | Class 3 | $0.0050 per RU | Dallas |
For more information about each model, see Supported foundation models.
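As a worked illustration of the billing rule (class rate multiplied by RU), the following Python sketch uses the three class prices from the table. The dictionary and function names are hypothetical, and the sketch ignores any rounding, taxes, or plan allowances that actual billing might apply.

```python
from decimal import Decimal

# Price per RU for each billing class, taken from the table above (USD).
RU_PRICE_BY_CLASS = {
    "Class 1": Decimal("0.0006"),
    "Class 2": Decimal("0.0018"),
    "Class 3": Decimal("0.0050"),
}


def inferencing_cost(billing_class: str, total_tokens: int) -> Decimal:
    """Estimate the cost of inferencing: RU rate for the class times RU consumed.

    total_tokens counts both input and output tokens; 1 RU = 1000 tokens.
    """
    resource_units = Decimal(total_tokens) / Decimal(1000)
    return RU_PRICE_BY_CLASS[billing_class] * resource_units


# 250,000 tokens on a Class 2 model is 250 RU.
cost = inferencing_cost("Class 2", 250_000)
```

Here 250 RU at the Class 2 rate comes to $0.45. `Decimal` is used instead of floats so that small per-RU prices do not accumulate binary rounding error.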
Capacity Unit Hours metering
CUH consumption is affected by the computational hardware resources you apply for a task as well as other factors such as the software specification and model type.
CUH consumption rates by asset type
Asset type | Capacity type | Capacity units per hour |
---|---|---|
AutoAI experiment | 8 vCPU and 32 GB RAM | 20 |
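Decision Optimization training | 2 vCPU and 8 GB RAM | 6 |
Decision Optimization training | 4 vCPU and 16 GB RAM | 7 |
Decision Optimization training | 8 vCPU and 32 GB RAM | 9 |
Decision Optimization training | 16 vCPU and 64 GB RAM | 13 |
Decision Optimization deployments | 2 vCPU and 8 GB RAM | 30 |
Decision Optimization deployments | 4 vCPU and 16 GB RAM | 40 |
Decision Optimization deployments | 8 vCPU and 32 GB RAM | 50 |
Decision Optimization deployments | 16 vCPU and 64 GB RAM | 60 |
Machine Learning models (training, evaluating, or scoring) | 1 vCPU and 4 GB RAM | 0.5 |
Machine Learning models (training, evaluating, or scoring) | 2 vCPU and 8 GB RAM | 1 |
Machine Learning models (training, evaluating, or scoring) | 4 vCPU and 16 GB RAM | 2 |
Machine Learning models (training, evaluating, or scoring) | 8 vCPU and 32 GB RAM | 4 |
Machine Learning models (training, evaluating, or scoring) | 16 vCPU and 64 GB RAM | 8 |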
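To estimate consumption from these rates, multiply the rate for the asset type and hardware size by the hours used. The Python sketch below encodes the rates from the table; the lookup structure, the shortened asset-type labels, and the function name are hypothetical, and the sketch assumes consumption is simply rate multiplied by hours.

```python
# CUH per hour, keyed by (asset type, vCPU count). Values come from the table above;
# RAM is omitted from the key because each vCPU count maps to one RAM size.
CUH_RATE_PER_HOUR = {
    ("AutoAI experiment", 8): 20,
    ("Decision Optimization training", 2): 6,
    ("Decision Optimization training", 4): 7,
    ("Decision Optimization training", 8): 9,
    ("Decision Optimization training", 16): 13,
    ("Decision Optimization deployments", 2): 30,
    ("Decision Optimization deployments", 4): 40,
    ("Decision Optimization deployments", 8): 50,
    ("Decision Optimization deployments", 16): 60,
    ("Machine Learning models", 1): 0.5,
    ("Machine Learning models", 2): 1,
    ("Machine Learning models", 4): 2,
    ("Machine Learning models", 8): 4,
    ("Machine Learning models", 16): 8,
}


def cuh_consumed(asset_type: str, vcpus: int, hours: float) -> float:
    """Estimate CUH as the hourly rate for the configuration times hours used."""
    return CUH_RATE_PER_HOUR[(asset_type, vcpus)] * hours


# A 30-minute AutoAI experiment uses half of the Lite plan's 20 CUH per month.
autoai_cuh = cuh_consumed("AutoAI experiment", 8, 0.5)
```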
CUH consumption by deployment and framework type
CUH consumption for deployments is calculated using these formulas:
Deployment type | Framework | CUH calculation |
---|---|---|
Online | AutoAI, AI function, SPSS, Scikit-Learn custom libraries, TensorFlow, RShiny | deployment_active_duration * no_of_nodes * CUH_rate_for_capacity_type_framework |
Online | Spark, PMML, Scikit-Learn, PyTorch, XGBoost | score_duration_in_seconds * no_of_nodes * CUH_rate_for_capacity_type_framework |
Batch | all frameworks | job_duration_in_seconds * no_of_nodes * CUH_rate_for_capacity_type_framework |
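The formulas in the table can be sketched in Python as follows. Two points here are assumptions rather than statements from the table: that durations given in seconds are converted to hours before being multiplied by an hourly CUH rate, and that `deployment_active_duration` is expressed in hours. The function names are hypothetical.

```python
SECONDS_PER_HOUR = 3600


def active_duration_cuh(active_hours: float, nodes: int, cuh_rate: float) -> float:
    """Online deployments billed for the time the deployment is active.

    Assumption: deployment_active_duration is measured in hours.
    """
    return active_hours * nodes * cuh_rate


def duration_seconds_cuh(duration_seconds: float, nodes: int, cuh_rate: float) -> float:
    """Online scoring or batch jobs billed by scoring or job duration.

    Assumption: seconds are converted to hours because the rate is per hour.
    """
    return duration_seconds / SECONDS_PER_HOUR * nodes * cuh_rate


# A 30-minute batch job on 2 nodes at a rate of 4 CUH per hour.
batch_cuh = duration_seconds_cuh(1800, 2, 4)
```

Under these assumptions the batch job in the example consumes 4 CUH.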
Monitoring resource usage
You can track CUH or RU usage for assets you own or collaborate on in a project or space. If you are an account owner or administrator, you can track CUH or RU usage for an entire account.
Tracking CUH or RU usage in a project
To monitor CUH or RU consumption in a project:
- Navigate to the Manage tab for a project.
- Click Resources to review a summary of resource consumption for assets in the project or space, or to review resource consumption details for particular assets.
Tracking CUH usage for an account
You can track the runtime usage for an account on the Environment Runtimes page if you are the IBM Cloud account owner or administrator or the Watson Machine Learning service owner. For details, see Monitoring resources.
Tracking CUH consumption for machine learning in a notebook
To calculate capacity unit hours in a notebook, use:
details = client.service_instance.get_details()
# Dividing the raw "current" value by 3600*1000 converts it to capacity unit hours
CUH = details["entity"]["usage"]["capacity_units"]["current"]/(3600*1000)
print(CUH)
For example:
'capacity_units': {'current': 19773430}
19773430/(3600*1000)
returns 5.49 CUH
For details, see the Service Instances section of the IBM Watson Machine Learning API documentation.
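The conversion can also be wrapped in small helpers, for example to compare usage against a monthly allowance. This is a minimal sketch: the function names are hypothetical, the raw value is the one from the example above rather than a live API response, and the allowance of 20 CUH is the Lite plan figure from the plan table.

```python
def capacity_units_to_cuh(raw_units: int) -> float:
    """Convert the raw capacity_units 'current' value to capacity unit hours."""
    return raw_units / (3600 * 1000)


def remaining_cuh(raw_units: int, monthly_allowance_cuh: float) -> float:
    """Return how many CUH are left in a monthly allowance (may be negative)."""
    return monthly_allowance_cuh - capacity_units_to_cuh(raw_units)


used = capacity_units_to_cuh(19773430)
print(round(used, 2))                          # 5.49
print(round(remaining_cuh(19773430, 20), 2))   # 14.51
```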