Watson Machine Learning plans and compute usage

You use Watson Machine Learning resources, measured in capacity unit hours (CUH), when you train AutoAI models, run deep learning experiments, and request predictions from deployed models. This topic describes the plans you can choose, what each plan includes, and the default computing environments, to help you select a plan that matches your needs.

Watson Machine Learning plans

Watson Machine Learning plans govern how you are billed for models you train and deploy with Watson Machine Learning. Choose a plan based on your needs:

  • Lite is a free plan with limited capacity. Choose this plan if you are evaluating Watson Machine Learning and want to try out the capabilities.
  • Standard is a pay-as-you-go plan that gives you the flexibility to build, deploy, and manage models to match your needs. Note that HIPAA support is not available with the Standard plan.
  • Professional is a high-capacity, flat-rate enterprise plan designed to support all of an organization’s machine learning needs.

This table provides details for plan allowances and restrictions.

| Feature | Lite | Standard | Professional |
| --- | --- | --- | --- |
| Max published models | 200 | 1000 | 1000 |
| Deployed models | 5 | 1000 | 1000 |
| Predictions | 5,000 per month | Billed per prediction | 2 million, then billed per 1,000 |
| Capacity unit hours (CUH) | 50 per month | Billed per CUH | 1,000, then billed for additional CUH |
| HIPAA readiness | | | Available if provisioned on IBM Cloud - Dallas region |
| Decision Optimization | | | |
| AutoAI experiments | | | |
| Batch scoring | | | |
| Deep learning training | Max 8 K80 GPUs in parallel | Unlimited | Unlimited |

 

Watson Machine Learning compute usage and pricing

Note: For complete details on pricing, see Watson Machine Learning: Pricing.

Machine Learning compute usage is calculated by the number of capacity unit hours (CUH) consumed by an active machine learning instance.

The rate of capacity units per hour consumed is determined by the computing requirements of your Machine Learning assets and models. For example, a model with a large, complex data set will consume more training resources than a model with a smaller, simpler data set.

An Event is an occurrence that is processed by, or related to the use of, the Cloud Service. For the purpose of this offering, an Event is a prediction. Multiple predictions can be executed from a single API call, and each individual prediction is counted as an Event.

Compute time is calculated to the millisecond. However, there is a one-minute minimum for each distinct operation. That is, a training run that takes 12 seconds is billed as one minute toward the capacity unit hour quota, while a training run that takes 83.555 seconds is billed exactly as calculated.
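The rounding rule above can be sketched in Python. This is an illustration of the stated rule only, not an official billing formula, and the helper names are hypothetical:

```python
def billed_seconds(duration_seconds: float) -> float:
    """Apply the one-minute minimum per distinct operation; longer runs
    are billed exactly as measured, to the millisecond."""
    return max(60.0, duration_seconds)

def billed_cuh(duration_seconds: float, cuh_rate: float) -> float:
    """Convert billed time to capacity unit hours at the environment's CUH rate."""
    return billed_seconds(duration_seconds) / 3600.0 * cuh_rate

print(billed_seconds(12))      # 60.0 -- billed as one minute
print(billed_seconds(83.555))  # 83.555 -- billed exactly as calculated
```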

These tables show the capacity units per hour calculation for machine learning environments, by usage type.

 

Capacity units per hour for deep learning experiments

| Capacity type | Capacity units per hour |
| --- | --- |
| 1 (one) NVIDIA K80 GPU | 2 |
| 1 (one) NVIDIA V100 GPU | 8 |

 

Capacity units per hour for batch scoring

| Capacity type | Capacity units per hour |
| --- | --- |
| Extra small: 1x4 = 1 vCPU and 4 GB RAM | 0.5 |
| Small: 2x8 = 2 vCPU and 8 GB RAM | 1 |
| Medium: 4x16 = 4 vCPU and 16 GB RAM | 2 |
| Large: 8x32 = 8 vCPU and 32 GB RAM | 4 |
| Extra large: 16x64 = 16 vCPU and 64 GB RAM | 8 |

 

Capacity units per hour for AutoAI experiments

| Capacity type | Capacity units per hour |
| --- | --- |
| AutoAI: 8 vCPU and 32 GB RAM | 20 |

 

Capacity units per hour for Decision Optimization

| Capacity type | Capacity units per hour |
| --- | --- |
| Decision Optimization: 2 vCPU and 8 GB RAM | 30 |
| Decision Optimization: 4 vCPU and 16 GB RAM | 40 |
| Decision Optimization: 16 vCPU and 64 GB RAM | 60 |
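The rate tables above can be combined into a simple cost estimate: CUH consumed equals the environment's rate multiplied by hours of active use. The following sketch transcribes a few of the rates; the dictionary keys are illustrative labels, not official identifiers:

```python
# Hypothetical rate lookup transcribed from the CUH tables above.
CUH_RATES = {
    "k80_gpu": 2,        # 1 NVIDIA K80 GPU
    "v100_gpu": 8,       # 1 NVIDIA V100 GPU
    "batch_small": 1,    # Small: 2 vCPU and 8 GB RAM
    "autoai": 20,        # AutoAI: 8 vCPU and 32 GB RAM
    "do_2x8": 30,        # Decision Optimization: 2 vCPU and 8 GB RAM
}

def cuh_consumed(capacity_type: str, hours: float) -> float:
    """CUH consumed = rate for the environment times hours of active use."""
    return CUH_RATES[capacity_type] * hours

# A 30-minute AutoAI experiment at 20 CUH per hour consumes 10 CUH.
print(cuh_consumed("autoai", 0.5))
```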

For details on how resources are consumed, see Monitoring account resource usage.

How predictions are calculated for online deployments

For online predictions, charges are based on the total number of scoring records for which predictions are performed by the associated instance, rather than on the number of scoring API calls made.

This example uses the published tutorial notebook Use Spark and Python to Predict Equipment Purchase to explain how predictions are billed.

Scenario 1: User sends one scoring API request to predict the outcome for one record (for example, a single set of features)

In this example, the user is trying to predict what product a 23-year-old will be interested in buying. The scoring payload for this scenario looks like this:

payload_scoring = {
    'fields': ['GENDER', 'AGE', 'MARITAL_STATUS', 'PROFESSION'],
    'values': [
        ['M', 23, 'Single', 'Student']
    ]
}

Note that in the above payload, the values contain only 1 record (feature set). The scoring API returns 1 prediction record for the 1 input record in the payload. The scoring output looks like this:

{
	"fields": ["GENDER",
		"AGE",
		"MARITAL_STATUS",
		"PROFESSION",
		"PRODUCT_LINE",
		"label",
		"PROFESSION_IX",
		"GENDER_IX",
		"MARITAL_STATUS_IX",
		"features",
		"rawPrediction",
		"probability",
		"prediction",
		"predictedLabel"
	],
	"values": [
		["M",
			23,
			"Single",
			"Student",
			"Camping Equipment",
			0.0,
			6.0,
			0.0,
			1.0,
			[0.0, 23.0, 1.0, 6.0],
			[5.570605067417983,
				6.7285830309330175,
				5.782009212142643,
				0.1766529669798611,
				1.742149722526497
			],
			[0.2785302533708991,
				0.3364291515466508,
				0.2891004606071321,
				0.008832648348993053,
				0.08710748612632484
			],
			1.0,
			"Personal Accessories"
		]
	]
}

For this scenario, because there is only 1 input record for which a prediction is made, the total prediction count for billing is incremented by 1.

  • A Lite plan user is only charged for the prediction if they exceed the threshold of 5,000 free predictions.

  • A Standard plan user is charged per prediction at the rates described in the rate plan.

Scenario 2: User sends one scoring API request to predict outcomes for 2 records (2 sets of features)

In this example, the user is trying to predict what products two customers will be interested in buying. The scoring payload for this scenario will look like this:

payload_scoring = {
    'fields': ['GENDER', 'AGE', 'MARITAL_STATUS', 'PROFESSION'],
    'values': [
        ['M', 23, 'Single', 'Student'],
        ['M', 55, 'Single', 'Executive']
    ]
}

Note that in this payload, the values contain 2 records (feature sets). The scoring API returns 2 prediction records corresponding to the 2 input records in the payload. The scoring output looks like this:

{
	"fields": ["GENDER",
		"AGE",
		"MARITAL_STATUS",
		"PROFESSION",
		"PRODUCT_LINE",
		"label",
		"PROFESSION_IX",
		"GENDER_IX",
		"MARITAL_STATUS_IX",
		"features",
		"rawPrediction",
		"probability",
		"prediction",
		"predictedLabel"
	],
	"values": [
		["M",
			23,
			"Single",
			"Student",
			"Camping Equipment",
			0.0,
			6.0,
			0.0,
			1.0,
			[0.0, 23.0, 1.0, 6.0],
			[5.570605067417983,
				6.7285830309330175,
				5.782009212142643,
				0.1766529669798611,
				1.742149722526497
			],
			[0.2785302533708991,
				0.3364291515466508,
				0.2891004606071321,
				0.008832648348993053,
				0.08710748612632484
			],
			1.0,
			"Personal Accessories"
		],
		["M",
			55,
			"Single",
			"Executive",
			"Camping Equipment",
			0.0,
			3.0,
			0.0,
			1.0,
			[0.0, 55.0, 1.0, 3.0],
			[2.632879457632312,
				4.479278937861745,
				2.7938862335667167,
				10.010685179962001,
				0.08327019097722486
			],
			[0.1316439728816156,
				0.22396394689308724,
				0.13969431167833585,
				0.5005342589981001,
				0.004163509548861243
			],
			3.0,
			"Golf Equipment"
		]
	]
}

In this scenario, because there are 2 input records for which predictions are made, the total prediction count for billing is incremented by 2.
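In both scenarios, the billed prediction count for a request is simply the number of records in the payload's values list. This can be sketched as follows; the helper function is an illustration of the counting rule described above, not part of the Watson Machine Learning client:

```python
def billed_predictions(payload_scoring: dict) -> int:
    """Each record (feature set) in 'values' counts as one prediction,
    regardless of how many records share a single API call."""
    return len(payload_scoring['values'])

single = {'fields': ['GENDER', 'AGE', 'MARITAL_STATUS', 'PROFESSION'],
          'values': [['M', 23, 'Single', 'Student']]}
double = {'fields': ['GENDER', 'AGE', 'MARITAL_STATUS', 'PROFESSION'],
          'values': [['M', 23, 'Single', 'Student'],
                     ['M', 55, 'Single', 'Executive']]}

print(billed_predictions(single))  # 1
print(billed_predictions(double))  # 2
```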

Calculating prediction charges for Deep Learning models

Deep learning models built with TensorFlow, Keras, or Caffe are billed in a way that is similar to online scoring. The charges are based on the number of scoring records for which predictions are made, that is, the cumulative number of predictions across all batch jobs submitted by the user.

Tracking runtime usage

You can track runtime usage by project, in a notebook, or across accounts.

Track runtime usage for machine learning by project

You can view the machine learning environment runtimes that are currently active in a project, and monitor usage for your machine learning assets from the project Environments page.

Track runtime usage for machine learning in a notebook

To view the CUH consumed by a model from a notebook, use the Watson Machine Learning API call GET /v3/wml_instances/{instance_id} to get information about the service instance.

To calculate capacity unit hours, use:

details = client.service_instance.get_details()
CUH = details["entity"]["usage"]["capacity_units"]["current"] / (3600 * 1000)
print(CUH)

For example:

'capacity_units': {'current': 19773430}

19773430/(3600*1000)

returns 5.49 CUH

For details, see the Service Instances section of the IBM Watson Machine Learning API documentation.

Track runtime usage for an account

The CUH consumed by the service runtimes in a project are billed to the account that the project creator selected in their profile settings when the project was created. This can be the project creator's own account, or another account that the creator has access to. If other users are added to the project and use runtimes, their usage is also billed against the same account.

You can track the runtime usage for an account on the Environment Runtimes page if you are the IBM Cloud account owner or administrator or the Watson Machine Learning service owner.

To view the total runtime usage across all of the projects and see how much of your plan you have currently used, choose Manage > Environment Runtimes.

A list of the active runtimes billed to your account is displayed. You can see who created the runtimes, when, and for which service instances, as well as the capacity units that were consumed by the active runtimes at the time you view the list.