Watson Machine Learning plans and compute usage

You use Watson Machine Learning resources, measured in capacity unit hours (CUH), when you train AutoAI models, run deep learning experiments, and request predictions from deployed models. This topic describes the plans you can choose, what each plan includes, and the default computing environments, to help you select a plan that matches your needs.

Watson Machine Learning plans

Watson Machine Learning plans govern how you are billed for models you train and deploy with Watson Machine Learning. Choose a plan based on your needs:

  • Lite is a free plan with limited capacity. Choose this plan if you are evaluating Watson Machine Learning and want to try out the capabilities.
  • Standard is a pay-as-you-go plan that gives you the flexibility to build, deploy, and manage models to match your needs. Note that HIPAA support is not available with the Standard plan.
  • Professional is a high-capacity, flat-rate enterprise plan designed to support all of an organization’s machine learning needs.

This table provides details for plan allowances and restrictions.

| Feature | Lite | Standard | Professional |
| --- | --- | --- | --- |
| Max published models | 200 | 1000 | 1000 |
| Deployed models | 5 | 1000 | 1000 |
| Predictions | 5,000 per month | Billed per prediction | 2 million, then billed per 1,000 |
| Capacity unit hours (CUH) | 50 per month | Billed per CUH | 1,000, then billed for additional CUH |
| HIPAA readiness | | | Available if provisioned on IBM Cloud - Dallas region |
| Decision Optimization | | | |
| AutoAI experiments | | | |
| Batch scoring | | | |
| Deep learning training | Max 8 K80 GPUs in parallel | Unlimited | Unlimited |

 

Watson Machine Learning compute usage and pricing

Note: For complete details on pricing, see Watson Machine Learning: Pricing.

Machine Learning compute usage is calculated by the number of capacity unit hours (CUH) consumed by an active machine learning instance.

The rate of capacity units per hour consumed is determined by the computing requirements of your Machine Learning assets and models. For example, a model with a large, complex data set will consume more training resources than a model with a smaller, simpler data set.

An Event is an occurrence that is processed by, or related to the use of, the Cloud Service. For the purpose of this offering, an Event is a prediction. Multiple predictions can be executed from a single API call, and each individual prediction is counted as an Event.

Compute time is calculated to the millisecond. However, there is a one-minute minimum for each distinct operation. That is, a training run that takes 12 seconds is billed as one minute toward the capacity unit hour quota, while a training run that takes 83.555 seconds is billed exactly as calculated.
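The rounding rule above can be sketched in Python. This is an illustration of the stated rule only, not an official billing formula, and the helper names are hypothetical:

```python
def billed_seconds(duration_seconds: float) -> float:
    """Apply the one-minute minimum per distinct operation; longer runs
    are billed exactly as measured, to the millisecond."""
    return max(60.0, duration_seconds)

def billed_cuh(duration_seconds: float, cuh_rate: float) -> float:
    """Convert billed time to capacity unit hours at the environment's CUH rate."""
    return billed_seconds(duration_seconds) / 3600.0 * cuh_rate

print(billed_seconds(12))      # 60.0 -- billed as one minute
print(billed_seconds(83.555))  # 83.555 -- billed exactly as calculated
```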

These tables show the capacity units per hour calculation for machine learning environments, by usage type.

 

Capacity units per hour for deep learning experiments

| Capacity type | Capacity units per hour |
| --- | --- |
| 1 (one) NVIDIA K80 GPU | 2 |
| 1 (one) NVIDIA V100 GPU | 8 |

 

Capacity units per hour for batch scoring

| Capacity type | Capacity units per hour |
| --- | --- |
| Extra small: 1x4 = 1 vCPU and 4 GB RAM | 0.5 |
| Small: 2x8 = 2 vCPU and 8 GB RAM | 1 |
| Medium: 4x16 = 4 vCPU and 16 GB RAM | 2 |
| Large: 8x32 = 8 vCPU and 32 GB RAM | 4 |
| Extra large: 16x64 = 16 vCPU and 64 GB RAM | 8 |

 

Capacity units per hour for AutoAI experiments

| Capacity type | Capacity units per hour |
| --- | --- |
| AutoAI: 8 vCPU and 32 GB RAM | 20 |

 

Capacity units per hour for Decision Optimization

| Capacity type | Capacity units per hour |
| --- | --- |
| Decision Optimization: 2 vCPU and 8 GB RAM | 30 |
| Decision Optimization: 4 vCPU and 16 GB RAM | 40 |
| Decision Optimization: 16 vCPU and 64 GB RAM | 60 |
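The rate tables above can be combined into a simple cost estimate: CUH consumed equals the environment's rate multiplied by hours of active use. The following sketch transcribes a few of the rates; the dictionary keys are illustrative labels, not official identifiers:

```python
# Hypothetical rate lookup transcribed from the CUH tables above.
CUH_RATES = {
    "k80_gpu": 2,        # 1 NVIDIA K80 GPU
    "v100_gpu": 8,       # 1 NVIDIA V100 GPU
    "batch_small": 1,    # Small: 2 vCPU and 8 GB RAM
    "autoai": 20,        # AutoAI: 8 vCPU and 32 GB RAM
    "do_2x8": 30,        # Decision Optimization: 2 vCPU and 8 GB RAM
}

def cuh_consumed(capacity_type: str, hours: float) -> float:
    """CUH consumed = rate for the environment times hours of active use."""
    return CUH_RATES[capacity_type] * hours

# A 30-minute AutoAI experiment at 20 CUH per hour consumes 10 CUH.
print(cuh_consumed("autoai", 0.5))
```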

For details on how resources are consumed, see Monitoring account resource usage.

How predictions are calculated for online deployments

For online predictions, charges are based on the total number of scoring records for which predictions are performed by the associated instance, rather than on the number of scoring API calls made.

This example uses the published tutorial notebook Use Spark and Python to Predict Equipment Purchase to explain how predictions are billed.

Scenario 1: User sends one scoring API request to predict the outcome for one record (for example, a single set of features)

In this example, the user is trying to predict what product a 23-year-old will be interested in buying. The scoring payload for this scenario looks like this:

payload_scoring = {
    'fields': ['GENDER', 'AGE', 'MARITAL_STATUS', 'PROFESSION'],
    'values': [
        ['M', 23, 'Single', 'Student']
    ]
}

Note that in the above payload, the values contain only 1 record (feature set). The scoring API returns 1 prediction record for the 1 input record in the payload. The scoring output looks like this:

{
	"fields": ["GENDER",
		"AGE",
		"MARITAL_STATUS",
		"PROFESSION",
		"PRODUCT_LINE",
		"label",
		"PROFESSION_IX",
		"GENDER_IX",
		"MARITAL_STATUS_IX",
		"features",
		"rawPrediction",
		"probability",
		"prediction",
		"predictedLabel"
	],
	"values": [
		["M",
			23,
			"Single",
			"Student",
			"Camping Equipment",
			0.0,
			6.0,
			0.0,
			1.0,
			[0.0, 23.0, 1.0, 6.0],
			[5.570605067417983,
				6.7285830309330175,
				5.782009212142643,
				0.1766529669798611,
				1.742149722526497
			],
			[0.2785302533708991,
				0.3364291515466508,
				0.2891004606071321,
				0.008832648348993053,
				0.08710748612632484
			],
			1.0,
			"Personal Accessories"
		]
	]
}

For this scenario, because there is only 1 input record for which a prediction is made, the total prediction count for billing is incremented by 1.

  • A Lite plan user is only charged for the prediction if they exceed the threshold of 5,000 free predictions.

  • A Standard plan user is charged per prediction at the rates described in the rate plan.

Scenario 2: User sends one scoring API request to predict outcomes for 2 records (2 sets of features)

In this example, the user is trying to predict what products two customers will be interested in buying. The scoring payload for this scenario will look like this:

payload_scoring = {
    'fields': ['GENDER', 'AGE', 'MARITAL_STATUS', 'PROFESSION'],
    'values': [
        ['M', 23, 'Single', 'Student'],
        ['M', 55, 'Single', 'Executive']
    ]
}

Note that in this payload, the values contain 2 records (feature sets). The scoring API returns 2 prediction records corresponding to the 2 input records in the payload. The scoring output looks like this:

{
	"fields": ["GENDER",
		"AGE",
		"MARITAL_STATUS",
		"PROFESSION",
		"PRODUCT_LINE",
		"label",
		"PROFESSION_IX",
		"GENDER_IX",
		"MARITAL_STATUS_IX",
		"features",
		"rawPrediction",
		"probability",
		"prediction",
		"predictedLabel"
	],
	"values": [
		["M",
			23,
			"Single",
			"Student",
			"Camping Equipment",
			0.0,
			6.0,
			0.0,
			1.0,
			[0.0, 23.0, 1.0, 6.0],
			[5.570605067417983,
				6.7285830309330175,
				5.782009212142643,
				0.1766529669798611,
				1.742149722526497
			],
			[0.2785302533708991,
				0.3364291515466508,
				0.2891004606071321,
				0.008832648348993053,
				0.08710748612632484
			],
			1.0,
			"Personal Accessories"
		],
		["M",
			55,
			"Single",
			"Executive",
			"Camping Equipment",
			0.0,
			3.0,
			0.0,
			1.0,
			[0.0, 55.0, 1.0, 3.0],
			[2.632879457632312,
				4.479278937861745,
				2.7938862335667167,
				10.010685179962001,
				0.08327019097722486
			],
			[0.1316439728816156,
				0.22396394689308724,
				0.13969431167833585,
				0.5005342589981001,
				0.004163509548861243
			],
			3.0,
			"Golf Equipment"
		]
	]
}

In this scenario, because there are 2 input records for which predictions are made, the total prediction count for billing is incremented by 2.
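In both scenarios, the billed prediction count for a request is simply the number of records in the payload's values list. This can be sketched as follows; the helper function is an illustration of the counting rule described above, not part of the Watson Machine Learning client:

```python
def billed_predictions(payload_scoring: dict) -> int:
    """Each record (feature set) in 'values' counts as one prediction,
    regardless of how many records share a single API call."""
    return len(payload_scoring['values'])

single = {'fields': ['GENDER', 'AGE', 'MARITAL_STATUS', 'PROFESSION'],
          'values': [['M', 23, 'Single', 'Student']]}
double = {'fields': ['GENDER', 'AGE', 'MARITAL_STATUS', 'PROFESSION'],
          'values': [['M', 23, 'Single', 'Student'],
                     ['M', 55, 'Single', 'Executive']]}

print(billed_predictions(single))  # 1
print(billed_predictions(double))  # 2
```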

Calculating prediction charges for Deep Learning models

Deep learning models built with TensorFlow, Keras, or Caffe are billed in a way that is similar to online scoring. The charges are based on the number of scoring records for which predictions are made, that is, the cumulative number of predictions across all batch jobs submitted by the user.

Tracking runtime usage

You can track runtime usage by project, in a notebook, or across accounts.

Track runtime usage for machine learning by project

You can view the machine learning environment runtimes that are currently active in a project, and monitor usage for your machine learning assets from the project Environments page.

Track runtime usage for machine learning in a notebook

To view the CUH consumed by a model from a notebook, use the Watson Machine Learning API call GET /v3/wml_instances/{instance_id} to get information about the service instance.

To calculate capacity unit hours, use:

details = client.service_instance.get_details()
CUH = details["entity"]["usage"]["capacity_units"]["current"] / (3600 * 1000)
print(CUH)

For example:

'capacity_units': {'current': 19773430}

19773430/(3600*1000)

returns 5.49 CUH

For details, see the Service Instances section of the IBM Watson Machine Learning API documentation.

Track runtime usage for an account

The CUH consumed by the service runtimes in a project are billed to the account that the project creator selected in their profile settings when the project was created. This can be the project creator's own account, or another account that the creator has access to. If other users are added to the project and use runtimes, their usage is also billed against the same account.

You can track the runtime usage for an account on the Environment Runtimes page if you are the IBM Cloud account owner or administrator or the Watson Machine Learning service owner.

To view the total runtime usage across all of the projects and see how much of your plan you have currently used, choose Manage > Environment Runtimes.

A list of the active runtimes billed to your account is displayed. You can see who created the runtimes, when, and for which service instances, as well as the capacity units that were consumed by the active runtimes at the time you view the list.