Decision Optimization uses watsonx.ai Runtime asynchronous APIs to enable jobs to be run in parallel.
To solve a problem, you can create a new job from the model deployment and associate data with it.
See Deployment steps and the REST API example. You are not charged for deploying a model;
only the solving of a model with data is charged, based on the running time.
To solve more than one job at a time, specify more than one node when you create your
deployment. For example, in the REST API example, increase the number of nodes by changing the value of the
nodes property: "nodes" : 1.
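A deployment payload for the REST API might then look like the following sketch. Apart from the nodes property quoted above, the field names, placeholder values, and the placement of nodes under hardware_spec are assumptions; check the deployments API reference for the exact schema:

```json
{
  "name": "my-do-deployment",
  "space_id": "<space-id>",
  "asset": { "id": "<model-id>" },
  "hardware_spec": {
    "name": "S",
    "nodes": 2
  },
  "batch": {}
}
```

With "nodes" : 2, up to two solve jobs can run in parallel on this deployment.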
PODs (nodes)
When a job is created and
submitted, how it is handled depends on the current configuration and jobs that are running for the
watsonx.ai Runtime instance. This process is shown in the
following diagram.
1. The new job is sent to the queue.
2. If a POD is started but idle (not running a job), it immediately begins processing the job.
3. Otherwise, if the maximum number of nodes is not reached, a new POD is started (starting a POD can take a few seconds). The job is then assigned to this new POD for processing.
4. Otherwise, the job waits in the queue until one of the running PODs has finished and can pick up the waiting job.
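The dispatch rules above can be sketched as a small simulation. This is illustrative code, not IBM's implementation; it only mirrors the three rules (reuse an idle POD, start a new POD if the maximum is not reached, otherwise queue the job):

```python
# Illustrative simulation of the dispatch rules described above.
from collections import deque

class Dispatcher:
    def __init__(self, max_nodes):
        self.max_nodes = max_nodes
        self.idle = 0         # started PODs with no job
        self.busy = 0         # PODs currently solving
        self.queue = deque()  # jobs waiting for a POD

    def submit(self, job):
        if self.idle > 0:                           # rule 2: reuse an idle POD
            self.idle -= 1
            self.busy += 1
            return "started on idle POD"
        if self.idle + self.busy < self.max_nodes:  # rule 3: start a new POD
            self.busy += 1
            return "started on new POD"
        self.queue.append(job)                      # rule 4: wait in the queue
        return "queued"

    def finish_one(self):
        # A running POD completes; it picks up a waiting job if one exists.
        self.busy -= 1
        if self.queue:
            self.queue.popleft()
            self.busy += 1
        else:
            self.idle += 1

d = Dispatcher(max_nodes=2)
print(d.submit("j1"))  # started on new POD
print(d.submit("j2"))  # started on new POD
print(d.submit("j3"))  # queued
d.finish_one()         # j1 finishes; j3 leaves the queue
print(len(d.queue))    # 0
```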
The configuration of PODs of each size is as follows:

Table 1. T-shirt sizes for Decision Optimization

Definition          Name   Description
2 vCPU and 8 GB     S      Small
4 vCPU and 16 GB    M      Medium
8 vCPU and 32 GB    L      Large
16 vCPU and 64 GB   XL     Extra Large
For all configurations, 1 vCPU and 512 MB are reserved for internal use.
In addition to the solve time, the pricing depends on the selected size through a multiplier.
In the deployment configuration, you can also set the maximum number of nodes to be used.
Idle PODs are automatically stopped after some timeout. If a new job is submitted when
no PODs are up, it takes some time (approximately 30 seconds) for the POD to restart.
Run-time-based pricing (CUH)
Only the job solve time is charged: the idle time for PODs is not charged.
Depending on the size of the POD used, a different multiplier is applied to compute the
number of Capacity Unit Hours (CUH) consumed.
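The arithmetic is: CUH consumed = solve time (in hours) × size multiplier. The multiplier values below are placeholders for illustration, not IBM's published rates; check the pricing documentation for the actual figures:

```python
# Illustrative CUH arithmetic. Only the solve time is billed;
# POD idle time is not charged.
MULTIPLIER = {"S": 1, "M": 2, "L": 4, "XL": 8}  # assumed values, for illustration

def cuh_used(solve_seconds, size):
    """CUH = solve time in hours x the multiplier for the POD size."""
    return (solve_seconds / 3600.0) * MULTIPLIER[size]

# A 30-minute solve on a medium (M) POD:
print(cuh_used(1800, "M"))  # -> 1.0
```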
REST API example
For the full procedure of deploying a model and links to the Swagger documentation, see REST API example.
Python API example
In addition to the REST APIs, a Python API is provided with watsonx.ai Runtime so that you can easily create, deploy, and use a Decision Optimization model from a Python notebook.
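A solve job submitted through the Python client might follow the shape sketched below. The meta-prop names and client calls shown in the comments follow the general pattern of the watsonx.ai Runtime Python client but are assumptions here; verify them against the current client reference before use:

```python
# Hedged sketch of a Decision Optimization solve job payload.
# The inline data layout below (id/values pairs) is an assumption
# modeled on the client's documented pattern.

def build_solve_payload(input_rows, output_name="solution.csv"):
    """Build illustrative inline input/output data for a DO solve job."""
    return {
        "input_data": [{"id": "diet_food.csv", "values": input_rows}],
        "output_data": [{"id": output_name}],
    }

# With a configured client (credentials and space setup omitted), the flow
# would look roughly like this -- method and meta-prop names are assumptions:
#
#   from ibm_watsonx_ai import APIClient
#   client = APIClient(credentials, space_id=space_id)
#   payload = build_solve_payload(rows)
#   job = client.deployments.create_job(deployment_id, meta_props={
#       client.deployments.DecisionOptimizationMetaNames.INPUT_DATA:
#           payload["input_data"],
#       client.deployments.DecisionOptimizationMetaNames.OUTPUT_DATA:
#           payload["output_data"],
#   })

payload = build_solve_payload([["food", "qty"], ["bread", 2]])
print(sorted(payload))  # -> ['input_data', 'output_data']
```

The job runs asynchronously, so the client polls the job state until the solve completes.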