Decision Optimization uses watsonx.ai Runtime asynchronous APIs to enable jobs to be run in parallel.
To solve a problem, you can create a new job from the model deployment and associate data with it. See Deployment steps and the REST API example. You are not charged for deploying a model. You are charged only for solving a model with data, based on the running time.
To solve more than one job at a time, specify more than one node when you create your deployment. For example, in the REST API example, increase the number of nodes by changing the value of the nodes property: "nodes" : 1.
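As a rough illustration of raising the node count, the following Python sketch builds a deployment request body with two nodes. The helper name build_deployment_payload, the placeholder IDs, and the placement of "nodes" inside a "hardware_spec" object are assumptions made for this sketch, not confirmed by this page; check the REST API example for the exact payload layout.

```python
import json


def build_deployment_payload(name, space_id, model_id, size="S", nodes=1):
    """Return a deployment request body as a dict (hypothetical layout)."""
    if nodes < 1:
        raise ValueError("nodes must be at least 1")
    return {
        "name": name,
        "space_id": space_id,
        "asset": {"id": model_id},
        # Assumption: the POD size name and the node count sit together here.
        "hardware_spec": {"name": size, "nodes": nodes},
        "batch": {},
    }


# Placeholder values; replace them with your own space and model IDs.
payload = build_deployment_payload(
    "diet-deploy", "SPACE-ID-HERE", "MODEL-ID-HERE", nodes=2
)
print(json.dumps(payload, indent=2))
```

With "nodes" set to 2, up to two jobs can be solved at the same time; further jobs wait in the queue.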
PODs (nodes)
When a job is created and submitted, how it is handled depends on the current configuration and on the jobs that are running for the watsonx.ai Runtime instance. This process is shown in the following diagram and summarized in these steps:
- The new job is sent to the queue.
- If a POD is started but idle (not running a job), it immediately begins processing this job.
- Otherwise, if the maximum number of nodes is not reached, a new POD is started. (Starting a POD can take a few seconds.) The job is then assigned to this new POD for processing.
- Otherwise, the job waits in the queue until one of the running PODs has finished and can pick up the waiting job.
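The dispatch rules above can be sketched as a small Python model. This is only an illustration of the decision order, not the service's implementation: the Pool class and its method names are hypothetical, and POD start-up delays and the idle timeout are not modeled.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Pool:
    max_nodes: int
    idle: int = 0   # started PODs that are not running a job
    busy: int = 0   # PODs that are currently solving a job
    queue: deque = field(default_factory=deque)

    def submit(self, job):
        """Apply the dispatch rules to a new job and report the outcome."""
        self.queue.append(job)  # every new job enters the queue first
        if self.idle > 0:
            self.idle -= 1
            outcome = "idle-pod"   # an idle POD takes the job immediately
        elif self.idle + self.busy < self.max_nodes:
            outcome = "new-pod"    # a new POD is started for the job
        else:
            return "queued"        # all PODs are busy; the job waits
        self.queue.popleft()
        self.busy += 1
        return outcome

    def finish(self):
        """A POD finishes its job; return the waiting job it picks up, if any."""
        if self.queue:
            return self.queue.popleft()  # the POD stays busy with the next job
        self.busy -= 1
        self.idle += 1
        return None


pool = Pool(max_nodes=2)
outcomes = [pool.submit(job) for job in ("job-1", "job-2", "job-3")]
picked = pool.finish()         # a POD frees up and takes the waiting job
pool.finish()                  # no job is waiting, so this POD becomes idle
reuse = pool.submit("job-4")   # the idle POD takes the new job
print(outcomes, picked, reuse)
```

With a maximum of two nodes, the third job waits until one of the running PODs finishes.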
The configuration of PODs of each size is as follows:
Definition | Name | Description |
---|---|---|
2 vCPU and 8 GB | S | Small |
4 vCPU and 16 GB | M | Medium |
8 vCPU and 32 GB | L | Large |
16 vCPU and 64 GB | XL | Extra Large |
For all configurations, 1 vCPU and 512 MB are reserved for internal use.
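To see what the reservation leaves for solving, the following sketch subtracts the reserved resources from each nominal size. It assumes that the reservation is simply deducted from the listed figures and that 512 MB equals 0.5 GB; the function name available_resources is chosen for this illustration only.

```python
# Nominal POD sizes from the table: name -> (vCPU, memory in GB)
POD_SIZES = {"S": (2, 8), "M": (4, 16), "L": (8, 32), "XL": (16, 64)}

RESERVED_VCPU = 1
RESERVED_GB = 0.5  # 512 MB, assuming 1 GB = 1024 MB


def available_resources(size):
    """Return (vCPU, GB) left after the internal reservation (assumed subtraction)."""
    vcpu, memory_gb = POD_SIZES[size]
    return vcpu - RESERVED_VCPU, memory_gb - RESERVED_GB


for name in POD_SIZES:
    print(name, available_resources(name))
```

Under these assumptions, a Small POD leaves 1 vCPU and 7.5 GB for the solve.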
In addition to the solve time, the pricing depends on the selected size through a multiplier.
In the deployment configuration, you can also set the maximum number of nodes to be used.
Idle PODs are automatically stopped after a timeout period. If a new job is submitted when no PODs are running, it takes some time (approximately 30 seconds) for a POD to start.
Run-time-based pricing (CUH)
Only the job solve time is charged: the idle time for PODs is not charged.
The multiplier that is applied to compute the number of Capacity Unit Hours (CUH) used depends on the size of the POD.
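As a worked illustration of run-time-based pricing, the following sketch converts a solve time into CUH by applying a size multiplier. The multiplier value used here is hypothetical because the actual multipliers are not listed on this page, and any rounding or minimum charge applied by the service is not modeled.

```python
def capacity_unit_hours(solve_seconds, multiplier):
    """Return the CUH for one job: solve time in hours times the size multiplier."""
    if solve_seconds < 0:
        raise ValueError("solve_seconds must not be negative")
    if multiplier <= 0:
        raise ValueError("multiplier must be positive")
    return solve_seconds / 3600 * multiplier


# Hypothetical multiplier; look up the real value for your POD size.
EXAMPLE_MULTIPLIER = 2.0

# A 30-minute solve on a POD with the hypothetical multiplier.
cuh = capacity_unit_hours(1800, EXAMPLE_MULTIPLIER)
print(cuh)
```

Idle time does not enter the calculation, because only the job solve time is charged.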
REST API example
For the full procedure of deploying a model and links to the Swagger documentation, see REST API example.
Python API example
In addition to the REST APIs, a Python API is provided with the watsonx.ai Runtime so that you can easily create, deploy, and use a Decision Optimization model from a Python notebook.
For more information, see Python client example.
An example notebook describing and documenting all steps is available from the Resource hub.