Deploy foundation models on-demand programmatically by using the REST API. Deploying a foundation model on-demand makes it available on dedicated hardware for the exclusive use of your organization. IBM provides a set of curated models that you can deploy on-demand.
Before you begin
- You must set up or enable your task credentials to deploy foundation models on-demand. For more information, see Managing task credentials.
- Review requirements and considerations for deploying a foundation model on-demand.
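All of the code snippets in this topic pass a bearer token in the `Authorization` header. As a minimal sketch, assuming that you authenticate to IBM Cloud IAM with an API key, you can generate a token as follows:
curl -X POST "https://iam.cloud.ibm.com/identity/token" \
-H "content-type: application/x-www-form-urlencoded" \
--data-urlencode "grant_type=urn:ibm:params:oauth:grant-type:apikey" \
--data-urlencode "apikey=<replace with your API key>"
The `access_token` value in the JSON response is the token to substitute for `<replace with your token>` in the examples that follow.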
Creating a model asset
You can use the `/ml/v4/models` endpoint to create an asset for the foundation model that you want to deploy on-demand.
The following code snippet shows how to create an asset in the watsonx.ai Runtime repository for deploying your foundation model on-demand. Use the asset ID generated by this code when you deploy your model.
curl -X POST "https://<cluster url>/ml/v4/models?version=2024-01-29" \
-H "Authorization: Bearer <replace with your token>" \
-H "content-type: application/json" \
--data '{
"type": "curated_foundation_model_1.0",
"version": "1.0",
"name": "granite",
"space_id": "<Space id for deployment>",
"foundation_model": {
"model_id": "ibm/granite-13b-chat-v2-curated"
}
}'
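The create request returns the new model asset in the response body. As a sketch, assuming that the asset ID is reported in the `metadata.id` field of the response (the usual watsonx.ai Runtime response shape) and that `jq` is installed, you can capture the ID for the deployment step. Here, `model_payload.json` is a hypothetical file that contains the JSON payload shown above:
ASSET_ID=$(curl -s -X POST "https://<replace with your cloud hostname>/ml/v4/models?version=2024-01-29" \
-H "Authorization: Bearer <replace with your token>" \
-H "content-type: application/json" \
--data @model_payload.json | jq -r '.metadata.id')
echo "${ASSET_ID}"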
Creating a deployment for an on-demand foundation model
You can use the `/ml/v4/deployments` endpoint to deploy a foundation model on-demand in a deployment space. You must use the asset ID that was generated when you created the model asset. For more information, see Creating a model asset.
The following code snippet shows how to create an online deployment to deploy your foundation model on-demand:
curl -X POST "https://<cluster url>/ml/v4/deployments?version=2024-01-29" \
-H "Authorization: Bearer <replace with your token>" \
-H "content-type: application/json" \
--data '{
"asset": {
"id": <Asset id created>
},
"online": {
"parameters": {
"serving_name": "llma"
}
},
"description": "<Description>,
"name": "mi",
"space_id": <Space id for deployment>
}'
Polling for deployment status
You can poll for the deployment status by using the deployment ID that was returned when you created the deployment. When the status changes from `initializing` to `ready`, your deployment is ready to use.
The following code sample shows how to use REST API to poll for deployment status:
curl -X GET "https://<replace with your cloud hostname>/ml/v4/deployments/<replace with your deployment ID>?version=2024-01-29&project_id=<replace with your project ID>" \
-H "Authorization: Bearer <replace with your token>"
Output:
"deployed_asset_type": "curated_foundation_model"
Testing foundation models deployed on-demand
You can test a foundation model that is deployed on-demand for online inferencing. The following code snippet shows how to send an inference request to the deployment:
curl -X POST "https://<replace with your cloud hostname>/ml/v1/deployments/<replace with your deployment ID>/text/generation?version=2024-01-29" \
-H "Authorization: Bearer <replace with your token>" \
-H "content-type: application/json" \
--data '{
  "input": "Hello, what is your name",
  "parameters": {
    "max_new_tokens": 200,
    "min_new_tokens": 20
  }
}'
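To extract only the generated text, you can pipe the response through `jq`. This is a sketch that assumes the standard watsonx.ai text generation response shape, in which the output is reported in the `results` array:
curl -s -X POST "https://<replace with your cloud hostname>/ml/v1/deployments/<replace with your deployment ID>/text/generation?version=2024-01-29" \
-H "Authorization: Bearer <replace with your token>" \
-H "content-type: application/json" \
--data '{"input": "Hello, what is your name", "parameters": {"max_new_tokens": 200}}' | jq -r '.results[0].generated_text'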
Managing foundation models deployed on-demand
Access, update, scale, or delete foundation models that are deployed on-demand by using the REST API.
Accessing the deployed model
To retrieve the list of all foundation models that are deployed on-demand in a deployment space with the REST API, set the query parameter `type=curated_foundation_model`.
The following code sample shows how to use the REST API to access all foundation models that are deployed on-demand in a deployment space:
curl -X GET "https://<replace with yourcloud hostname>/ml/v4/deployments?version=2024-01-29&space_id=<replace with your space ID>&type=curated_foundation_model" \
-H "Authorization: Bearer <replace with your token>"
Updating the deployment
Update deployment metadata such as the name, description, and tags for your deployment.
The following code sample shows how to update the name for your foundation model that is deployed on-demand:
curl -X PATCH "https://<replace with your cloud hostname>//ml/v4/deployments/<replace with your deployment ID>?version=2024-01-29&project_id=<replace with your space ID>" \
-H "Authorization: Bearer <replace with your token>" \
-H "content-type: application/json" \
--data '[{
"op": "replace",
"path": "/name",
"value": "<replace with updated deployment name>"
}]'
Scaling the deployment
You can deploy only one instance of an on-demand foundation model in a deployment space. To handle increased demand, you can scale the deployment by creating additional copies (replicas).
The following code sample shows how to scale the number of replicas for your deployment:
curl -X PATCH "<replace with your cloud hostname>/ml/v4/deployments/<replace with your deployment ID>?version=2024-01-29&space_id=<replace with your space ID>" \
-H "Authorization: Bearer $token" \
-H "content-type: application/json" \
--data '[{
"op": "replace",
"path": "/hardware_request",
"value": {"num_nodes": 2}
}]'
- If you want to scale the hardware resources, use the `PATCH` operation with the `/hardware_request` parameter and update the number of hardware nodes by providing a value for the `num_nodes` parameter. You cannot use the `size` parameter with `/hardware_request`.
- You cannot use the `PATCH` operation to update the foundation model parameters (`/online/parameters/foundation_model`).
Deleting the deployment
You can delete your deployed foundation model when you no longer need it so that you stop incurring billing charges.
The following code sample shows how to delete a foundation model deployed on-demand with the REST API:
curl -X DELETE "https://<replace with your cloud hostname>/ml/v4/deployments/<replace with your deployment ID>?version=2024-01-29&space_id=<replace with your space ID>" \
-H "Authorization: Bearer <replace with your token>"
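A successful deletion typically returns an empty response. As a quick check, you can repeat the GET request and print only the HTTP status code, which should be 404 after the deployment is removed; this sketch assumes standard HTTP status reporting:
curl -s -o /dev/null -w "%{http_code}\n" -X GET "https://<replace with your cloud hostname>/ml/v4/deployments/<replace with your deployment ID>?version=2024-01-29&space_id=<replace with your space ID>" \
-H "Authorization: Bearer <replace with your token>"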
Sample notebook
The following sample notebook demonstrates how to programmatically prompt foundation models that are deployed on-demand. You must deploy your foundation model on-demand before you run the notebook.
| Notebook | Description |
|---|---|
| Inferencing with Granite Text-to-SQL Models | Set up the environment. Create a prompt for the Schema Linking model. Perform an inference on the Schema Linking model by using the watsonx.ai endpoint. Post-process the Schema Linking model output. Create a prompt for the SQL Generation model. Perform an inference on the SQL Generation model by using the watsonx.ai endpoint. |
Learn more
- Supported foundation models
- Prompt Lab
- Deploying foundation models on-demand from the Resource hub
- Hourly billing rates for deploy on-demand models
Parent topic: Deploying dedicated foundation models