Deploying and managing foundation models on-demand with REST API
Last updated: Feb 21, 2025
Deploying a foundation model on-demand makes it available on dedicated hardware for the exclusive use of your organization. IBM provides a set of curated models that you can deploy on-demand. You can deploy these models programmatically with the REST API.
Before you begin
You must set up or enable your task credentials to deploy foundation models on-demand. For more information, see Managing task credentials.
Creating model asset
You can use the /ml/v4/models endpoint to create an asset for the foundation model that you want to deploy on-demand.
The following code snippet shows how to create an asset in the watsonx.ai Runtime repository for deploying your foundation model on-demand. Use the asset ID generated by this code when you deploy your model.
curl -X POST "https://<cluster url>/ml/v4/models?version=2024-01-29" \
  -H "Authorization: Bearer <replace with your token>" \
  -H "content-type: application/json" \
  --data '{
    "type": "curated_foundation_model_1.0",
    "version": "1.0",
    "name": "granite",
    "space_id": "<Space id for deployment>",
    "foundation_model": {
      "model_id": "ibm/granite-13b-chat-v2-curated"
    }
  }'
Creating a deployment for an on-demand foundation model
You can use the /ml/v4/deployments endpoint to deploy a foundation model on-demand within your deployment space. Use the asset ID generated when you created the model asset for deployment. For more information, see Creating model asset.
Note: Batch deployments are not supported for deploying foundation models on-demand.
The following code snippet shows how to create an online deployment to deploy your foundation model on-demand:
curl -X POST "https://<cluster url>/ml/v4/deployments?version=2024-01-29" \
  -H "Authorization: Bearer <replace with your token>" \
  -H "content-type: application/json" \
  --data '{
    "asset": {
      "id": "<Asset id created>"
    },
    "online": {
      "parameters": {
        "serving_name": "llma"
      }
    },
    "description": "<Description>",
    "name": "mi",
    "space_id": "<Space id for deployment>"
  }'
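When you script this step, the asset ID and space ID usually come from shell variables rather than being typed into the JSON body. The following is a minimal sketch, not an official helper: the function name deployment_payload and its argument order are hypothetical, and it assumes that the values contain no double quotes or backslashes, because it performs no JSON escaping.

```shell
# deployment_payload ASSET_ID SPACE_ID NAME SERVING_NAME
# Hypothetical helper: prints a request body for POST /ml/v4/deployments
# with the same fields as the sample request (description omitted).
# Assumes the arguments contain no characters that need JSON escaping.
deployment_payload() {
  printf '{"asset": {"id": "%s"}, "online": {"parameters": {"serving_name": "%s"}}, "name": "%s", "space_id": "%s"}\n' \
    "$1" "$4" "$3" "$2"
}

# Example usage (ASSET_ID and SPACE_ID are assumed to be set by you):
# curl -X POST "https://<cluster url>/ml/v4/deployments?version=2024-01-29" \
#   -H "Authorization: Bearer <replace with your token>" \
#   -H "content-type: application/json" \
#   --data "$(deployment_payload "$ASSET_ID" "$SPACE_ID" "mi" "llma")"
```

Building the body in one place keeps the quoting of the IDs consistent, which is an easy detail to get wrong when editing the JSON by hand.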
Polling for deployment status
You can poll for the deployment status by using the deployment ID. When the status changes from initializing to ready, your deployment is ready to use.
The following code sample shows how to use the REST API to poll for deployment status:
curl -X GET "https://<replace with your cloud hostname>/ml/v4/deployments/<replace with your deployment ID>?version=2024-01-29&space_id=<replace with your space ID>" \
  -H "Authorization: Bearer <replace with your token>"
Output:
"deployed_asset_type": "curated_foundation_model"
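Polling is usually done in a loop that repeats the GET request until the status is ready. The following is a minimal sketch under stated assumptions, not a definitive implementation: it assumes that the response reports the status as a "state" string field, that failed is a terminal state, and that the deployment lives in a deployment space (so space_id is passed). The function names and the WML_HOST, DEPLOYMENT_ID, SPACE_ID, and TOKEN variables are hypothetical.

```shell
# extract_state: read a JSON response on stdin and print the first "state" value.
# Assumption: the status appears as "state": "<value>" in the response.
# This is a crude text match; a JSON parser such as jq is more robust.
extract_state() {
  grep -o '"state"[[:space:]]*:[[:space:]]*"[^"]*"' | head -n 1 |
    sed 's/.*"\([^"]*\)"$/\1/'
}

# fetch_deployment: GET the deployment document.
# Expects WML_HOST, DEPLOYMENT_ID, SPACE_ID, and TOKEN to be set (hypothetical names).
fetch_deployment() {
  curl -sS "https://${WML_HOST}/ml/v4/deployments/${DEPLOYMENT_ID}?version=2024-01-29&space_id=${SPACE_ID}" \
    -H "Authorization: Bearer ${TOKEN}"
}

# wait_for_ready FETCH_CMD [MAX_ATTEMPTS] [DELAY_SECONDS]
# Prints the outcome; returns 0 on ready, 1 on failed (assumed state), 2 on timeout.
wait_for_ready() {
  local fetch_cmd="$1" max_attempts="${2:-30}" delay="${3:-10}"
  local attempt=1 state
  while [ "$attempt" -le "$max_attempts" ]; do
    state="$("$fetch_cmd" | extract_state)"
    case "$state" in
      ready)  echo "ready";  return 0 ;;
      failed) echo "failed"; return 1 ;;
    esac
    sleep "$delay"
    attempt=$((attempt + 1))
  done
  echo "timeout"
  return 2
}

# Example usage: check every 15 seconds, up to 60 times.
# wait_for_ready fetch_deployment 60 15
```

Passing the fetch command as an argument keeps the loop independent of curl, so the loop can be exercised with a stub before it is pointed at a real deployment.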
Testing foundation models deployed on-demand
You can test a foundation model that is deployed on-demand by sending an online inferencing request to the text generation endpoint of the deployment, as the following code snippet shows:
curl -X POST "https://<replace with your cloud hostname>/ml/v1/deployments/<replace with your deployment ID>/text/generation?version=2024-01-29" \
  -H "Authorization: Bearer <replace with your token>" \
  -H "content-type: application/json" \
  --data '{
    "input": "Hello, what is your name",
    "parameters": {
      "max_new_tokens": 200,
      "min_new_tokens": 20
    }
  }'
Managing foundation models deployed on-demand
Access, update, scale, or delete foundation models that are deployed on-demand with the REST API.
Accessing the deployed model
To retrieve the list of all foundation models that are deployed on-demand in a deployment space, set the query parameter type=curated_foundation_model.
The following code sample shows how to use the REST API to access all foundation models that are deployed on-demand in a deployment space:
curl -X GET "https://<replace with your cloud hostname>/ml/v4/deployments?version=2024-01-29&space_id=<replace with your space ID>&type=curated_foundation_model" \
-H "Authorization: Bearer <replace with your token>"
Updating the deployment
You can update the metadata for your deployment, such as the name, description, and tags.
The following code sample shows how to update the name for your foundation model that is deployed on-demand:
curl -X PATCH "https://<replace with your cloud hostname>/ml/v4/deployments/<replace with your deployment ID>?version=2024-01-29&space_id=<replace with your space ID>" \
  -H "Authorization: Bearer <replace with your token>" \
  -H "content-type: application/json" \
  --data '[{
    "op": "replace",
    "path": "/name",
    "value": "<replace with updated deployment name>"
  }]'
Scaling the deployment
You can deploy only one instance of a foundation model on-demand in a deployment space. To handle increased demand, you can scale the deployment by creating additional copies (replicas).
The following code sample shows how to scale the number of replicas for your deployment:
curl -X PATCH "https://<replace with your cloud hostname>/ml/v4/deployments/<replace with your deployment ID>?version=2024-01-29&space_id=<replace with your space ID>" \
  -H "Authorization: Bearer <replace with your token>" \
  -H "content-type: application/json" \
  --data '[{
    "op": "replace",
    "path": "/hardware_request",
    "value": {"num_nodes": 2}
  }]'
Important:
If you want to scale the hardware resources, use the PATCH operation with the /hardware_request parameter and update the number of hardware nodes by providing a value for the num_nodes parameter.
You cannot use the size parameter with /hardware_request.
You cannot use the PATCH operation to update the foundation model parameters (/online/parameters/foundation_model).
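Because num_nodes is the only value that you change under /hardware_request, a small wrapper can check the replica count before the PATCH request is sent. The following is a minimal sketch: scale_payload is a hypothetical helper name, and the positive-integer check is an assumption about sensible input, not a documented API constraint.

```shell
# scale_payload NUM_NODES
# Hypothetical helper: prints the JSON Patch body that replaces /hardware_request
# with the requested num_nodes. Rejects empty, non-numeric, and zero values
# (an assumed sanity check, not a rule taken from the API).
scale_payload() {
  case "$1" in
    ''|*[!0-9]*|0*)
      echo "num_nodes must be a positive integer without leading zeros" >&2
      return 1 ;;
  esac
  printf '[{"op": "replace", "path": "/hardware_request", "value": {"num_nodes": %s}}]\n' "$1"
}

# Example usage:
# curl -X PATCH "https://<replace with your cloud hostname>/ml/v4/deployments/<replace with your deployment ID>?version=2024-01-29&space_id=<replace with your space ID>" \
#   -H "Authorization: Bearer <replace with your token>" \
#   -H "content-type: application/json" \
#   --data "$(scale_payload 2)"
```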
Deleting the deployment
To stop the billing charges, delete your deployed foundation model when you no longer need it.
The following code sample shows how to delete a foundation model deployed on-demand with the REST API:
curl -X DELETE "https://<replace with your cloud hostname>/ml/v4/deployments/<replace with your deployment ID>?version=2024-01-29&space_id=<replace with your space ID>" \
  -H "Authorization: Bearer <replace with your token>"
Sample notebook
The following sample notebook demonstrates prompting for foundation models that are deployed on-demand programmatically. You must deploy your foundation model on-demand before running the notebook.
The notebook covers the following steps:
Setup
Create a prompt for the Schema Linking model
Perform an inference on the Schema Linking model using the WX.AI endpoint
Post-process the Schema Linking model output
Create a prompt for the SQL Generation model
Perform an inference on the SQL Generation model using the WX.AI endpoint