Deploy foundation models on-demand programmatically by using the REST API. Deploying a foundation model on-demand makes it available on dedicated hardware for the exclusive use of your organization. IBM provides a set of curated models that you can deploy on-demand.
Before you begin
- You must set up or enable your task credentials to deploy foundation models on-demand. For more information, see Managing task credentials.
- Review requirements and considerations for deploying a foundation model on-demand.
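All of the code snippets in this topic pass a bearer token in the `Authorization` header. As a minimal sketch, assuming that you authenticate to IBM Cloud IAM with an API key, you can generate a token as follows:
curl -X POST "https://iam.cloud.ibm.com/identity/token" \
-H "content-type: application/x-www-form-urlencoded" \
--data-urlencode "grant_type=urn:ibm:params:oauth:grant-type:apikey" \
--data-urlencode "apikey=<replace with your API key>"
The `access_token` value in the JSON response is the token to substitute for `<replace with your token>` in the examples that follow.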
Creating a model asset
You can use the `/ml/v4/models` endpoint to create an asset for the foundation model that you want to deploy on-demand.
The following code snippet shows how to create an asset in the watsonx.ai Runtime repository for deploying your foundation model on-demand. Use the asset ID generated by this code when you deploy your model.
curl -X POST "https://<cluster url>/ml/v4/models?version=2024-01-29" \
-H "Authorization: Bearer <replace with your token>" \
-H "content-type: application/json" \
--data '{
"type": "curated_foundation_model_1.0",
"version": "1.0",
"name": "granite",
"space_id": "<Space id for deployment>",
"foundation_model": {
"model_id": "ibm/granite-13b-chat-v2-curated"
}
}'
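The create request returns the new model asset in the response body. As a sketch, assuming that the asset ID is reported in the `metadata.id` field of the response (the usual watsonx.ai Runtime response shape) and that `jq` is installed, you can capture the ID for the deployment step. Here, `model_payload.json` is a hypothetical file that contains the JSON payload shown above:
ASSET_ID=$(curl -s -X POST "https://<replace with your cloud hostname>/ml/v4/models?version=2024-01-29" \
-H "Authorization: Bearer <replace with your token>" \
-H "content-type: application/json" \
--data @model_payload.json | jq -r '.metadata.id')
echo "${ASSET_ID}"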
Creating a deployment for an on-demand foundation model
You can use the `/ml/v4/deployments` endpoint to deploy a foundation model on-demand in a deployment space. You must use the asset ID that was generated when you created the model asset. For more information, see Creating a model asset.
The following code snippet shows how to create an online deployment to deploy your foundation model on-demand:
curl -X POST "https://<cluster url>/ml/v4/deployments?version=2024-01-29" \
-H "Authorization: Bearer <replace with your token>" \
-H "content-type: application/json" \
--data '{
"asset": {
"id": <Asset id created>
},
"online": {
"parameters": {
"serving_name": "llma"
}
},
"description": "<Description>,
"name": "mi",
"space_id": <Space id for deployment>
}'
Polling for deployment status
You can poll for the deployment status by using the deployment ID that was returned when you created the deployment. When the status changes from `initializing` to `ready`, your deployment is ready to use.
The following code sample shows how to use REST API to poll for deployment status:
curl -X GET "https://<replace with your cloud hostname>/ml/v4/deployments/<replace with your deployment ID>?version=2024-01-29&project_id=<replace with your project ID>" \
-H "Authorization: Bearer <replace with your token>"
Output:
"deployed_asset_type": "curated_foundation_model"
Testing foundation models deployed on-demand
You can test a foundation model that is deployed on-demand for online inferencing. The following code snippet shows how to send an inference request to the deployment:
curl -X POST "https://<replace with your cloud hostname>/ml/v1/deployments/<replace with your deployment ID>/text/generation?version=2024-01-29" \
-H "Authorization: Bearer <replace with your token>" \
-H "content-type: application/json" \
--data '{
  "input": "Hello, what is your name",
  "parameters": {
    "max_new_tokens": 200,
    "min_new_tokens": 20
  }
}'
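To extract only the generated text, you can pipe the response through `jq`. This is a sketch that assumes the standard watsonx.ai text generation response shape, in which the output is reported in the `results` array:
curl -s -X POST "https://<replace with your cloud hostname>/ml/v1/deployments/<replace with your deployment ID>/text/generation?version=2024-01-29" \
-H "Authorization: Bearer <replace with your token>" \
-H "content-type: application/json" \
--data '{"input": "Hello, what is your name", "parameters": {"max_new_tokens": 200}}' | jq -r '.results[0].generated_text'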
Managing foundation models deployed on-demand
Access, update, scale, or delete foundation models that are deployed on-demand by using the REST API.
Accessing the deployed model
To retrieve the list of all foundation models that are deployed on-demand in a deployment space with the REST API, set the query parameter `type=curated_foundation_model`.
The following code sample shows how to use the REST API to access all foundation models that are deployed on-demand in a deployment space:
curl -X GET "https://<replace with yourcloud hostname>/ml/v4/deployments?version=2024-01-29&space_id=<replace with your space ID>&type=curated_foundation_model" \
-H "Authorization: Bearer <replace with your token>"
Updating the deployment
Update deployment metadata such as the name, description, and tags for your deployment.
The following code sample shows how to update the name for your foundation model that is deployed on-demand:
curl -X PATCH "https://<replace with your cloud hostname>//ml/v4/deployments/<replace with your deployment ID>?version=2024-01-29&project_id=<replace with your space ID>" \
-H "Authorization: Bearer <replace with your token>" \
-H "content-type: application/json" \
--data '[{
"op": "replace",
"path": "/name",
"value": "<replace with updated deployment name>"
}]'
Scaling the deployment
You can deploy only one instance of an on-demand foundation model in a deployment space. To handle increased demand, you can scale the deployment by creating additional copies (replicas).
The following code sample shows how to scale the number of replicas for your deployment:
curl -X PATCH "<replace with your cloud hostname>/ml/v4/deployments/<replace with your deployment ID>?version=2024-01-29&space_id=<replace with your space ID>" \
-H "Authorization: Bearer $token" \
-H "content-type: application/json" \
--data '[{
"op": "replace",
"path": "/hardware_request",
"value": {"num_nodes": 2}
}]'
- If you want to scale the hardware resources, use the `PATCH` operation with the `/hardware_request` parameter and update the number of hardware nodes by providing a value for the `num_nodes` parameter. You cannot use the `size` parameter with `/hardware_request`.
- You cannot use the `PATCH` operation to update the foundation model parameters (`/online/parameters/foundation_model`).
Deleting the deployment
You can delete your deployed foundation model when you no longer need it so that you stop incurring billing charges.
The following code sample shows how to delete a foundation model deployed on-demand with the REST API:
curl -X DELETE "https://<replace with your cloud hostname>/ml/v4/deployments/<replace with your deployment ID>?version=2024-01-29&space_id=<replace with your space ID>" \
-H "Authorization: Bearer <replace with your token>"
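A successful deletion typically returns an empty response. As a quick check, you can repeat the GET request and print only the HTTP status code, which should be 404 after the deployment is removed; this sketch assumes standard HTTP status reporting:
curl -s -o /dev/null -w "%{http_code}\n" -X GET "https://<replace with your cloud hostname>/ml/v4/deployments/<replace with your deployment ID>?version=2024-01-29&space_id=<replace with your space ID>" \
-H "Authorization: Bearer <replace with your token>"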
Sample notebook
The following sample notebook demonstrates how to programmatically prompt foundation models that are deployed on-demand. You must deploy your foundation model on-demand before you run the notebook.
| Notebook | Description |
|---|---|
| Inferencing with Granite Text-to-SQL Models | Set up the environment. Create a prompt for the Schema Linking model. Perform an inference on the Schema Linking model by using the watsonx.ai endpoint. Post-process the Schema Linking model output. Create a prompt for the SQL Generation model. Perform an inference on the SQL Generation model by using the watsonx.ai endpoint. |
Learn more
- Supported foundation models
- Prompt Lab
- Deploying foundation models on-demand from the Resource hub
- Hourly billing rates for deploy on-demand models
Parent topic: Deploying dedicated foundation models