After a custom foundation model asset is created, you can create a deployment for the model to make it available for inferencing.
Prerequisites
- You must set up your task credentials by generating an API key. For more information, see Managing task credentials.
- Before you deploy your model, review the Available hardware specifications and pick a predefined hardware specification that matches your model.
- Additionally, review whether the available software specifications match your model architecture. For details, see Supported model architectures.
Creating a deployment from the watsonx.ai user interface
Follow these steps to create a deployment for a custom foundation model:
1. In your deployment space or your project, go to the Assets tab.
2. Find your model in the asset list, click the Menu icon, and select Deploy.
3. Enter a name for your deployment and optionally enter a serving name, description, and tags.
   Note:
   - Use the Serving name field to specify a name for your deployment instead of the deployment ID.
   - The serving name must be unique within the namespace.
   - The serving name must contain only lowercase letters, digits, and underscores ([a-z0-9_]) and can be at most 36 characters long.
   - In workflows where your custom foundation model is used periodically, consider assigning your model the same serving name each time you deploy it. That way, after you delete and then re-deploy the model, you can keep using the same endpoint in your code.
4. Select a configuration and a software specification for your model.
5. Optional: To override some of the base model parameters, click Model deployment parameters and enter new parameter values:
   - Data type: Choose float16 or bfloat16 as the data type for your model.
   - Max batch size: Enter the maximum batch size for your model.
   - Max concurrent requests: Enter the maximum number of concurrent requests that can be made to your model.
   - Max new tokens: Enter the maximum number of tokens that can be generated for an inference request.
   - Max sequence length: Enter the maximum sequence length for your model.
6. Click Create.
Note: If you use the watsonx-cfm-caikit-1.1 software specification to deploy your model, the value of the max_concurrent_requests parameter is not used.
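The serving-name constraints described in the steps above can be checked before you create the deployment. A minimal sketch; the `is_valid_serving_name` helper is an illustration, not part of watsonx.ai:

```python
import re

# Serving names must be unique within the namespace, use only the
# characters [a-z0-9_], and be at most 36 characters long.
_SERVING_NAME_RE = re.compile(r"[a-z0-9_]{1,36}")

def is_valid_serving_name(name: str) -> bool:
    """Return True if the name satisfies the documented character and length rules."""
    return _SERVING_NAME_RE.fullmatch(name) is not None

print(is_valid_serving_name("test_custom_fm"))  # True
print(is_valid_serving_name("Test-CFM"))        # False: uppercase and '-' not allowed
```

Uniqueness within the namespace cannot be checked locally; the service rejects a duplicate serving name when you create the deployment.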
Testing the deployment
Follow these steps to test your custom foundation model deployment:
1. In your deployment space or your project, open the Deployments tab and click the deployment name.
2. Click the Test tab to enter prompt text and get a response from the deployed asset.
3. Enter test data in one of the following formats, depending on the type of asset that you deployed:
   - Text: Enter text input data to generate a block of text as output.
   - Stream: Enter text input data to generate a stream of text as output.
   - JSON: Enter JSON input data to generate output in JSON format.
4. Click Generate to get results that are based on your prompt.
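The same text test can also be run against the deployment's inference endpoint from code. A hedged sketch using only the Python standard library; the endpoint path and payload fields follow the watsonx.ai text generation API, while the host name, version date, and token are placeholders you must supply:

```python
import json
import urllib.request

def build_generation_payload(prompt: str, max_new_tokens: int = 200) -> dict:
    """Build the request body for a deployment text-generation call."""
    return {
        "input": prompt,
        "parameters": {"max_new_tokens": max_new_tokens},
    }

def generate(host: str, deployment_id: str, token: str, prompt: str) -> str:
    # POST to the deployment's text-generation endpoint; the path and
    # version date are assumptions based on the watsonx.ai API.
    url = (f"https://{host}/ml/v1/deployments/{deployment_id}"
           "/text/generation?version=2024-01-29")
    req = urllib.request.Request(
        url,
        data=json.dumps(build_generation_payload(prompt)).encode("utf-8"),
        headers={"Authorization": f"Bearer {token}",
                 "Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=60) as resp:
        body = json.load(resp)
    return body["results"][0]["generated_text"]
```

Streaming output uses a separate endpoint and is not shown here.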
Retrieving the endpoint for custom foundation model deployments
Follow these steps to retrieve the endpoint URL for your custom foundation model deployment. You need this URL to access the deployment from your applications:
- In your deployment space or your project, open the Deployments tab and click the deployment name.
- In the API Reference tab, find the private and public endpoint links and code snippets that you can use to include the endpoint details in an application.
If you added a serving name when you created your online deployment, you see two endpoint URLs. The first URL contains the deployment ID, and the second URL contains your serving name. You can use either URL with your deployment.
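For illustration, the two endpoint URLs differ only in the path segment that identifies the deployment. A sketch, assuming the text-generation endpoint shape shown in the API Reference tab; the host and IDs are placeholders:

```python
def deployment_endpoints(host: str, deployment_id: str, serving_name: str):
    """Return the deployment-ID based and serving-name based endpoint URLs."""
    base = f"https://{host}/ml/v1/deployments"
    by_id = f"{base}/{deployment_id}/text/generation"
    by_name = f"{base}/{serving_name}/text/generation"
    return by_id, by_name

by_id, by_name = deployment_endpoints(
    "us-south.ml.cloud.ibm.com", "<your deployment ID>", "test_custom_fm")
```

Because the serving name survives a delete-and-redeploy cycle, code that uses the serving-name URL does not need to change when the deployment ID changes.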
Creating a deployment programmatically
To use the watsonx.ai API, you need a bearer token. For more information, see Credentials for programmatic access.
Note:
- You can override the default values of your custom foundation model parameters in the online.parameters.foundation_model field.
- If you use the watsonx-cfm-caikit-1.1 software specification to deploy your model, the max_concurrent_requests parameter is not used.
- Use the serving_name parameter to specify a name for your deployment instead of the deployment ID.
- The serving name must be unique within the namespace.
- The serving name must contain only lowercase letters, digits, and underscores ([a-z0-9_]) and can be at most 36 characters long.
- In workflows where your custom foundation model is used periodically, consider assigning your model the same serving name each time you deploy it. That way, after you delete and then re-deploy the model, you can keep using the same endpoint in your code.
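The deployment request body used in the curl example that follows can also be assembled in code before you send it. A minimal sketch; the IDs are placeholders and the hardware size shown is only one of the allowed values:

```python
import json

def build_deployment_request(asset_id: str, space_id: str,
                             serving_name: str = "test_custom_fm") -> dict:
    """Build the body for POST /ml/v4/deployments for a custom foundation model."""
    return {
        "asset": {"id": asset_id},
        "online": {
            "parameters": {
                "serving_name": serving_name,
                # Override base model parameters in the foundation_model field.
                "foundation_model": {"max_sequence_length": 4096},
            }
        },
        # size can be gpu_s, gpu_m, or gpu_l.
        "hardware_request": {"size": "gpu_s", "num_nodes": 1},
        "name": "custom_fm_deployment",
        "description": "Testing deployment using custom foundation model",
        # For project deployments, use project_id instead of space_id.
        "space_id": space_id,
    }

body = json.dumps(build_deployment_request("<your asset id>", "<your space id>"))
```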
To deploy a custom foundation model programmatically:
1. Initiate model deployment. See this code for an example deployment to a space:

   ```bash
   curl -X POST "https://<your cloud hostname>/ml/v4/deployments?version=2024-01-29" \
     -H "Authorization: Bearer $TOKEN" \
     -H "content-type: application/json" \
     --data '{
       "asset": {
         "id": "<your custom foundation model asset id>"
       },
       "online": {
         "parameters": {
           "serving_name": "test_custom_fm",
           "foundation_model": {
             "max_sequence_length": 4096
           }
         }
       },
       "hardware_request": {
         "size": "<configuration size>",
         "num_nodes": 1
       },
       "description": "Testing deployment using custom foundation model",
       "name": "custom_fm_deployment",
       "space_id": "<your space id>"
     }'
   ```

   The size parameter can be gpu_s, gpu_m, or gpu_l. For project deployments, use project_id instead of space_id. The deployment ID is returned in the API response, in the metadata.id field.

2. Use the deployment ID to poll for the deployment status. See this code for an example of how to poll for the status of a model that is deployed to a project:

   ```bash
   curl -X GET "https://<your cloud hostname>/ml/v4/deployments/<your deployment ID>?version=2024-01-29&project_id=<your project ID>" \
     -H "Authorization: Bearer $TOKEN"
   ```

   The deployed_asset_type is returned as custom_foundation_model. Wait until the status changes from initializing to ready.
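The polling step can be automated. A sketch with an injectable fetch function so the loop logic stands alone; reading the state from entity.status.state follows the deployments API response, and the timing values are arbitrary choices:

```python
import time
from typing import Callable

def wait_until_ready(fetch: Callable[[], dict],
                     poll_seconds: float = 10.0,
                     max_attempts: int = 60) -> dict:
    """Poll fetch() until the deployment state is 'ready'; fail on 'failed' or timeout."""
    for _ in range(max_attempts):
        deployment = fetch()  # e.g. a GET /ml/v4/deployments/<id> call
        state = deployment["entity"]["status"]["state"]
        if state == "ready":
            return deployment
        if state == "failed":
            raise RuntimeError("deployment failed")
        time.sleep(poll_seconds)
    raise TimeoutError("deployment did not become ready in time")
```

In practice, `fetch` wraps the GET request shown in step 2 and returns the parsed JSON response.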
Next steps
Prompting a custom foundation model
Parent topic: Deploying custom foundation models