When you create an online deployment for a model or function from a deployment space or programmatically, a single copy of the asset is deployed by default. To increase scalability and availability, you can increase the number of copies (replicas) by editing the configuration of the deployment. More copies let the deployment handle a larger volume of scoring requests.
Deployments can be scaled in the following ways:
By updating the configuration of a deployment in a deployment space.
Programmatically, by using the watsonx.ai Runtime Python client library or the watsonx.ai Runtime REST APIs.
Before you begin
You must set up your task credentials by generating an API key. For more information, see Managing task credentials.
Changing the number of copies of an online deployment from a space
Click the Deployment tab of your deployment space.
From the action menu for your deployment name, click Edit.
In the Edit deployment dialog box, change the number of copies and click Save.
Increasing the number of replicas of a deployment programmatically
To scale a deployment programmatically, increase the number of replicas in the deployment's metadata.
Python example
This example uses the Python client to set the number of replicas to 3.
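A minimal sketch of such an update follows. It assumes the `ibm_watsonx_ai` Python client and assumes that the number of copies is carried by the `num_nodes` field of the deployment's `HARDWARE_SPEC` metadata. The change is applied with `client.deployments.update()`. The hardware specification name `"S"` and the helper names `build_scaling_meta` and `scale_deployment` are hypothetical placeholders. Substitute the hardware specification that your deployment already uses.

```python
from typing import Any, Dict


def build_scaling_meta(client: Any, num_copies: int, hw_spec_name: str = "S") -> Dict[str, Any]:
    """Build the metadata change that sets the number of deployment copies.

    Assumptions: the copy count lives in the `num_nodes` field of the
    hardware specification, and the specification is identified by name.
    "S" is a placeholder; use the name your deployment already has.
    """
    if num_copies < 1:
        raise ValueError("num_copies must be at least 1")
    return {
        client.deployments.ConfigurationMetaNames.HARDWARE_SPEC: {
            "name": hw_spec_name,
            "num_nodes": num_copies,
        }
    }


def scale_deployment(client: Any, deployment_id: str, num_copies: int, hw_spec_name: str = "S") -> Any:
    """Send the metadata change for one deployment and return the client's response."""
    change_meta = build_scaling_meta(client, num_copies, hw_spec_name)
    return client.deployments.update(deployment_id, change_meta)


# Example usage (assumed setup; needs an API key, a space ID, and network access):
# from ibm_watsonx_ai import APIClient, Credentials
# client = APIClient(Credentials(url="<service URL>", api_key="<API key>"), space_id="<space ID>")
# scale_deployment(client, "<deployment ID>", num_copies=3)
```

The helpers take `client` as a parameter instead of creating it. This keeps credential handling, as described in Managing task credentials, separate from the scaling logic.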