IBM watsonx.ai offers a curated collection of foundation models that you can deploy on-demand on dedicated hardware reserved for the exclusive use of your organization. With this approach, you can use the capabilities of these foundation models without having to provision extensive computational resources of your own. Foundation models that you deploy on-demand are hosted in a dedicated deployment space, where you can use them for inferencing.
Supported foundation models
Foundation models that are available for you to deploy on-demand are hosted by IBM and billed at an hourly rate. These models are single-tenant: your deployment is exclusive to you and is not shared with other users for inferencing. You are charged the hourly rate for as long as the deployment is active. For more information, see Hourly billing rates for deploy on demand models.
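Because the charge accrues for as long as the deployment is active, the cost of a deployment depends on how long it stays deployed, not on how many inference requests it serves. The following Python sketch shows that arithmetic. The function name and the example rate are hypothetical, and the sketch assumes the charge is prorated by active time; see the hourly billing rates page for actual rates and for how partial hours are billed. It requires only the Python standard library.

```python
from datetime import datetime, timedelta


def estimate_on_demand_cost(hourly_rate: float, start: datetime, end: datetime) -> float:
    """Estimate the charge for a deployment that was active from start to end.

    Hypothetical helper. Assumption: the charge is prorated linearly by
    active time; actual billing granularity may differ.
    """
    if end < start:
        raise ValueError("end must not be earlier than start")
    # Convert the active period to (possibly fractional) hours.
    active_hours = (end - start) / timedelta(hours=1)
    # The charge depends only on active time, not on request volume.
    return hourly_rate * active_hours


# Example: a hypothetical rate of 5.00 per hour, deployment active for two days.
start = datetime(2024, 6, 1, 9, 0)
end = datetime(2024, 6, 3, 9, 0)
print(f"{estimate_on_demand_cost(5.00, start, end):.2f}")  # 240.00
```

A deployment that sits idle for those two days costs the same as one that serves traffic continuously, so delete deployments that you no longer need.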
For a list of the models that are available to deploy on demand, along with descriptions and billing rates, see Supported foundation models.
When a foundation model that is deployed on-demand is deprecated, you can continue to use the model until you delete your deployment.
Supported deployment types
You can only create online deployments for foundation models that are deployed on-demand. Batch deployments are not supported.
Considerations for deploying foundation models on-demand
You can deploy only one instance of each foundation model that is available for on-demand deployment in a deployment space. If your model needs more resources, scale the deployment by adding more copies of the deployed model asset.
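The following Python sketch models the deployment rules described above as plain bookkeeping: only online deployments are accepted, each model can have at most one instance per deployment space, and more capacity comes from raising the number of copies. It reads the rule as one instance per model per space. `DeploymentSpace`, `deploy`, `scale`, and the model ID `model-a` are hypothetical names for illustration; they are not part of the watsonx.ai API or SDK. The sketch needs Python 3.9 or later.

```python
from dataclasses import dataclass, field


@dataclass
class OnDemandDeployment:
    model_id: str
    copies: int = 1  # number of copies of the deployed model asset


@dataclass
class DeploymentSpace:
    """Hypothetical bookkeeping model of one deployment space (not a watsonx.ai API)."""

    name: str
    deployments: dict[str, OnDemandDeployment] = field(default_factory=dict)

    def deploy(self, model_id: str, deployment_type: str = "online") -> OnDemandDeployment:
        # Only online deployments are supported; batch deployments are rejected.
        if deployment_type != "online":
            raise ValueError(f"{deployment_type!r} deployments are not supported")
        # Only one instance of a given model is allowed in the same space.
        if model_id in self.deployments:
            raise ValueError(f"{model_id} is already deployed in space {self.name}")
        deployment = OnDemandDeployment(model_id)
        self.deployments[model_id] = deployment
        return deployment

    def scale(self, model_id: str, copies: int) -> OnDemandDeployment:
        # More resources come from more copies of the existing deployment,
        # not from a second deployment of the same model.
        if copies < 1:
            raise ValueError("copies must be at least 1")
        deployment = self.deployments[model_id]
        deployment.copies = copies
        return deployment


space = DeploymentSpace("team-space")
space.deploy("model-a")            # hypothetical model ID
space.scale("model-a", copies=3)   # scale by adding copies
print(space.deployments["model-a"].copies)  # 3
```

Calling `space.deploy("model-a")` a second time raises `ValueError`, which mirrors the one-instance rule: to handle more load, you scale the existing deployment instead of creating another one.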
Limitations and restrictions
Because demand for foundation models that are deployed on-demand is high and the resources to accommodate it are limited, watsonx.ai limits each IBM Cloud account to four small models, two medium models, or one large model deployed on-demand.
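The following Python sketch checks a planned set of on-demand deployments for one account against this limit. It applies only the three stated caps literally. This section does not say whether sizes can be mixed (for example, one medium model plus two small models), so the sketch reports mixed plans as "unspecified" instead of guessing. The function name, the result labels, and the size labels are hypothetical, and the size class of each model is taken as a given input. The sketch needs Python 3.9 or later.

```python
from collections import Counter

# Per-account caps stated in this section, keyed by model size class.
ACCOUNT_LIMITS = {"small": 4, "medium": 2, "large": 1}


def check_account_limit(sizes: list[str]) -> str:
    """Classify planned on-demand deployments for one IBM Cloud account.

    Hypothetical helper. `sizes` holds the size class of each planned
    deployment. Only single-size plans are covered by the stated caps,
    so mixed plans return "unspecified".
    """
    counts = Counter(sizes)
    unknown = set(counts) - set(ACCOUNT_LIMITS)
    if unknown:
        raise ValueError(f"unknown size class(es): {sorted(unknown)}")
    if not counts:
        return "within limit"
    if len(counts) > 1:
        # The text does not say how mixed sizes count against the limit.
        return "unspecified"
    size, count = next(iter(counts.items()))
    return "within limit" if count <= ACCOUNT_LIMITS[size] else "exceeds limit"


print(check_account_limit(["small"] * 4))       # within limit
print(check_account_limit(["medium"] * 3))      # exceeds limit
print(check_account_limit(["small", "large"]))  # unspecified
```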
The following restrictions apply to foundation models that are deployed on-demand:
- You cannot tune a foundation model that is deployed on-demand.
- You can prompt a foundation model that is deployed on-demand and save your prompt as a prompt template. However, you cannot deploy a saved prompt template for a foundation model that is deployed on-demand. If you use the model for retrieval-augmented generation (RAG), you can deploy your RAG solution as an AI service instead.
- You cannot use watsonx.governance to evaluate or track a prompt template for a foundation model that is deployed on-demand.
Next steps
Choose a method for deploying a foundation model on demand:
- To deploy foundation models on demand from the Resource hub with a few simple steps, see Deploying foundation models on-demand from Resource hub.
- To deploy foundation models programmatically, see Deploying foundation models on-demand with REST API.