A collection of open source and IBM foundation models is available for inferencing in IBM watsonx.ai. Find the foundation models that best suit the needs of your generative AI application and your budget.
The foundation models that are available for inferencing from watsonx.ai are hosted in various ways:
Foundation models provided with watsonx.ai
IBM-curated foundation models that IBM deploys on multitenant hardware and makes available for inferencing. You pay for the tokens that you use. See Foundation models provided with watsonx.ai.
Deploy on demand foundation models
An instance of an IBM-curated foundation model that you deploy and that is dedicated for your inferencing use. Only colleagues who are granted access to the deployment can inference the foundation model. A dedicated deployment means faster and more responsive interactions without rate limits. You pay for hosting the foundation model by the hour. See Deploy on demand foundation models.
Custom foundation models
Foundation models curated by you that you import and deploy in watsonx.ai. The instance of the custom foundation model that you deploy is dedicated for your use. A dedicated deployment means faster and more responsive interactions. You pay for hosting the foundation model by the hour. See Custom foundation models.
Prompt-tuned foundation models
A subset of the available foundation models that can be customized for your needs by prompt tuning the model from the API or the Tuning Studio. A prompt-tuned foundation model relies on the underlying IBM-deployed foundation model. You pay for the resources that you consume to tune the model. After the model is tuned, you pay for the tokens that you use to inference the model. See Prompt-tuned foundation models.
Various foundation models are available from watsonx.ai that you can either use immediately or that you can deploy on dedicated hardware for use by your organization.
Table 1a. Available foundation models by deployment method
A collection of open source and IBM foundation models is deployed in IBM watsonx.ai. You can prompt these foundation models in the Prompt Lab or programmatically.
The following provided foundation models are deployed by IBM for inferencing in watsonx.ai:
granite-13b-instruct-v2
granite-8b-japanese
granite-20b-multilingual Deprecated
granite-3-2b-instruct
granite-3-8b-instruct
granite-3-2-8b-instruct-preview-rc
granite-guardian-3-2b
granite-guardian-3-8b
granite-3b-code-instruct
granite-8b-code-instruct
granite-20b-code-instruct
granite-34b-code-instruct
granite-ttm-512-96-r2
granite-ttm-1024-96-r2
granite-ttm-1536-96-r2
allam-1-13b-instruct
codellama-34b-instruct Deprecated
elyza-japanese-llama-2-7b-instruct
flan-t5-xl-3b
flan-t5-xxl-11b
flan-ul2-20b
jais-13b-chat
llama-3-3-70b-instruct
llama-3-2-1b-instruct
llama-3-2-3b-instruct
llama-3-2-11b-vision-instruct
llama-3-2-90b-vision-instruct
llama-guard-3-11b-vision-instruct
llama-3-1-8b-instruct Deprecated
llama-3-1-70b-instruct Deprecated
llama-3-405b-instruct
llama-3-70b-instruct (London and Sydney regions only) Deprecated
llama-2-13b-chat Deprecated
mistral-large
mistral-small-24b-instruct-2501
mixtral-8x7b-instruct-v01
pixtral-12b
To start inferencing a provided foundation model, complete these steps:
1. From the main menu, select Resource hub.
2. Click View all in the Pay per token section.
3. Click a foundation model tile, and then click Open in Prompt Lab.
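You can also inference a provided foundation model programmatically. The following Python sketch builds a text-generation request for the watsonx.ai REST API. It is a minimal sketch, not a definitive implementation: the endpoint path, version date, and body field names reflect the public watsonx.ai API and should be verified against the current API reference, and the region host, model ID, and project ID shown are placeholders.

```python
# Sketch: build a watsonx.ai text-generation request for a provided
# foundation model. The endpoint path and body fields are assumptions
# based on the public watsonx.ai REST API; verify against the current
# API reference before use.

def build_generation_request(region_host, model_id, prompt, project_id,
                             max_new_tokens=200):
    """Return the (url, body) pair for POST /ml/v1/text/generation."""
    url = f"https://{region_host}/ml/v1/text/generation?version=2023-05-29"
    body = {
        "model_id": model_id,        # for example, an IBM Granite model
        "input": prompt,
        "project_id": project_id,    # your watsonx.ai project ID
        "parameters": {"max_new_tokens": max_new_tokens},
    }
    return url, body

url, body = build_generation_request(
    "us-south.ml.cloud.ibm.com",
    "ibm/granite-3-8b-instruct",
    "Summarize: watsonx.ai hosts foundation models for inferencing.",
    "YOUR_PROJECT_ID",
)
# Send with any HTTP client, passing an IAM bearer token:
#   requests.post(url, json=body,
#                 headers={"Authorization": f"Bearer {token}"})
```

The request is shown unsent so that authentication stays out of scope; in practice you exchange an IBM Cloud API key for an IAM bearer token before calling the endpoint.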
IBM foundation models provided with watsonx.ai
The following table lists the supported IBM foundation models that IBM provides for inferencing.
Use is measured in Resource Units (RU); each unit is equal to 1,000 tokens from the input and output of foundation model inferencing. For details on how model pricing is calculated and monitored, see Billing details for generative AI assets.
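As a worked example of the Resource Unit metering described above, the following sketch converts token counts to RUs and to cost. The price per RU is a hypothetical placeholder, not a published rate; see Billing details for generative AI assets for actual prices.

```python
# Resource Units (RU): 1 RU = 1,000 tokens of combined input and output.
# The price per RU below is a hypothetical placeholder, not a real rate.

def resource_units(input_tokens, output_tokens):
    """Tokens metered for one inference call, expressed in RUs."""
    return (input_tokens + output_tokens) / 1000

def inference_cost(input_tokens, output_tokens, price_per_ru):
    """Cost of one inference call at a given price per RU."""
    return resource_units(input_tokens, output_tokens) * price_per_ru

# A 1,500-token prompt that yields a 500-token completion consumes
# 2 RUs; at a hypothetical $0.60 per RU, that call costs $1.20.
rus = resource_units(1500, 500)         # 2.0
cost = inference_cost(1500, 500, 0.60)  # 1.2
```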
Some IBM foundation models are also available from third-party repositories, such as Hugging Face. IBM foundation models that you obtain from a third-party repository are not indemnified by IBM. Only IBM foundation models that you access from
watsonx.ai are indemnified by IBM. For more information about contractual protections related to IBM indemnification, see the IBM Client Relationship Agreement and IBM watsonx.ai service description.
Table 2. IBM foundation models provided with watsonx.ai for inferencing
Third-party foundation models provided with watsonx.ai
The following table lists the supported third-party foundation models that are provided with watsonx.ai.
Use is measured in Resource Units (RU); each unit is equal to 1,000 tokens from the input and output of foundation model inferencing. For details on how model pricing is calculated and monitored, see Billing details for generative AI assets.
Table 3. Third-party foundation models provided with watsonx.ai
In addition to working with foundation models that are curated by IBM, you can upload and deploy your own foundation models. After the custom models are deployed and registered with watsonx.ai, you can create prompts that inference the custom
models from the Prompt Lab and from the watsonx.ai API.
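For inferencing through the API, a deployed custom model (like a deploy on demand model) is addressed through its deployment rather than by model ID. The sketch below builds such a request; the deployment-scoped endpoint path and body fields are assumptions based on the watsonx.ai REST API, and the host and deployment ID are placeholders.

```python
# Sketch: inference a custom (or deploy on demand) foundation model
# through its deployment. The endpoint path and body fields are
# assumptions based on the watsonx.ai REST API; the region host and
# deployment ID below are placeholders.

def build_deployment_request(region_host, deployment_id, prompt,
                             max_new_tokens=200):
    """Return the (url, body) pair for a deployment-scoped generation call."""
    url = (f"https://{region_host}/ml/v1/deployments/{deployment_id}"
           f"/text/generation?version=2023-05-29")
    # The deployment already binds the model, so no model_id is sent.
    body = {
        "input": prompt,
        "parameters": {"max_new_tokens": max_new_tokens},
    }
    return url, body

url, body = build_deployment_request(
    "us-south.ml.cloud.ibm.com",
    "YOUR_DEPLOYMENT_ID",
    "Classify the sentiment of this review: the rollout went smoothly.",
)
```

Because the deployment is dedicated to you, the same request shape applies whether the underlying model is one you imported or one you deployed on demand.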
The following table lists the IBM foundation models that are available for you to deploy on demand.
Some IBM foundation models are also available from third-party repositories, such as Hugging Face. IBM foundation models that you obtain from a third-party repository are not indemnified by IBM. Only IBM foundation models that you access from
watsonx.ai are indemnified by IBM. For more information about contractual protections related to IBM indemnification, see the IBM Client Relationship Agreement and IBM watsonx.ai service description.
Table 4. IBM foundation models available to deploy on demand in watsonx.ai
Note:
An hourly access fee applies to hosting the mistral-large-instruct-2411 and mistral-large-instruct-2407 foundation models from Mistral AI for dedicated use. The total price for hosting these deploy on demand foundation models is the sum of the access price and the hosting price.
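The pricing note above can be illustrated with hypothetical rates; neither figure below is a real price, and actual rates are listed in the billing documentation.

```python
# Hypothetical hourly rates for a Mistral AI deploy on demand model.
# Neither number is a real price; see Billing details for generative
# AI assets for actual rates.
access_fee_per_hour = 3.00    # per-hour access fee for the model
hosting_fee_per_hour = 10.00  # per-hour fee for the dedicated hosting

# Total price is the sum of the access price and the hosting price.
total_per_hour = access_fee_per_hour + hosting_fee_per_hour  # 13.00

# A deployment kept up for a 30-day month accrues 720 billable hours.
hours_in_month = 24 * 30
monthly_cost = total_per_hour * hours_in_month  # 9360.0
```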