A collection of open source and IBM foundation models is available in IBM watsonx.ai. You can inference the foundation models in the Prompt Lab or programmatically; a code sketch is provided after the model list below.
The following models are available in watsonx.ai:
- granite-13b-chat-v2
- granite-13b-instruct-v2
- granite-7b-lab
- granite-8b-japanese
- granite-20b-multilingual
- granite-3-2b-instruct
- granite-3-8b-instruct
- granite-3-8b-base
- granite-guardian-3-2b
- granite-guardian-3-8b
- granite-3b-code-instruct
- granite-8b-code-instruct
- granite-20b-code-instruct
- granite-20b-code-base-schema-linking
- granite-20b-code-base-sql-gen
- granite-34b-code-instruct
- allam-1-13b-instruct
- codellama-34b-instruct
- elyza-japanese-llama-2-7b-instruct
- flan-t5-xl-3b
- flan-t5-xxl-11b
- flan-ul2-20b
- jais-13b-chat
- llama-3-3-70b-instruct
- llama-3-2-1b-instruct
- llama-3-2-3b-instruct
- llama-3-2-11b-vision-instruct
- llama-3-2-90b-vision-instruct
- llama-guard-3-11b-vision
- llama-3-1-8b
- llama-3-1-8b-instruct
- llama-3-1-70b-instruct
- llama-3-405b-instruct
- llama-3-8b-instruct
- llama-3-70b-instruct
- llama-2-13b-chat
- llama-2-70b-chat
- mistral-large
- mistral-nemo-instruct-2407
- mixtral-8x7b-base
- mixtral-8x7b-instruct-v01
- mt0-xxl-13b
- pixtral-12b
To learn more about the various ways that these models can be deployed, and to see a summary of pricing and context window length information for the models, see Supported foundation models.
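The following minimal sketch shows one way to inference a foundation model from this list programmatically by using the ibm-watsonx-ai Python SDK. The endpoint URL, API key, project ID, model ID, and generation parameters are placeholders; verify the parameter names against the current SDK and API documentation.

```python
# Minimal sketch: inference a watsonx.ai foundation model programmatically.
# Assumptions: the ibm-watsonx-ai Python SDK is installed (pip install ibm-watsonx-ai),
# and the URL, API key, project ID, and model ID below are placeholders.
from ibm_watsonx_ai import Credentials
from ibm_watsonx_ai.foundation_models import ModelInference

credentials = Credentials(
    url="https://us-south.ml.cloud.ibm.com",  # regional watsonx.ai endpoint
    api_key="YOUR_IBM_CLOUD_API_KEY",
)

model = ModelInference(
    model_id="ibm/granite-3-8b-instruct",     # any supported model ID from the list above
    credentials=credentials,
    project_id="YOUR_PROJECT_ID",
    params={"decoding_method": "greedy", "max_new_tokens": 200},
)

# Send a prompt and print the generated text.
prompt = "Summarize the benefits of retrieval-augmented generation in two sentences."
print(model.generate_text(prompt=prompt))
```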
How to choose a model
To review factors that can help you to choose a model, such as supported tasks and languages, see Choosing a model and Foundation model benchmarks.
Foundation model details
The available foundation models support a range of use cases for both natural languages and programming languages. To see the types of tasks that these models can do, review and try the sample prompts.
allam-1-13b-instruct
The allam-1-13b-instruct foundation model is a bilingual large language model for Arabic and English provided by the National Center for Artificial Intelligence and supported by the Saudi Authority for Data and Artificial Intelligence that is fine-tuned to support conversational tasks. The ALLaM series is a collection of powerful language models designed to advance Arabic language technology. These models are initialized with Llama-2 weights and undergo training on both Arabic and English languages.
- Usage
- Supports Q&A, summarization, classification, generation, extraction, and translation in Arabic.
- Size
- 13 billion parameters
- API pricing tier
- Class 2. For pricing details, see Table 3.
- Availability
- Provided by IBM deployed on multitenant hardware.
- Try it out
- Experiment with samples:
- Token limits
- Context window length (input + output): 4,096
- Supported natural languages
- Arabic (Modern Standard Arabic) and English
- Instruction tuning information
- allam-1-13b-instruct is based on the Allam-13b-base model, which is a foundation model that is pre-trained on a total of 3 trillion tokens in English and Arabic, including the tokens seen from its initialization. The Arabic dataset contains 500 billion tokens after cleaning and deduplication. The additional data is collected from open source collections and web crawls. The allam-1-13b-instruct foundation model is fine-tuned with a curated set of 4 million Arabic and 6 million English prompt-and-response pairs.
- Model architecture
- Decoder-only
- License
- Llama 2 community license and ALLaM license
- Learn more
- Read the following resource:
codellama-34b-instruct
A programmatic code generation model that is based on Llama 2 from Meta. Code Llama is fine-tuned for generating and discussing code.
- Usage
- Use Code Llama to create prompts that generate code based on natural language inputs, explain code, or that complete and debug code.
- Size
- 34 billion parameters
- API pricing tier
- Class 2. For pricing details, see Table 3.
- Availability
- Provided by IBM deployed on multitenant hardware.
- Try it out
- Experiment with samples:
- Token limits
- Context window length (input + output): 16,384
- Note: The maximum new tokens, which means the tokens that are generated by the foundation model per request, is limited to 8,192.
- Supported natural languages
- English
- Supported programming languages
- The codellama-34b-instruct-hf foundation model supports many programming languages, including Python, C++, Java, PHP, TypeScript (JavaScript), C#, Bash, and more.
- Instruction tuning information
- The instruction fine-tuned version was fed natural language instruction input and the expected output to guide the model to generate helpful and safe answers in natural language.
- Model architecture
- Decoder
- License
- License
- Learn more
- Read the following resources:
elyza-japanese-llama-2-7b-instruct
The elyza-japanese-llama-2-7b-instruct model is provided by ELYZA, Inc on Hugging Face. The elyza-japanese-llama-2-7b-instruct foundation model is a version of the Llama 2 model from Meta that is trained to understand and generate Japanese text. The model is fine-tuned for solving various tasks that follow user instructions and for participating in a dialog.
- Usage
- General use with zero- or few-shot prompts. Works well for classification and extraction in Japanese and for translation between English and Japanese. Performs best when prompted in Japanese.
- Size
- 7 billion parameters
- API pricing tier
- Class 2. For pricing details, see Table 3.
- Availability
- Provided by IBM deployed on multitenant hardware in the Tokyo data center.
- Try it out
- Experiment with samples:
- Token limits
- Context window length (input + output): 4,096
- Supported natural languages
- Japanese, English
- Instruction tuning information
- For Japanese language training, Japanese text from many sources was used, including Wikipedia and the Open Super-large Crawled ALMAnaCH coRpus (a multilingual corpus that is generated by classifying and filtering language in the Common Crawl corpus). The model was fine-tuned on a dataset that was created by ELYZA. The ELYZA Tasks 100 dataset contains 100 diverse and complex tasks that were created manually and evaluated by humans. The ELYZA Tasks 100 dataset is publicly available from Hugging Face.
- Model architecture
- Decoder
- License
- License
- Learn more
- Read the following resources:
flan-t5-xl-3b
The flan-t5-xl-3b model is provided by Google on Hugging Face. This model is based on the pretrained text-to-text transfer transformer (T5) model and uses instruction fine-tuning methods to achieve better zero- and few-shot performance. The model is also fine-tuned with chain-of-thought data to improve its ability to perform reasoning tasks.
- Usage
- General use with zero- or few-shot prompts.
- Size
- 3 billion parameters
- API pricing tier
- Class 1. For pricing details, see Table 3 and Table 5.
- Availability
-
- Provided by IBM deployed on multitenant hardware.
- Deploy on demand for dedicated use.
- Try it out
- Sample prompts
- Token limits
- Context window length (input + output): 4,096
- Supported natural languages
- Multilingual
- Instruction tuning information
- The model was fine-tuned on tasks that involve multiple-step reasoning from chain-of-thought data in addition to traditional natural language processing tasks. Details about the training datasets used are published.
- Model architecture
- Encoder-decoder
- License
- Apache 2.0 license
- Learn more
- Read the following resources:
flan-t5-xxl-11b
The flan-t5-xxl-11b model is provided by Google on Hugging Face. This model is based on the pretrained text-to-text transfer transformer (T5) model and uses instruction fine-tuning methods to achieve better zero- and few-shot performance. The model is also fine-tuned with chain-of-thought data to improve its ability to perform reasoning tasks.
- Usage
- General use with zero- or few-shot prompts.
- Size
- 11 billion parameters
- API pricing tier
- Class 2. For pricing details, see Table 3 and Table 5.
- Availability
-
- Provided by IBM deployed on multitenant hardware.
- Deploy on demand for dedicated use.
- Try it out
- Experiment with samples:
- Token limits
- Context window length (input + output): 4,096
- Supported natural languages
- English, German, French
- Instruction tuning information
- The model was fine-tuned on tasks that involve multiple-step reasoning from chain-of-thought data in addition to traditional natural language processing tasks. Details about the training datasets used are published.
- Model architecture
- Encoder-decoder
- License
- Apache 2.0 license
- Learn more
- Read the following resources:
flan-ul2-20b
The flan-ul2-20b model is provided by Google on Hugging Face. This model was trained by using the Unifying Language Learning Paradigms (UL2). The model is optimized for language generation, language understanding, text classification, question answering, common sense reasoning, long text reasoning, structured-knowledge grounding, and information retrieval, in-context learning, zero-shot prompting, and one-shot prompting.
- Usage
- General use with zero- or few-shot prompts.
- Size
- 20 billion parameters
- API pricing tier
- Class 3. For pricing details, see Table 3 and Table 5.
- Availability
-
- Provided by IBM deployed on multitenant hardware.
- Deploy on demand for dedicated use.
- Try it out
- Experiment with samples:
- Sample prompts
- Sample prompt: Earnings call summary
- Sample prompt: Meeting transcript summary
- Sample prompt: Scenario classification
- Sample prompt: Sentiment classification
- Sample prompt: Thank you note generation
- Sample prompt: Named entity extraction
- Sample prompt: Fact extraction
- Sample notebook: Use watsonx to summarize cybersecurity documents
- Sample notebook: Use watsonx and LangChain to answer questions by using retrieval-augmented generation (RAG)
- Sample notebook: Use watsonx, Elasticsearch, and LangChain to answer questions (RAG)
- Sample notebook: Use watsonx and the Elasticsearch Python library to answer questions (RAG)
- Token limits
- Context window length (input + output): 4,096
- Supported natural languages
- English
- Instruction tuning information
- The flan-ul2-20b model is pretrained on the colossal, cleaned version of Common Crawl's web crawl corpus. The model is fine-tuned with multiple pretraining objectives to optimize it for various natural language processing tasks. Details about the training datasets used are published.
- Model architecture
- Encoder-decoder
- License
- Apache 2.0 license
- Learn more
- Read the following resources:
granite-13b-chat-v2
The granite-13b-chat-v2 model is provided by IBM. This model is optimized for dialog use cases and works well with virtual agent and chat applications.
- Usage
- Generates dialog output like a chatbot. Uses a model-specific prompt format. Includes a keyword in its output that can be used as a stop sequence to produce succinct answers. Follow the prompting guidelines for tips on usage, and see the parameter sketch after this entry. For more information, see Prompting granite-13b-chat-v2.
- Size
-
13 billion parameters
- API pricing tier
- Availability
-
- Provided by IBM deployed on multitenant hardware.
- Deploy on demand for dedicated use.
- Try it out
- Token limits
-
Context window length (input + output): 8,192
- Supported natural languages
-
English
- Instruction tuning information
-
The Granite family of models is trained on enterprise-relevant datasets from five domains: internet, academic, code, legal, and finance. Data used to train the models first undergoes IBM data governance reviews and is filtered of text that is flagged for hate, abuse, or profanity by the IBM-developed HAP filter. IBM shares information about the training methods and datasets used.
- Model architecture
-
Decoder
- License
-
IBM-developed foundation models are considered part of the IBM Cloud Service. For more information about contractual protections related to IBM indemnification, see the IBM Client Relationship Agreement and IBM watsonx.ai service description.
- Learn more
-
Read the following resources:
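As noted in the usage guidance for this model, the granite-13b-chat-v2 foundation model emits a keyword that can be used as a stop sequence. The following sketch, which assumes the ibm-watsonx-ai Python SDK, shows where a stop sequence and a token limit would be passed. The stop keyword and prompt format below are placeholders only; use the values that are documented in Prompting granite-13b-chat-v2.

```python
# Hedged sketch: pass decoding parameters (stop sequence, token limit) when
# prompting granite-13b-chat-v2. The stop keyword and prompt text are
# placeholders; replace them with the values from the prompting guidelines.
from ibm_watsonx_ai import Credentials
from ibm_watsonx_ai.foundation_models import ModelInference

credentials = Credentials(url="https://us-south.ml.cloud.ibm.com", api_key="YOUR_IBM_CLOUD_API_KEY")

params = {
    "decoding_method": "greedy",
    "max_new_tokens": 300,
    # Placeholder: replace with the stop keyword documented for this model.
    "stop_sequences": ["<stop-keyword>"],
}

model = ModelInference(
    model_id="ibm/granite-13b-chat-v2",
    credentials=credentials,
    project_id="YOUR_PROJECT_ID",
    params=params,
)

# Placeholder prompt: this model expects its own prompt format (system, user,
# and assistant turns); see Prompting granite-13b-chat-v2 for the exact syntax.
prompt = "<model-specific prompt goes here>"
print(model.generate_text(prompt=prompt))
```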
granite-13b-instruct-v2
The granite-13b-instruct-v2 model is provided by IBM. This model was trained with high-quality finance data, and is a top-performing model on finance tasks. Financial tasks evaluated include: providing sentiment scores for stock and earnings call transcripts, classifying news headlines, extracting credit risk assessments, summarizing financial long-form text, and answering financial or insurance-related questions.
- Usage
- Supports extraction, summarization, and classification tasks. Generates useful output for finance-related tasks. Uses a model-specific prompt format. Accepts special characters, which can be used for generating structured output.
- Size
- 13 billion parameters
- API pricing tier
- Class 1. For pricing details, see Table 2 and Table 4.
- Availability
-
- Provided by IBM deployed on multitenant hardware.
- Deploy on demand for dedicated use.
- Try it out
-
Experiment with samples:
- Token limits
-
Context window length (input + output): 8,192
- Supported natural languages
-
English
- Instruction tuning information
-
The Granite family of models is trained on enterprise-relevant datasets from five domains: internet, academic, code, legal, and finance. Data used to train the models first undergoes IBM data governance reviews and is filtered of text that is flagged for hate, abuse, or profanity by the IBM-developed HAP filter. IBM shares information about the training methods and datasets used.
- Model architecture
-
Decoder
- License
-
IBM-developed foundation models are considered part of the IBM Cloud Service. For more information about contractual protections related to IBM indemnification, see the IBM Client Relationship Agreement and IBM watsonx.ai service description.
- Learn more
-
Read the following resources:
granite-7b-lab
The granite-7b-lab foundation model is provided by IBM. The granite-7b-lab foundation model uses a novel alignment tuning method from IBM Research. Large-scale Alignment for chatBots (LAB) is a method for adding new skills to existing foundation models by generating synthetic data for the skills and then using that data to tune the foundation model.
- Usage
- Supports general purpose tasks, including extraction, summarization, classification, and more. Follow the prompting guidelines for tips on usage. For more information, see Prompting granite-7b-lab.
- Size
- 7 billion parameters
- API pricing tier
- Class 1. For pricing details, see Table 2.
- Availability
- Provided by IBM deployed on multitenant hardware.
- Try it out
- Token limits
-
Context window length (input + output): 8,192
Note: The maximum new tokens, which means the tokens generated by the foundation model per request, is limited to 4,096.
- Supported natural languages
-
English
- Instruction tuning information
-
The granite-7b-lab foundation model is trained iteratively by using the large-scale alignment for chatbots (LAB) methodology.
- Model architecture
-
Decoder
- License
-
IBM-developed foundation models are considered part of the IBM Cloud Service. When you use the granite-7b-lab foundation model that is provided in watsonx.ai, the contractual protections related to IBM indemnification apply. See the IBM Client Relationship Agreement and IBM watsonx.ai service description.
- Learn more
-
Read the following resources:
granite-8b-japanese
The granite-8b-japanese model is provided by IBM. The granite-8b-japanese foundation model is an instruct variant initialized from the pre-trained Granite Base 8 Billion Japanese model and is trained to understand and generate Japanese text.
- Usage
-
Useful for general purpose tasks in the Japanese language, such as classification, extraction, question-answering, and for language translation between Japanese and English.
- Size
-
8 billion parameters
- API pricing tier
-
Class 1. For pricing details, see Table 2.
- Availability
-
Provided by IBM deployed on multitenant hardware in the Tokyo data center.
- Try it out
-
Experiment with samples:
- Token limits
-
Context window length (input + output): 4,096
- Supported natural languages
-
English, Japanese
- Instruction tuning information
-
The Granite family of models is trained on enterprise-relevant datasets from five domains: internet, academic, code, legal, and finance. The granite-8b-japanese model was pretrained on 1 trillion tokens of English and 0.5 trillion tokens of Japanese text.
- Model architecture
-
Decoder
- License
-
IBM-developed foundation models are considered part of the IBM Cloud Service. For more information about contractual protections related to IBM indemnification, see the IBM Client Relationship Agreement and IBM watsonx.ai service description.
- Learn more
-
Read the following resources:
granite-20b-multilingual
A foundation model from the IBM Granite family. The granite-20b-multilingual foundation model is based on the Granite Base 20 billion base model and is trained to understand and generate text in English, German, Spanish, French, and Portuguese.
- Usage
- English, German, Spanish, French, and Portuguese closed-domain question answering, summarization, generation, extraction, and classification.
- Size
-
20 billion parameters
- API pricing tier
-
Class 1. For pricing details, see Table 2.
- Availability
-
Provided by IBM deployed on multitenant hardware.
- Try it out
- Token limits
-
Context window length (input + output): 8,192
- Supported natural languages
-
English, German, Spanish, French, and Portuguese
- Instruction tuning information
-
The Granite family of models is trained on enterprise-relevant datasets from five domains: internet, academic, code, legal, and finance. Data used to train the models first undergoes IBM data governance reviews and is filtered of text that is flagged for hate, abuse, or profanity by the IBM-developed HAP filter. IBM shares information about the training methods and datasets used.
- Model architecture
-
Decoder
- License
-
IBM-developed foundation models are considered part of the IBM Cloud Service. For more information about contractual protections related to IBM indemnification, see the IBM Client Relationship Agreement and IBM watsonx.ai service description.
- Learn more
-
Read the following resources:
granite-3-8b-base
The granite-3-8b-base foundation model is a base model that belongs to the IBM Granite family of models. The model is trained on 10 trillion tokens that are sourced from diverse domains, and then further trained on 2 trillion tokens of high-quality data that was carefully chosen to enhance the model's performance on specific tasks.
- Usage
-
The Granite 3.0 base foundation model is a baseline model that you can customize to create specialized models for specific application scenarios.
- Available size
-
8 billion parameters
- API pricing tier
-
For pricing details, see Table 4.
- Availability
-
Deploy on demand for dedicated use.
- Token limits
-
Context window length (input + output): 4,096
- Supported natural languages
-
English, German, Spanish, French, Japanese, Portuguese, Arabic, Czech, Italian, Korean, Dutch, Chinese (Simplified).
- Model architecture
-
Decoder
- License
-
IBM-developed foundation models are considered part of the IBM Cloud Service. For more information about contractual protections related to IBM indemnification, see the IBM Client Relationship Agreement and IBM watsonx.ai service description.
- Learn more
-
Read the following resources:
Granite Instruct models
The Granite Instruct foundation models belong to the IBM Granite family of models. The granite-3-2b-instruct and granite-3-8b-instruct foundation models are third generation instruct-tuned language models for tasks like summarization, generation, coding, and more. The foundation models employ a GPT-style decoder-only architecture, with additional innovations from IBM Research and the open community.
- Usage
-
Granite Instruct foundation models are designed to excel in instruction-following tasks such as summarization, problem-solving, text translation, reasoning, code tasks, function-calling, and more. A function-calling sketch follows this entry.
- Available sizes
-
- 2 billion parameters
- 8 billion parameters
- API pricing tier
-
- 2b: Class C1
- 8b: Class 12
For pricing details, see Table 2.
- Availability
-
Provided by IBM deployed on multitenant hardware.
- Try it out
-
Experiment with samples:
- Token limits
-
Context window length (input + output)
- 2b: 131,072
- 8b: 131,072
The maximum new tokens, which means the tokens generated by the foundation model per request, is limited to 8,192.
- Supported natural languages
-
English, German, Spanish, French, Japanese, Portuguese, Arabic, Czech, Italian, Korean, Dutch, Chinese (Simplified).
- Supported programming languages
-
The Granite Instruct models are trained with code written in 116 programming languages.
- Instruction tuning information
-
The Granite Instruct models are fine-tuned versions of Granite base models that are trained on over 12 trillion tokens. The fine-tuning uses a combination of permissively licensed open-source and proprietary instruction data.
- Model architecture
-
Decoder
- License
-
IBM-developed foundation models are considered part of the IBM Cloud Service. For more information about contractual protections related to IBM indemnification, see the IBM Client Relationship Agreement and IBM watsonx.ai service description.
- Learn more
-
Read the following resources:
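Because the Granite Instruct models support function calling, a tool definition can be supplied when the model is inferenced through the chat API. The following heavily hedged sketch assumes the ibm-watsonx-ai Python SDK and assumes that its chat method accepts an OpenAI-style tools list; the tool schema, credentials, and model ID are illustrative placeholders, so verify the call against the current API reference.

```python
# Hedged sketch: function calling with a Granite Instruct model through the
# watsonx.ai chat API. The tool definition below is hypothetical and exists
# only to illustrate the request shape.
from ibm_watsonx_ai import Credentials
from ibm_watsonx_ai.foundation_models import ModelInference

credentials = Credentials(url="https://us-south.ml.cloud.ibm.com", api_key="YOUR_IBM_CLOUD_API_KEY")
model = ModelInference(
    model_id="ibm/granite-3-8b-instruct",
    credentials=credentials,
    project_id="YOUR_PROJECT_ID",
)

# Hypothetical tool definition for illustration only.
tools = [{
    "type": "function",
    "function": {
        "name": "get_stock_price",
        "description": "Return the latest price for a stock ticker.",
        "parameters": {
            "type": "object",
            "properties": {"ticker": {"type": "string"}},
            "required": ["ticker"],
        },
    },
}]

messages = [{"role": "user", "content": "What is IBM trading at right now?"}]
response = model.chat(messages=messages, tools=tools)

# The model is expected to reply with a tool call rather than free text;
# inspect the first choice to see which function it wants to run.
print(response["choices"][0]["message"])
```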
Granite Guardian models
The Granite Guardian foundation models belong to the IBM Granite family of models. The granite-guardian-3-2b and granite-guardian-3-8b foundation models are generation 3.0 fine-tuned Granite Instruct models that are designed to detect risks in prompts and responses. The foundation models help with risk detection along many key dimensions in the AI Risk Atlas.
- Usage
-
Granite Guardian foundation models are designed to detect harm-related risks within prompt text or model responses (as guardrails). They can also be used in retrieval-augmented generation use cases to assess context relevance (whether the retrieved context is relevant to the query), groundedness (whether the response is accurate and faithful to the provided context), and answer relevance (whether the response directly addresses the user's query).
- Available sizes
-
- 2 billion parameters
- 8 billion parameters
- API pricing tier
-
- 2b: Class C1
- 8b: Class 12
For pricing details, see Table 2.
- Availability
-
Provided by IBM deployed on multitenant hardware.
- Try it out
-
Experiment with samples:
- Token limits
-
Context window length (input + output)
- 2b: 8,192
- 8b: 8,192
- Supported natural languages
-
English
- Instruction tuning information
-
The Granite Guardian models are fine-tuned Granite Instruct models that are trained on a combination of human-annotated and synthetic data.
- Model architecture
-
Decoder
- License
-
IBM-developed foundation models are considered part of the IBM Cloud Service. For more information about contractual protections related to IBM indemnification, see the IBM Client Relationship Agreement and IBM watsonx.ai service description.
- Learn more
-
Read the following resources:
Granite Code models
Foundation models from the IBM Granite family. The Granite Code foundation models are instruction-following models fine-tuned using a combination of Git commits paired with human instructions and open-source synthetically generated code instruction datasets.
The granite-8b-code-instruct v2.0.0 foundation model can process larger prompts with an increased context window length.
- Usage
-
The following Granite Code foundation models are designed to respond to coding-related instructions and can be used to build coding assistants (a streaming inference sketch follows this entry):
- granite-3b-code-instruct
- granite-8b-code-instruct
- granite-20b-code-instruct
- granite-34b-code-instruct
The following Granite Code foundation models are instruction-tuned versions of the granite-20b-code-base foundation model that are designed for text-to-SQL generation tasks.
- granite-20b-code-base-schema-linking
- granite-20b-code-base-sql-gen
- Available sizes
-
- 3 billion parameters
- 8 billion parameters
- 20 billion parameters
- 34 billion parameters
- API pricing tier
-
Class 1.
For pricing details for the code models, see Table 2.
For pricing details for the text-to-SQL models, see Table 4.
- Availability
-
Granite Code models: Provided by IBM deployed on multitenant hardware.
Text-to-SQL code models: Deploy on demand for dedicated use.
- Try it out
-
Experiment with samples:
- Token limits
-
Context window length (input + output)
- granite-3b-code-instruct: 128,000
- granite-8b-code-instruct: 128,000
- granite-20b-code-instruct: 8,192
- granite-20b-code-base-schema-linking: 8,192
- granite-20b-code-base-sql-gen: 8,192
- granite-34b-code-instruct: 8,192
The maximum new tokens, which means the tokens generated by the foundation model per request, is limited to 8,192 for granite-3b-code-instruct and granite-8b-code-instruct, and to 4,096 for granite-20b-code-instruct.
- Supported natural languages
-
English
- Supported programming languages
-
The Granite Code foundation models support 116 programming languages, including Python, JavaScript, Java, C++, Go, and Rust. For the full list, see IBM foundation models.
- Instruction tuning information
-
These models were fine-tuned from Granite Code base models on a combination of permissively licensed instruction data to enhance instruction-following capabilities including logical reasoning and problem-solving skills.
- Model architecture
-
Decoder
- License
-
IBM-developed foundation models are considered part of the IBM Cloud Service. For more information about contractual protections related to IBM indemnification, see the IBM Client Relationship Agreement and IBM watsonx.ai service description.
- Learn more
-
Read the following resources:
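The following minimal sketch shows how a Granite Code model might be used as a coding assistant with streamed output, which assumes the ibm-watsonx-ai Python SDK; the credentials, project ID, and model ID are placeholders.

```python
# Minimal sketch: stream generated code from a Granite Code model.
# Assumptions: the ibm-watsonx-ai Python SDK is installed and the URL, API key,
# project ID, and model ID are placeholders for your own values.
from ibm_watsonx_ai import Credentials
from ibm_watsonx_ai.foundation_models import ModelInference

credentials = Credentials(url="https://us-south.ml.cloud.ibm.com", api_key="YOUR_IBM_CLOUD_API_KEY")
model = ModelInference(
    model_id="ibm/granite-20b-code-instruct",
    credentials=credentials,
    project_id="YOUR_PROJECT_ID",
    params={"decoding_method": "greedy", "max_new_tokens": 400},
)

prompt = "Write a Python function that checks whether a string is a palindrome."

# generate_text_stream yields chunks of text as the model produces them,
# which is useful for a responsive coding-assistant experience.
for chunk in model.generate_text_stream(prompt=prompt):
    print(chunk, end="", flush=True)
```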
jais-13b-chat
The jais-13b-chat foundation model is a bilingual large language model for Arabic and English that is fine-tuned to support conversational tasks.
- Usage
- Supports Q&A, summarization, classification, generation, extraction, and translation in Arabic.
- Size
- 13 billion parameters
- API pricing tier
- Class 2. For pricing details, see Table 3.
- Availability
- Provided by IBM deployed on multitenant hardware in the Frankfurt data center.
- Try it out
- Sample prompt: Arabic chat
- Token limits
- Context window length (input + output): 2,048
- Supported natural languages
- Arabic (Modern Standard Arabic) and English
- Instruction tuning information
- Jais-13b-chat is based on the Jais-13b model, which is a foundation model that is trained on 116 billion Arabic tokens and 279 billion English tokens. Jais-13b-chat is fine-tuned with a curated set of 4 million Arabic and 6 million English prompt-and-response pairs.
- Model architecture
- Decoder
- License
- Apache 2.0 license
- Learn more
- Read the following resources:
Llama 3.3 70B Instruct
The Meta Llama 3.3 multilingual large language model (LLM) is a pretrained and instruction tuned generative model (text in/text out) with 70 billion parameters.
The llama-3-3-70b-instruct foundation model is a revision of the popular Llama 3.1 70B Instruct foundation model. The Llama 3.3 foundation model is better at coding, step-by-step reasoning, and tool-calling. Despite its smaller size, its performance is similar to that of the Llama 3.1 405B model, which makes it a great choice for developers.
- Usage
-
Generates multilingual dialog output like a chatbot. Uses a model-specific prompt format.
- Available size
-
70 billion parameters
- API pricing tier
-
Class 2
For pricing details, see Table 3.
- Availability
-
- A quantized version of the model is provided by IBM deployed on multitenant hardware.
- Two versions of the model are available to deploy on demand for dedicated use:
- llama-3-3-70b-instruct-hf: Original version published on Hugging Face by Meta.
- llama-3-3-70b-instruct: A quantized version of the model that can be deployed with 2 GPUs instead of 4.
- Try it out
-
Experiment with samples:
- Token limits
-
Context window length (input + output): 131,072
- Supported natural languages
-
English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai
- Instruction tuning information
-
Llama 3.3 was pretrained on 15 trillion tokens of data from publicly available sources. The fine-tuning data includes publicly available instruction datasets, as well as over 25 million synthetically generated examples.
- Model architecture
-
Decoder-only
- License
- Learn more
-
Read the following resources:
Llama 3.2 Instruct
The Meta Llama 3.2 collection of foundation models is provided by Meta. The llama-3-2-1b-instruct and llama-3-2-3b-instruct models are the smallest Llama 3.2 models that fit onto a mobile device. The models are lightweight, text-only models that can be used to build highly personalized, on-device agents.
For example, you can ask the models to summarize the last ten messages you received, or to summarize your schedule for the next month.
- Usage
-
Generates dialog output like a chatbot. Uses a model-specific prompt format (a chat API sketch follows this entry). Their small size and modest compute and memory requirements enable the Llama 3.2 Instruct models to run locally on most hardware, including mobile and other edge devices.
- Available sizes
-
- 1 billion parameters
- 3 billion parameters
- API pricing tier
-
- 1b: Class C1
- 3b: Class 8
For pricing details, see Table 3 and Billing details for generative AI assets.
- Availability
-
- Provided by IBM deployed on multitenant hardware.
- Try it out
- Token limits
-
Context window length (input + output)
- 1b: 131,072
- 3b: 131,072
The maximum new tokens, which means the tokens generated by the foundation models per request, is limited to 8,192.
- Supported natural languages
-
English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai
- Instruction tuning information
-
The models were pretrained on up to 9 trillion tokens of data from publicly available sources. Logits from the Llama 3.1 8B and 70B models were incorporated into the pretraining stage of the model development, where outputs (logits) from these larger models were used as token-level targets. In post-training, the pretrained models were aligned by using Supervised Fine-Tuning (SFT), Rejection Sampling (RS), and Direct Preference Optimization (DPO).
- Model architecture
-
Decoder-only
- License
- Learn more
-
Read the following resources:
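The following minimal sketch shows a dialog request to a Llama 3.2 Instruct model through the watsonx.ai chat API, which applies the model-specific prompt format for you. It assumes the ibm-watsonx-ai Python SDK; the credentials, project ID, model ID, and message content are placeholders, and the response shape should be verified against the current API reference.

```python
# Minimal sketch: chat with a Llama 3.2 Instruct model through the watsonx.ai
# chat API. Assumptions: the ibm-watsonx-ai Python SDK is installed and the
# URL, API key, project ID, and model ID are placeholders.
from ibm_watsonx_ai import Credentials
from ibm_watsonx_ai.foundation_models import ModelInference

credentials = Credentials(url="https://us-south.ml.cloud.ibm.com", api_key="YOUR_IBM_CLOUD_API_KEY")
model = ModelInference(
    model_id="meta-llama/llama-3-2-3b-instruct",
    credentials=credentials,
    project_id="YOUR_PROJECT_ID",
)

# The chat API takes role-tagged messages and builds the model-specific
# prompt format behind the scenes.
messages = [{
    "role": "user",
    "content": [{"type": "text", "text": "Summarize my schedule for next week in one sentence: ..."}],
}]

response = model.chat(messages=messages)
print(response["choices"][0]["message"]["content"])
```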
Llama 3.2 Vision Instruct
The Meta Llama 3.2 collection of foundation models is provided by Meta. The llama-3-2-11b-vision-instruct and llama-3-2-90b-vision-instruct models are built for image-in, text-out use cases such as document-level understanding, interpretation of charts and graphs, and captioning of images.
- Usage
-
Generates dialog output like a chatbot and can perform computer vision tasks, including classification, object detection and identification, image-to-text transcription (including handwriting), contextual Q&A, data extraction and processing, image comparison, and personal visual assistance. Uses a model-specific prompt format. An image-prompt sketch follows this entry.
- Available sizes
-
- 11 billion parameters
- 90 billion parameters
- API pricing tier
-
- 11b: Class 9
- 90b: Class 10
For pricing details, see Table 3.
- Availability
-
Provided by IBM deployed on multitenant hardware.
- Try it out
- Token limits
-
Context window length (input + output)
- 11b: 131,072
- 90b: 131,072
The maximum new tokens, which means the tokens generated by the foundation models per request, is limited to 8,192. The tokens that are counted for an image that you submit to the model are not included in the context window length.
- Supported natural languages
-
English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai with text-only inputs. English only when an image is included with the input.
- Instruction tuning information
-
Llama 3.2 Vision models use image-reasoning adaptor weights that are trained separately from the core large language model weights. This separation preserves the general knowledge of the model and makes the model more efficient both at pretraining time and run time. The Llama 3.2 Vision models were pretrained on 6 billion image-and-text pairs, which required far fewer compute resources than were needed to pretrain the Llama 3.1 70B foundation model alone. Llama 3.2 models also run efficiently because they can tap more compute resources for image reasoning only when the input requires it.
- Model architecture
-
Decoder-only
- License
- Learn more
-
Read the following resources:
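The following hedged sketch shows how an image and a question might be sent to a Llama 3.2 Vision Instruct model through the watsonx.ai chat API. It assumes the ibm-watsonx-ai Python SDK and assumes that the chat API accepts image content as a base64 data URL; the image file, credentials, project ID, and model ID are placeholders.

```python
# Hedged sketch: image-in, text-out request to a Llama 3.2 Vision Instruct
# model. Assumptions: the ibm-watsonx-ai Python SDK is installed, the chat API
# accepts image content as a base64 data URL, and chart.png plus the
# credential placeholders are replaced with real values.
import base64

from ibm_watsonx_ai import Credentials
from ibm_watsonx_ai.foundation_models import ModelInference

credentials = Credentials(url="https://us-south.ml.cloud.ibm.com", api_key="YOUR_IBM_CLOUD_API_KEY")
model = ModelInference(
    model_id="meta-llama/llama-3-2-11b-vision-instruct",
    credentials=credentials,
    project_id="YOUR_PROJECT_ID",
)

# Encode a local image as a data URL so it can be embedded in the request.
with open("chart.png", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode("utf-8")

messages = [{
    "role": "user",
    "content": [
        {"type": "text", "text": "What trend does this chart show?"},
        {"type": "image_url", "image_url": {"url": f"data:image/png;base64,{image_b64}"}},
    ],
}]

response = model.chat(messages=messages)
print(response["choices"][0]["message"]["content"])
```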
llama-guard-3-11b-vision
The Meta Llama 3.2 collection of foundation models is provided by Meta. The llama-guard-3-11b-vision foundation model is a multimodal evolution of the text-only Llama Guard 3 model. The model can be used to classify image and text content in user inputs (prompt classification) as safe or unsafe.
- Usage
-
Use the model to check the safety of the image and text in an image-to-text prompt.
- Size
-
- 11 billion parameters
- API pricing tier
-
Class 9. For pricing details, see Table 3.
- Availability
-
Provided by IBM deployed on multitenant hardware.
- Try it out
- Token limits
-
Context window length (input + output): 131,072
The maximum new tokens, which means the tokens generated by the foundation models per request, is limited to 8,192. The tokens that are counted for an image that you submit to the model are not included in the context window length.
- Supported natural languages
-
English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai with text-only inputs. English only when an image is included with the input.
- Instruction tuning information
-
Pretrained model that is fine-tuned for content safety classification. For more information about the types of content that are classified as unsafe, see the model card.
- Model architecture
-
Decoder-only
- License
- Learn more
-
Read the following resources:
Llama 3.1 8b
The Meta Llama 3.1 collection of foundation models is provided by Meta. The Llama 3.1 base foundation model is a multilingual model that supports tool use and has overall stronger reasoning capabilities.
- Usage
- Use for long-form text summarization and with multilingual conversational agents or coding assistants.
- Available size
- 8 billion parameters
- API pricing tier
- For pricing details, see Table 5.
- Availability
- Deploy on demand for dedicated use.
- Token limits
- Context window length (input + output): 131,072
- Supported natural languages
- English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai
- Model architecture
- Decoder-only
- License
- Learn more
- Read the following resources:
Llama 3.1 Instruct
The Meta Llama 3.1 collection of foundation models is provided by Meta. The Llama 3.1 foundation models are pretrained and instruction tuned text-only generative models that are optimized for multilingual dialogue use cases. The models use supervised fine-tuning and reinforcement learning with human feedback to align with human preferences for helpfulness and safety.
The llama-3-405b-instruct model is Meta's largest open-sourced foundation model to date. This foundation model can also be used as a synthetic data generator, post-training data ranking judge, or model teacher/supervisor that can improve specialized capabilities in more inference-friendly, derivative models.
- Usage
-
Generates dialog output like a chatbot. Uses a model-specific prompt format.
- Available sizes
-
- 8 billion parameters
- 70 billion parameters
- 405 billion parameters
- API pricing tier
-
- 8b: Class 1
- 70b: Class 2
- 405b: Class 3 (input), Class 7 (output)
- Availability
-
- Provided by IBM deployed on multitenant hardware.
- Deploy the llama-3-1-8b-instruct foundation model on demand for dedicated use.
- Try it out
- Token limits
-
Context window length (input + output)
- 8b and 70b: 131,072
- 405b: 16,384
Although the 405b model supports a context window length of 131,072, the window is limited to 16,384 to reduce the time it takes for the model to generate a response.
The maximum new tokens, which means the tokens generated by the foundation models per request, is limited to 4,096.
- Supported natural languages
-
English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai
- Instruction tuning information
-
Llama 3.1 was pretrained on 15 trillion tokens of data from publicly available sources. The fine-tuning data includes publicly available instruction datasets, as well as over 25 million synthetically generated examples.
- Model architecture
-
Decoder-only
- License
- Learn more
-
Read the following resources:
Llama 3 Instruct
The Meta Llama 3 family of foundation models is a collection of accessible, open large language models that are built with Meta Llama 3 and provided by Meta on Hugging Face. The Llama 3 foundation models are instruction fine-tuned language models that can support various use cases.
- Usage
-
Generates dialog output like a chatbot.
- Available sizes
-
- 8 billion parameters
- 70 billion parameters
- API pricing tier
-
- 8b: Class 1
- 70b: Class 2
- Availability
-
- Provided by IBM deployed on multitenant hardware.
- Deploy on demand for dedicated use.
- Try it out
- Token limits
-
Context window length (input + output)
- 8b: 8,192
- 70b: 8,192
Note: The maximum new tokens, which means the tokens generated by the foundation models per request, is limited to 4,096.
- Supported natural languages
-
English
- Instruction tuning information
-
Llama 3 features improvements in post-training procedures that reduce false refusal rates, improve alignment, and increase diversity in the foundation model output. The result is better reasoning, code generation, and instruction-following capabilities. Llama 3 has more training tokens (15T) that result in better language comprehension.
- Model architecture
-
Decoder-only
- License
- Learn more
-
Read the following resources:
Llama 2 Chat
The Llama 2 Chat models are provided by Meta on Hugging Face. The fine-tuned models are useful for chat generation. The models are pretrained with publicly available online data and fine-tuned using reinforcement learning from human feedback.
You can choose to use the 13 billion parameter or 70 billion parameter version of the model.
- Usage
- Generates dialog output like a chatbot. Uses a model-specific prompt format.
- Size
-
- 13 billion parameters
- 70 billion parameters
- API pricing tier
- Class 1. For pricing details, see Table 3 and Table 5.
- Availability
-
- 13b
- Provided by IBM deployed on multitenant hardware
- Deploy on demand for dedicated use
- 70b
- Deploy on demand for dedicated use
- Try it out
- Experiment with samples:
- Token limits
- Context window length (input + output)
- 13b: 4,096
- 70b: 4,096
- Supported natural languages
- English
- Instruction tuning information
- Llama 2 was pretrained on 2 trillion tokens of data from publicly available sources. The fine-tuning data includes publicly available instruction datasets and more than one million new examples that were annotated by humans.
- Model architecture
- Decoder-only
- License
- License
- Learn more
- Read the following resources:
mistral-large
Mistral Large 2 is a large language model developed by Mistral AI. The mistral-large foundation model is fluent in and understands the grammar and cultural context of English, French, Spanish, German, and Italian. The foundation model can also understand dozens of other languages. The model has a large context window, which means you can add large documents as contextual information in prompts that you submit for retrieval-augmented generation (RAG) use cases. The mistral-large foundation model is effective at programmatic tasks, such as generating, reviewing, and commenting on code and function calling, and it can generate results in JSON format.
For more getting started information, see the watsonx.ai page on the Mistral AI website.
- Usage
-
Suitable for complex multilingual reasoning tasks, including text understanding, transformation, and code generation. Due to the model's large context window, use the max tokens parameter to specify a token limit when prompting the model.
- API pricing tier
-
Pricing for the Mistral Large model is not assigned by a multiplier. The following special pricing tiers are used:
- Input tier: Mistral Large Input
- Output tier: Mistral Large
For pricing details, see Table 3.
- Try it out
- Token limits
-
Context window length (input + output): 128,000
Note:
- Although the model supports a context window length of 128,000, the window is limited to 32,768 to reduce the time it takes for the model to generate a response.
- The maximum new tokens, which means the tokens generated by the foundation model per request, is limited to 16,384.
- Supported natural languages
-
English, French, German, Italian, Spanish, Chinese, Japanese, Korean, Portuguese, Dutch, Polish, and dozens of other languages.
- Supported programming languages
-
The mistral-large model has been trained on over 80 programming languages including Python, Java, C, C++, JavaScript, Bash, Swift, and Fortran.
- Instruction tuning information
-
The mistral-large foundation model is pre-trained on diverse datasets like text, codebases, and mathematical data from various domains.
- Model architecture
-
Decoder-only
- License
-
For terms of use, including information about contractual protections related to capped indemnification, see Terms of use.
- Learn more
-
Read the following resources:
mistral-nemo-instruct-2407
The mistral-nemo-instruct-2407 foundation model is a 12 billion parameter model from Mistral AI that was built in collaboration with NVIDIA. Mistral NeMo performs exceptionally well in reasoning, world knowledge, and coding accuracy, especially for a model of its size.
- Usage
- The Mistral NeMo model is multilingual and is trained on function calling.
- Size
- 12 billion parameters
- API pricing tier
- For pricing details, see Table 5.
- Availability
- Deploy on demand for dedicated use.
- Token limits
- Context window length (input + output): 131,072
- Supported natural languages
- Supports multiple languages and is particularly strong in English, French, German, Spanish, Italian, Portuguese, Chinese, Japanese, Korean, Arabic, and Hindi.
- Supported programming languages
- The Mistral NeMo model has been trained on several programming languages.
- Instruction tuning information
- Mistral NeMo had an advanced fine-tuning and alignment phase.
- License
- Apache 2.0 license
- Learn more
- Read the following resources:
mixtral-8x7b-base
The mixtral-8x7b-base foundation model is provided by Mistral AI. The mixtral-8x7b-base foundation model is a generative sparse mixture-of-experts network that groups the model parameters, and then for each token chooses a subset of groups (referred to as experts) to process the token. As a result, each token has access to 47 billion parameters, but only uses 13 billion active parameters for inferencing, which reduces costs and latency.
- Usage
-
Suitable for many tasks, including classification, summarization, generation, code creation and conversion, and language translation.
- Size
-
46.7 billion parameters
- API pricing tier
-
For pricing details, see Table 5.
- Availability
-
Deploy on demand for dedicated use.
- Token limits
-
Context window length (input + output): 32,768
Note: The maximum new tokens, which means the tokens generated by the foundation model per request, is limited to 16,384.
- Supported natural languages
-
English, French, German, Italian, Spanish
- Model architecture
-
Decoder-only
- License
- Learn more
-
Read the following resources:
mixtral-8x7b-instruct-v01
The mixtral-8x7b-instruct-v01 foundation model is provided by Mistral AI. The mixtral-8x7b-instruct-v01 foundation model is a pretrained generative sparse mixture-of-experts network that groups the model parameters, and then for each token chooses a subset of groups (referred to as experts) to process the token. As a result, each token has access to 47 billion parameters, but only uses 13 billion active parameters for inferencing, which reduces costs and latency.
- Usage
-
Suitable for many tasks, including classification, summarization, generation, code creation and conversion, and language translation. Due to the model's unusually large context window, use the max tokens parameter to specify a token limit when prompting the model.
- Size
-
46.7 billion parameters
- API pricing tier
-
Class 1. For pricing details, see Table 3.
- Try it out
- Token limits
-
Context window length (input + output): 32,768
Note: The maximum new tokens, which means the tokens generated by the foundation model per request, is limited to 16,384.
- Supported natural languages
-
English, French, German, Italian, Spanish
- Instruction tuning information
-
The Mixtral foundation model is pretrained on internet data. The Mixtral 8x7B Instruct foundation model is fine-tuned to follow instructions.
- Model architecture
-
Decoder-only
- License
- Learn more
-
Read the following resources:
mt0-xxl-13b
The mt0-xxl-13b model is provided by BigScience on Hugging Face. The model is optimized to support language generation and translation tasks with English, languages other than English, and multilingual prompts.
- Usage
- General use with zero- or few-shot prompts. For translation tasks, include a period to indicate the end of the text that you want translated, or the model might continue the sentence rather than translate it. A translation prompt sketch follows this entry.
- Size
- 13 billion parameters
- API pricing tier
- Class 2. For pricing details, see Table 5.
- Availability
-
- Deploy on demand for dedicated use.
- Try it out
- Experiment with the following samples:
- Token limits
- Context window length (input + output): 4,096
- Supported natural languages
- Multilingual. The model is pretrained on multilingual data in 108 languages and fine-tuned with multilingual data in 46 languages to perform multilingual tasks.
- Instruction tuning information
- BigScience publishes details about its code and datasets.
- Model architecture
- Encoder-decoder
- License
- Apache 2.0 license
- Learn more
- Read the following resources:
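The following minimal sketch shows a translation prompt for mt0-xxl-13b with the trailing period that is recommended above. Because this model is deployed on demand, the sketch assumes the ibm-watsonx-ai Python SDK and an existing deployment; the deployment ID, credentials, and project ID are placeholders.

```python
# Minimal sketch: translation prompt for a deployed-on-demand mt0-xxl-13b model.
# Assumptions: the ibm-watsonx-ai Python SDK is installed, the model is already
# deployed on demand, and the placeholders below are replaced with real values.
from ibm_watsonx_ai import Credentials
from ibm_watsonx_ai.foundation_models import ModelInference

credentials = Credentials(url="https://us-south.ml.cloud.ibm.com", api_key="YOUR_IBM_CLOUD_API_KEY")
model = ModelInference(
    deployment_id="YOUR_DEPLOYMENT_ID",   # deployment of mt0-xxl-13b
    credentials=credentials,
    project_id="YOUR_PROJECT_ID",
    params={"decoding_method": "greedy", "max_new_tokens": 100},
)

# Note the period at the end of the source sentence, which signals the end of
# the text to be translated.
prompt = "Translate to French: The meeting is scheduled for Monday morning."
print(model.generate_text(prompt=prompt))
```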
pixtral-12b
Pixtral 12B is a multimodal model developed by Mistral AI. The pixtral-12b foundation model is trained to understand both natural images and documents and is able to ingest images at their natural resolution and aspect ratio, providing flexibility on the number of tokens used to process an image. The foundation model supports multiple images in its long context window. The model is effective in image-in, text-out multimodal tasks and excels at instruction following.
- Usage
- Chart and figure understanding, document question answering, multimodal reasoning, and instruction following
- Size
- 12 billion parameters
- API pricing tier
- Class 9. For pricing details, see Table 3.
- Availability
- Try it out
- Token limits
-
Context window length (input + output): 128,000
The maximum new tokens, which means the tokens generated by the foundation models per request, is limited to 8,192.
- Supported natural languages
-
English
- Instruction tuning information
-
The pixtral-12b model is trained with interleaved image and text data and is based on the Mistral Nemo model with a 400 million parameter vision encoder trained from scratch.
- Model architecture
-
Decoder-only
- License
- Learn more
-
Read the following resources:
Any deprecated foundation models are highlighted with a deprecated warning icon. For more information about deprecation, including foundation model withdrawal dates, see Foundation model lifecycle.
Learn more
Parent topic: Developing generative AI solutions