Supported encoder foundation models in watsonx.ai

Last updated: Apr 04, 2025

Use encoder-only foundation models that are deployed in IBM watsonx.ai to help with semantic search, document comparison, and reranking tasks.

You can choose the type of encoder-only foundation model that best suits your needs or use both types as part of a two-step search and retrieval workflow. You can use the following types of encoder-only foundation models in watsonx.ai:

Embedding foundation models

Use embedding models when you want to generate vector representations of text that you can then compare mathematically. Embedding models are faster and more efficient than reranker models, but less accurate.

For more information, see Embedding models.

Reranker foundation models

Use reranker models when you want to generate scores for each passage in a small set of passages to find the one or two that are most related to a query. Reranker models are more accurate, but less efficient than embedding models. The more inputs that you submit, the longer the reranker models take to process the text.

For more information, see Reranker models.

To use the two types of encoder-only foundation models together for search and retrieval tasks, you can do the following:

Use an embedding model to do a broad semantic search that returns many results.
Use a reranker model to narrow the top results from step 1 to a single answer or a short list of the best answers.

For more information about generative foundation models, see Supported foundation models. Unlike generative models, you cannot add embedding models as custom foundation models.

Tasks supported by encoder-only foundation models

You can use encoder-only models in watsonx.ai for the following tasks:

Vectorizing text from API: Specify an embedding model to use to convert text into text embeddings by using the watsonx.ai REST API. See Vectorizing text.
Reranking passages from API: Specify a reranker model to use to compare passages to a query and rank the passages by most-to-least related by using the watsonx.ai REST API. See Reranking document passages.
AutoAI for RAG: Use embedding models with AutoAI to build retrieval-augmented generation workflows programmatically. See Automating a RAG pattern with the AutoAI SDK.
Chat with documents in Prompt Lab: Use embedding models to vectorize documents to use as grounding information that you submit to foundation model from prompts in the Prompt Lab. See Adding vectorized documents for grounding foundation model prompts.

The following table shows the types of tasks that the encoder-only foundation models in IBM watsonx.ai support. A checkmark (✓) indicates that the task that is named in the column header is supported by the foundation model.

Table 1. Encoder-only foundation model task support
Model	Vectorize text from API	Rerank passages from API	AutoAI for RAG	Chat with documents in Prompt Lab
all-minilm-l6-v2	✓			✓
all-minilm-l12-v2	✓			✓
ELSER (Elastic Learned Sparse EncodeR)				✓
granite-embedding-107m-multilingual	✓			✓
granite-embedding-278m-multilingual	✓		✓	✓
ms-marco-minilm-l-12-v2		✓
multilingual-e5-large	✓		✓	✓
slate-30m-english-rtrvr-v2	✓			✓
slate-30m-english-rtrvr	✓			✓
slate-125m-english-rtrvr-v2	✓		✓	✓
slate-125m-english-rtrvr	✓		✓	✓

Embedding models

Embedding models are models that you use to vectorize documents and generate text embeddings to help with search and comparison tasks. A text embedding encodes the meaning of a sentence or passage in an array of numbers that are known as a vector. For more information about vectorization, see Text embedding generation.

After the passages are converted to vectors, you can calculate the similarity of the independent vectorized passages by using mathematical functions, such as cosine similarity. Most embedding models are bi-encoder models. Use a bi-encoder model when high recall is essential, meaning you don't want to miss any possible matches, and you need to check the similarity of many passages efficiently.

The following embedding models can be used from the API in watsonx.ai:

granite-embedding-107m-multilingual
granite-embedding-278m-multilingual
slate-30m-english-rtrvr-v2
slate-30m-english-rtrvr
slate-125m-english-rtrvr-v2
slate-125m-english-rtrvr
all-minilm-l6-v2
all-minilm-l12-v2
multilingual-e5-large

To get a list of the embedding models that are available, use the List the available foundation models method in the watsonx.ai as a service API. Specify the filters=function_embedding parameter to return only the embedding models.

curl -X GET \
  'https://{cluster_url}/ml/v1/foundation_model_specs?version=2024-07-25&filters=function_embedding'

Natural Language Processing capabilities

IBM Slate models also power a set of libraries that you can use for common natural language processing (NLP) tasks, such as classification, entity extraction, sentiment analysis, and more.

For more information about how to use the NLP capabilities of the Slate models, see Watson NLP library.

IBM embedding models overview

The following table lists the IBM embedding models that you can use from the API.

Table 2. IBM embedding models in watsonx.ai
Model name	API model ID	Price (USD/1,000 tokens)	Maximum input tokens	Number of dimensions	More information
granite-embedding-107m-multilingual	`ibm/granite-embedding-107m-multilingual`	$0.0001	512	384	Model card
granite-embedding-278m-multilingual	`ibm/granite-embedding-278m-multilingual`	$0.0001	512	768	Model card
slate-125m-english-rtrvr-v2	`ibm/slate-125m-english-rtrvr-v2`	$0.0001	512	768	Model card
slate-125m-english-rtrvr	`ibm/slate-125m-english-rtrvr`	$0.0001	512	768	Model card
slate-30m-english-rtrvr-v2	`ibm/slate-30m-english-rtrvr-v2`	$0.0001	512	384	Model card
slate-30m-english-rtrvr	`ibm/slate-30m-english-rtrvr`	$0.0001	512	384	Model card

Third-party embedding models overview

The following table lists the third-party embedding models that you can use from the API.

Table 3. Third-party embedding models in watsonx.ai
Model name	API model ID	Provider	Price (USD/1,000 tokens)	Maximum input tokens	Number of dimensions	More information
all-minilm-l6-v2	`sentence-transformers/all-minilm-l6-v2`	Open source natural language processing (NLP) and computer vision (CV) community	$0.0001	256	384	• Model card
all-minilm-l12-v2	`sentence-transformers/all-minilm-l12-v2`	Open source natural language processing (NLP) and computer vision (CV) community	$0.0001	256	384	• Model card
multilingual-e5-large	`intfloat/multilingual-e5-large`	Microsoft	$0.0001	512	1,024	• Model card • Research paper

For a list of which models are provided in each regional data center, see Regional availability of foundation models.
Model use is measured in Resource Units (RU). Each unit is equal to 1,000 tokens from the input that is submitted to the foundation model. For more information, see Billing details for generative AI assets.

Reranker models

Reranker models are cross-encoder models that you use to rank passages in order of most-to-least relevant to a query. Unlike bi-encoder models, cross-encoder models process a passage and query together, and generate a score for the similarity of the two inputs. The model repeats this similarity comparison step for each passage that you include. This method is a better choice when you have a smaller set of passages to score and you want to find the best answer.

The reranker models that are available from watsonx.ai cannot be used to generate text embeddings.

The following reranker model can be used from the API in watsonx.ai:

ms-marco-minilm-l-12-v2

To get a list of the reranker models that are available, use the List the available foundation models method in the watsonx.ai as a service API. Specify the filters=function_rerank parameter to return only the available reranker models.

curl -X GET \
  'https://{region}/ml/v1/foundation_model_specs?version=2024-07-25&filters=function_rerank'

Reranker models overview

The following table lists the reranker models you can use in watsonx.ai:

Table 4. Reranker models in watsonx.ai
Model name	API model ID	Provider	Price (USD/1,000 tokens)	Maximum input tokens	More information
ms-marco-minilm-l-12-v2	`cross-encoder/ms-marco-minilm-l-12-v2`	Microsoft	$0.000005	512	• Model card

For a list of which models are provided in each regional data center, see Regional availability of foundation models.
Model use is measured in Resource Units (RU). Each unit is equal to 1,000 tokens from the input that is submitted to the foundation model. For more information, see Billing details for generative AI assets.

Encoder-only model details

You can use the watsonx.ai Python library or REST API to submit sentences or passages to one of the supported encoder-only foundation models.

all-minilm-l6-v2

The all-minilm-l6-v2 embedding model is built by the open source natural language processing (NLP) and computer vision (CV) community and provided by Hugging Face. Use the model as a sentence and short paragraph encoder. Given an input text, the model generates a vector that captures the semantic information in the text.

The all-minilm-l6-v2 embedding model is similar to the all-minilm-l12-v2 embedding model, except that the all-minilm-l6-v2 model has six embedding layers instead of the twelve layers of the all-minilm-l12-v2 model.

Usage: Use the sentence vectors that are generated by the all-minilm-l6-v2 embedding model for tasks such as information retrieval, clustering, and for detecting sentence similarity.

API pricing tier: Class C1. For pricing details, see Table 3.

Number of dimensions: 384

Input token limits: 128

Supported natural languages: English

Fine-tuning information: This embedding model is a version of the pretrained MiniLM-L6-H384-uncased model from Microsoft that is fine-tuned on a dataset that contains 1 billion sentence pairs.

Model architecture: Encoder-only

License: Apache 2.0 license

Learn more

Model card

all-minilm-l12-v2

The all-minilm-l12-v2 embedding model is built by the open source natural language processing (NLP) and computer vision (CV) community and provided by Hugging Face. Use the model as a sentence and short paragraph encoder. Given an input text, it generates a vector that captures the semantic information in the text.

The all-minilm-l12-v2 embedding model is similar to the all-minilm-l6-v2 embedding model, except that the all-minilm-l12-v2 model has twelve embedding layers instead of the six layers of the all-minilm-l6-v2 model.

Usage: Use the sentence vectors that are generated by the all-minilm-l12-v2 embedding model for tasks such as information retrieval, clustering, and for detecting sentence similarity.

API pricing tier: Class C1. For pricing details, see Table 3.

Number of dimensions: 384

Input token limits: 128

Supported natural languages: English

Fine-tuning information: This embedding model is a version of the pretrained MiniLM-L12-H384-uncased model from Microsoft that is fine-tuned with sentence pairs from more than 1 billion sentences.

Model architecture: Encoder-only

License: Apache 2.0 license

Learn more

Model card

granite-embedding-107m-multilingual

The granite-embedding-107m-multilingual model is a 107 million parameter model from the Granite Embeddings suite provided by IBM. The model can be used to generate high quality text embeddings. The model is trained using a combination of open source relevance-pair datasets with permissive, enterprise-friendly licenses, and datasets that are generated and collected by IBM. It supports 12 languages: English, German, Spanish, French, Japanese, Portuguese, Arabic, Czech, Italian, Korean, Dutch, and Chinese.

Usage: Use the granite-embedding-107m-multilingual model to produce an embedding for a given input like a query, passage, or document. The model is trained to maximize the cosine similarity between two input pieces of text.

API pricing tier: Class C1. For pricing details, see Table 2.

Number of dimensions: 384

Input token limits: 512

Supported natural languages: English, German, Spanish, French, Japanese, Portuguese, Arabic, Czech, Italian, Korean, Dutch, and Chinese

Fine-tuning information: The granite-embedding-107m-multilingual model is a version of the XLM RoBERTa model, which is a multilingual version of RoBERTa that is pretrained on 2.5 TB of filtered CommonCrawl data. The model was continually trained on a mixture of multilingual datasets for retrieval-based tasks.

Model architecture: Encoder-only

License: Apache 2.0 license

Learn more

Model card

granite-embedding-278m-multilingual

The granite-embedding-278m-multilingual model is a 278 million parameter model from the Granite Embeddings suite provided by IBM. The model can be used to generate high quality text embeddings. The model is trained using a combination of open source relevance-pair datasets with permissive, enterprise-friendly licenses, and datasets that are generated and collected by IBM. It supports 12 languages: English, German, Spanish, French, Japanese, Portuguese, Arabic, Czech, Italian, Korean, Dutch, and Chinese.

Usage: Use the granite-embedding-278m-multilingual model to produce an embedding for a given input like a query, passage, or document. The model is trained to maximize the cosine similarity between two input pieces of text.

API pricing tier: Class C1. For pricing details, see Table 2.

Number of dimensions: 768

Input token limits: 512

Supported natural languages: English, German, Spanish, French, Japanese, Portuguese, Arabic, Czech, Italian, Korean, Dutch, and Chinese

Model architecture: Encoder-only

License: Apache 2.0 license

Learn more

Model card

ms-marco-minilm-l-12-v2

The ms-marco-minilm-l-12-v2 reranker model is built by Microsoft and provided by Hugging Face. Use the model as a passage and document reranker. Given query text and a set of document passages, it ranks the list of passages from most-to-least related to the query.

Usage: Use the ms-marco-minilm-l-12-v2 reranker model when you have a small set of passages that you want to score against a query and precision is essential. For example, when you have fewer than 100 passages and you want to score them based on how similar they are to query text.

API pricing tier: Class 11. For pricing details, see Table 4.

Input token limits: 512

Supported natural languages: English

Fine-tuning information: The ms-marco-minilm-l-12-v2 model was trained on the MS Marco Passage Ranking task. MS MARCO (Microsoft Machine Reading Comprehension) is a large-scale dataset that is used for machine reading comprehension, question answering, and passage ranking.

Model architecture: Encoder-only

License: Apache 2.0 license

Learn more

Model card

multilingual-e5-large

The multilingual-e5-large embedding model is built by Microsoft and provided by Hugging Face.

The embedding model architecture has 24 layers that are used sequentially to process data.

Usage: Use for use cases where you want to generate text embeddings for text in a language other than English. The multilingual-e5-large model is useful for tasks such as passage or information retrieval, semantic similarity, bitext mining, and paraphrase retrieval.

API pricing tier: Class C1. For pricing details, see Table 3.

Number of dimensions: 1,024

Input token limits: 512

Supported natural languages: Up to 100 languages. See the model card for details.

Fine-tuning information: The multilingual-e5-large model is a version of the XLM RoBERTa model, which is a multilingual version of RoBERTa that is pretrained on 2.5 TB of filtered CommonCrawl data. The model was continually trained on a mixture of multilingual datasets.

Model architecture: Encoder-only

License: Microsoft Open Source Code of Conduct

Learn more

slate-125m-english-rtrvr

The slate-125m-english-rtrvr-v2 and slate-125m-english-rtrvr foundation models are provided by IBM. The IBM Slate 125m embedding models generate embeddings for various inputs such as queries, passages, or documents.

The training objective is to maximize cosine similarity between a query and a passage. This process yields two sentence embeddings, one that represents the question and one that represents the passage, allowing for comparison of the two through cosine similarity.

Usage: Two to three times slower but performs slightly better than the IBM Slate 30m embedding model.

API pricing tier: Class C1. For pricing details, see Table 2.

Number of dimensions: 768

Input token limits: 512

Supported natural languages: English

Fine-tuning information: This version of the model was fine-tuned to be better at sentence retrieval-based tasks.

Model architecture: Encoder-only

License: Terms of use

Learn more

slate-30m-english-rtrvr

The slate-30m-english-rtrvr-v2 and slate-30m-english-rtrvr foundation models are distilled versions of the slate-125m-english-rtrvr, which are all provided by IBM. The IBM Slate embedding model is trained to maximize the cosine similarity between two text inputs so that embeddings can be evaluated based on similarity later.

The embedding model architecture has 6 layers that are used sequentially to process data.

Usage: Two to three times faster and has slightly lower performance scores than the IBM Slate 125m embedding model.

API pricing tier: Class C1. For pricing details, see Table 2.

Try it out: Using vectorized text with retrieval-augmented generation tasks

Number of dimensions: 384

Input token limits: 512

Supported natural languages: English

Fine-tuning information: This version of the model was fine-tuned to be better at sentence retrieval-based tasks.

Model architecture: Encoder-only

License: Terms of use

Learn more

Learn more

For more information about using IBM embedding models to convert sentences and passages into text embeddings, see Text embedding generation.
Adding vectorized documents
Reranking passages by using the API

Parent topic: Supported foundation models

Was the topic helpful?

0/1000

Tasks supported by encoder-only foundation modelsCopy link to section

Embedding modelsCopy link to section

Natural Language Processing capabilitiesCopy link to section

IBM embedding models overviewCopy link to section

Third-party embedding models overviewCopy link to section

Reranker modelsCopy link to section

Reranker models overviewCopy link to section

Encoder-only model detailsCopy link to section

all-minilm-l6-v2Copy link to section

all-minilm-l12-v2Copy link to section

granite-embedding-107m-multilingualCopy link to section

granite-embedding-278m-multilingualCopy link to section

ms-marco-minilm-l-12-v2Copy link to section

multilingual-e5-largeCopy link to section

slate-125m-english-rtrvrCopy link to section

slate-30m-english-rtrvrCopy link to section

Learn moreCopy link to section

Tasks supported by encoder-only foundation models

Embedding models

Natural Language Processing capabilities

IBM embedding models overview

Third-party embedding models overview

Reranker models

Reranker models overview

Encoder-only model details

all-minilm-l6-v2

all-minilm-l12-v2

granite-embedding-107m-multilingual

granite-embedding-278m-multilingual

ms-marco-minilm-l-12-v2

multilingual-e5-large

slate-125m-english-rtrvr

slate-30m-english-rtrvr

Learn more