Supported encoder foundation models in watsonx.ai

Use encoder-only foundation models that are deployed in IBM watsonx.ai to help with semantic search, document comparison, and reranking tasks.

The following types of encoder-only foundation models are available. You can choose the type of encoder-only foundation model that best suits your needs or use both types as part of a two-step search and retrieval workflow.

Embedding foundation models

Use embedding models when you want to generate vector representations of text that you can then compare mathematically. Embedding models are faster and more efficient than reranker models, but less accurate.

For more information, see Embedding models.

Reranker foundation models

Use reranker models when you want to generate scores for each passage in a small set of passages to find the one or two that are most related to a query. Reranker models are more accurate, but less efficient than embedding models. The more inputs that you submit, the longer the reranker models take to process the text.

For more information, see Reranker models.

You can use the two types of encoder-only foundation models together for search and retrieval tasks:

  1. Use an embedding model to do a broad semantic search that returns many results.
  2. Use a reranker model to narrow the top results from step 1 to a single answer or a short list of the best answers.
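The two steps above can be sketched as a small pipeline. This is a minimal illustration, not the watsonx.ai API: the function names are hypothetical, and the two toy scorers (word overlap and Jaccard similarity) are stand-ins for an embedding-based similarity and a reranker score.

```python
from typing import Callable, List, Tuple

# A scorer takes (query, passage) and returns a relevance score; higher is better.
Scorer = Callable[[str, str], float]


def two_step_search(
    query: str,
    passages: List[str],
    broad_score: Scorer,
    precise_score: Scorer,
    broad_k: int = 20,
    final_n: int = 3,
) -> List[Tuple[float, str]]:
    """Shortlist with a cheap scorer, then re-order the shortlist with a costlier one."""
    # Step 1: score every passage with the cheap scorer and keep the top broad_k.
    shortlist = sorted(passages, key=lambda p: broad_score(query, p), reverse=True)[:broad_k]
    # Step 2: run the expensive scorer only on the shortlist.
    rescored = [(precise_score(query, p), p) for p in shortlist]
    rescored.sort(key=lambda pair: pair[0], reverse=True)
    return rescored[:final_n]


def word_overlap(query: str, passage: str) -> float:
    """Toy stand-in for embedding similarity: count of shared lowercase words."""
    return float(len(set(query.lower().split()) & set(passage.lower().split())))


def jaccard(query: str, passage: str) -> float:
    """Toy stand-in for a reranker score: shared words divided by all distinct words."""
    q, p = set(query.lower().split()), set(passage.lower().split())
    return len(q & p) / len(q | p) if (q | p) else 0.0


if __name__ == "__main__":
    docs = [
        "reset your password from the account settings page",
        "password rules require twelve characters",
        "the cafeteria menu changes weekly",
    ]
    for score, text in two_step_search("reset password", docs, word_overlap, jaccard, broad_k=2, final_n=1):
        print(round(score, 3), text)
```

In a real workflow, `broad_score` would compare stored embedding vectors and `precise_score` would call a reranker model; keeping `broad_k` small limits how many passages the slower second step has to process.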

For more information about generative foundation models, see Supported foundation models.

Tasks supported by encoder-only foundation models

You can use encoder-only models in watsonx.ai for the following tasks:

Vectorizing text
Specify an embedding model to use to convert text into text embeddings by using the watsonx.ai REST API. See Embedding models.
Reranking passages
Specify a reranker model to use to compare passages to a query and rank the passages by most-to-least related by using the watsonx.ai REST API. See Reranker models.
Vector index in Prompt Lab
Use embedding models to vectorize documents to use as grounding information that you submit to a foundation model from prompts in the Prompt Lab. For more information, see Adding vectorized documents for grounding foundation model prompts.
AutoAI for RAG
Use embedding models with AutoAI to build retrieval-augmented generation workflows programmatically. For more information, see Automating a RAG pattern with the AutoAI SDK.

The following table shows the types of tasks that the encoder-only foundation models in IBM watsonx.ai support. A checkmark (✓) indicates that the task that is named in the column header is supported by the foundation model.

Table 0. Encoder-only foundation model task support
Model | Vectorize text | Rerank passages | AutoAI for RAG | Vector index in Prompt Lab
all-minilm-l6-v2
all-minilm-l12-v2
ELSER (Elastic Learned Sparse EncodeR)
ms-marco-minilm-l-12-v2
multilingual-e5-large
slate-30m-english-rtrvr-v2
slate-30m-english-rtrvr
slate-125m-english-rtrvr-v2
slate-125m-english-rtrvr

Embedding models

Embedding models are models that you use to vectorize documents and generate text embeddings to help with search and comparison tasks. A text embedding encodes the meaning of a sentence or passage in an array of numbers that are known as a vector. For more information about vectorization, see Text embedding generation.

After the passages are converted to vectors, you can calculate the similarity of the independent vectorized passages by using mathematical functions, such as cosine similarity. Most embedding models are bi-encoder models. Use a bi-encoder model when high recall is essential, meaning you don't want to miss any possible matches, and you need to check the similarity of many passages efficiently.
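As an illustration of the comparison step, cosine similarity between two vectors can be computed as follows. This is a minimal standard-library sketch; the function name is ours, and the short vectors in the example are made-up values rather than real model output (real embeddings have hundreds of dimensions).

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of dimensions")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    # Made-up three-dimensional vectors for illustration only.
    print(cosine_similarity([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]))  # same direction
    print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))            # orthogonal
```

Scores range from -1 to 1, and vectors that point in the same direction score close to 1 regardless of their length.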

The following embedding models can be used from the API in watsonx.ai:

To get a list of the embedding models that are available, use the List the available foundation models method in the watsonx.ai as a service API. Specify the filters=function_embedding parameter to return only the embedding models.

curl -X GET \
  'https://{cluster_url}/ml/v1/foundation_model_specs?version=2024-07-25&filters=function_embedding'
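The same request URL can be assembled in Python. The sketch below only builds the URL string from the path, `version`, and `filters` values shown in the curl command; the helper name and the `example.com` host are placeholders, and no request is sent.

```python
from urllib.parse import urlencode


def model_specs_url(host: str, model_filter: str, version: str = "2024-07-25") -> str:
    """Build the URL for listing foundation model specs with a filter applied."""
    query = urlencode({"version": version, "filters": model_filter})
    return f"https://{host}/ml/v1/foundation_model_specs?{query}"


if __name__ == "__main__":
    # "example.com" is a placeholder host; substitute your own cluster URL.
    print(model_specs_url("example.com", "function_embedding"))
```

Passing a different filter value, such as `function_rerank`, produces the URL for listing reranker models.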

IBM embedding models overview

The following table lists the IBM embedding models that you can use from the API.

Table 1. IBM embedding models in watsonx.ai
Model name | API model_id | Price (USD/1,000 tokens) | Maximum input tokens | Number of dimensions | More information
slate-125m-english-rtrvr-v2 | ibm/slate-125m-english-rtrvr-v2 | $0.0001 | 512 | 768 | Model card
slate-125m-english-rtrvr | ibm/slate-125m-english-rtrvr | $0.0001 | 512 | 768 | Model card
slate-30m-english-rtrvr-v2 | ibm/slate-30m-english-rtrvr-v2 | $0.0001 | 512 | 384 | Model card
slate-30m-english-rtrvr | ibm/slate-30m-english-rtrvr | $0.0001 | 512 | 384 | Model card
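Because prices are quoted per 1,000 tokens, a rough cost estimate is simple arithmetic. The sketch below assumes that cost scales linearly with the number of tokens processed; the function name is hypothetical, and actual billing may differ from this simplification.

```python
from decimal import Decimal


def token_cost_usd(total_tokens: int, price_per_1000_tokens: str) -> Decimal:
    """Estimate cost in USD, assuming linear pricing per 1,000 tokens."""
    if total_tokens < 0:
        raise ValueError("total_tokens must not be negative")
    # Decimal avoids binary floating-point noise for very small prices.
    return Decimal(total_tokens) * Decimal(price_per_1000_tokens) / Decimal(1000)


if __name__ == "__main__":
    # 2,000,000 tokens at $0.0001 per 1,000 tokens.
    print(token_cost_usd(2_000_000, "0.0001"))
```

Under that assumption, embedding 2,000,000 tokens at $0.0001 per 1,000 tokens comes to 2,000 x $0.0001 = $0.20.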

Third-party embedding models overview

The following table lists the third-party embedding models that you can use from the API.

Table 2. Supported third-party embedding models in watsonx.ai
Model name | API model_id | Provider | Price (USD/1,000 tokens) | Maximum input tokens | Number of dimensions | More information
all-minilm-l6-v2 | sentence-transformers/all-minilm-l6-v2 | Open source natural language processing (NLP) and computer vision (CV) community | $0.0001 | 256 | 384 | Model card
all-minilm-l12-v2 | sentence-transformers/all-minilm-l12-v2 | Open source natural language processing (NLP) and computer vision (CV) community | $0.0001 | 256 | 384 | Model card
multilingual-e5-large | intfloat/multilingual-e5-large | Microsoft | $0.0001 | 512 | 1,024 | Model card, Research paper

Reranker models

Reranker models are cross-encoder models that you use to rank passages in order of most-to-least relevant to a query. Unlike bi-encoder models, cross-encoder models process a passage and query together, and generate a score for the similarity of the two inputs. The model repeats this similarity comparison step for each passage that you include. This method is a better choice when you have a smaller set of passages to score and you want to find the best answer.

The reranker models that are available from watsonx.ai cannot be used to generate text embeddings.

The following reranker model can be used from the API in watsonx.ai:

To get a list of the reranker models that are available, use the List the available foundation models method in the watsonx.ai as a service API. Specify the filters=function_rerank parameter to return only the available reranker models.

curl -X GET \
  'https://{cluster_url}/ml/v1/foundation_model_specs?version=2024-07-25&filters=function_rerank'

Reranker models overview

The following table lists the supported reranker models.

Table 3. Supported reranker models in watsonx.ai
Model name | API model_id | Provider | Price (USD/1,000 tokens) | Maximum input tokens | More information
ms-marco-minilm-l-12-v2 | cross-encoder/ms-marco-minilm-l-12-v2 | Microsoft | $0.000005 | 512 | Model card

Encoder-only model details

You can use the watsonx.ai Python library or REST API to submit sentences or passages to one of the supported encoder-only foundation models.
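Each model accepts a limited number of input tokens (256 or 512, depending on the model), so longer passages are often split before they are submitted. The following is a minimal sketch of one common approach, overlapping windows. It counts whitespace-separated words as a crude stand-in for tokens, which is an assumption: model tokenizers usually produce more tokens than words, so leave headroom below the real limit. The function name is hypothetical.

```python
from typing import List


def split_into_windows(text: str, max_tokens: int, overlap: int = 0) -> List[str]:
    """Split text into windows of at most max_tokens words, sharing `overlap` words."""
    if max_tokens <= 0:
        raise ValueError("max_tokens must be positive")
    if not 0 <= overlap < max_tokens:
        raise ValueError("overlap must be at least 0 and less than max_tokens")
    words = text.split()  # whitespace words approximate tokens; not a real tokenizer
    step = max_tokens - overlap
    windows = []
    for start in range(0, len(words), step):
        windows.append(" ".join(words[start:start + max_tokens]))
        if start + max_tokens >= len(words):
            break  # the last window already reaches the end of the text
    return windows


if __name__ == "__main__":
    sample = " ".join(f"w{i}" for i in range(10))
    for window in split_into_windows(sample, max_tokens=4, overlap=1):
        print(window)
```

A small overlap keeps sentences that straddle a boundary represented in both neighboring windows, at the cost of processing a few words twice.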

all-minilm-l6-v2

The all-minilm-l6-v2 embedding model is built by the open source natural language processing (NLP) and computer vision (CV) community and provided by Hugging Face. Use the model as a sentence and short paragraph encoder. Given an input text, the model generates a vector that captures the semantic information in the text.

The all-minilm-l6-v2 embedding model is similar to the all-minilm-l12-v2 embedding model, except that the all-minilm-l6-v2 model has six embedding layers instead of the twelve layers of the all-minilm-l12-v2 model.

Usage: Use the sentence vectors that are generated by the all-minilm-l6-v2 embedding model for tasks such as information retrieval, clustering, and detecting sentence similarity.

Number of dimensions: 384

Input token limits: 256

Supported natural languages: English

Fine-tuning information: This embedding model is a version of the pretrained MiniLM-L6-H384-uncased model from Microsoft that is fine-tuned on a dataset that contains 1 billion sentence pairs.

Model architecture: Encoder-only

License: Apache 2.0 license

Learn more

all-minilm-l12-v2

The all-minilm-l12-v2 embedding model is built by the open source natural language processing (NLP) and computer vision (CV) community and provided by Hugging Face. Use the model as a sentence and short paragraph encoder. Given an input text, it generates a vector that captures the semantic information in the text.

The all-minilm-l12-v2 embedding model is similar to the all-minilm-l6-v2 embedding model, except that the all-minilm-l12-v2 model has twelve embedding layers instead of the six layers of the all-minilm-l6-v2 model.

Usage: Use the sentence vectors that are generated by the all-minilm-l12-v2 embedding model for tasks such as information retrieval, clustering, and detecting sentence similarity.

API pricing tier: Class C1. For pricing details, see the table.

Number of dimensions: 384

Input token limits: 256

Supported natural languages: English

Fine-tuning information: This embedding model is a version of the pretrained MiniLM-L12-H384-uncased model from Microsoft that is fine-tuned with sentence pairs from more than 1 billion sentences.

Model architecture: Encoder-only

License: Apache 2.0 license

Learn more

ms-marco-minilm-l-12-v2

The ms-marco-minilm-l-12-v2 reranker model is built by Microsoft and provided by Hugging Face. Use the model as a passage and document reranker. Given query text and a set of document passages, it ranks the list of passages from most-to-least related to the query.

Usage: Use the ms-marco-minilm-l-12-v2 reranker model when you have a small set of passages that you want to score against a query and precision is essential. For example, use the model when you have fewer than 100 passages and you want to score them based on how similar they are to the query text.

API pricing tier: Class 11. For pricing details, see the table.

Input token limits: 512

Supported natural languages: English

Fine-tuning information: The ms-marco-minilm-l-12-v2 model was trained on the MS Marco Passage Ranking task. MS MARCO (Microsoft Machine Reading Comprehension) is a large-scale dataset that is used for machine reading comprehension, question answering, and passage ranking.

Model architecture: Encoder-only

License: Apache 2.0 license

Learn more

multilingual-e5-large

The multilingual-e5-large embedding model is built by Microsoft and provided by Hugging Face.

The embedding model architecture has 24 layers that are used sequentially to process data.

Usage: Use when you want to generate text embeddings for text in a language other than English. The multilingual-e5-large model is useful for tasks such as passage or information retrieval, semantic similarity, bitext mining, and paraphrase retrieval.

API pricing tier: Class C1. For pricing details, see the table.

Number of dimensions: 1,024

Input token limits: 512

Supported natural languages: Up to 100 languages. See the model card for details.

Fine-tuning information: This embedding model is a version of the XLM-RoBERTa model, which is a multilingual version of RoBERTa that is pretrained on 2.5 TB of filtered CommonCrawl data. This embedding model was continually trained on a mixture of multilingual datasets.

Model architecture: Encoder-only

License: Microsoft Open Source Code of Conduct

Learn more

slate-125m-english-rtrvr

The slate-125m-english-rtrvr-v2 and slate-125m-english-rtrvr foundation models are provided by IBM. The IBM Slate 125m embedding models generate embeddings for various inputs such as queries, passages, or documents.

The training objective is to maximize cosine similarity between a query and a passage. This process yields two sentence embeddings, one that represents the question and one that represents the passage, allowing for comparison of the two through cosine similarity.

Usage: The IBM Slate 125m embedding model is two to three times slower than the IBM Slate 30m embedding model, but performs slightly better.

API pricing tier: Class C1. For pricing details, see the table.

Number of dimensions: 768

Input token limits: 512

Supported natural languages: English

Fine-tuning information: This version of the model was fine-tuned to be better at sentence retrieval-based tasks.

Model architecture: Encoder-only

License: Terms of use

Learn more

slate-30m-english-rtrvr

The slate-30m-english-rtrvr-v2 and slate-30m-english-rtrvr foundation models are distilled versions of the slate-125m-english-rtrvr model. All of these models are provided by IBM. The IBM Slate embedding model is trained to maximize the cosine similarity between two text inputs so that embeddings can be evaluated based on similarity later.

The embedding model architecture has 6 layers that are used sequentially to process data.

Usage: The IBM Slate 30m embedding model is two to three times faster than the IBM Slate 125m embedding model, but has slightly lower performance scores.

API pricing tier: Class C1. For pricing details, see the table.

Try it out: Using vectorized text with retrieval-augmented generation tasks

Number of dimensions: 384

Input token limits: 512

Supported natural languages: English

Fine-tuning information: This version of the model was fine-tuned to be better at sentence retrieval-based tasks.

Model architecture: Encoder-only

License: Terms of use

Learn more

