Supported embedding models available with watsonx.ai
Use embedding models that are deployed in IBM watsonx.ai to help with semantic search and document comparison tasks.
Embedding models are encoder-only foundation models that create text embeddings. A text embedding encodes the meaning of a sentence or passage in an array of numbers known as a vector. For more information, see Text embedding generation.
The following embedding models are available in watsonx.ai:
- slate-30m-english-rtrvr
- slate-125m-english-rtrvr
- all-minilm-l12-v2
- bge-large-en-v1.5
- multilingual-e5-large
For more information about generative foundation models, see Supported foundation models.
IBM embedding models
The following table lists the supported embedding models that IBM provides.
Model name | API model_id | Billing class | Maximum input tokens | Number of dimensions | More information |
---|---|---|---|---|---|
slate-125m-english-rtrvr | ibm/slate-125m-english-rtrvr | Class C1 | 512 | 768 | Model card |
slate-30m-english-rtrvr | ibm/slate-30m-english-rtrvr | Class C1 | 512 | 384 | Model card |
Third-party embedding models
The following table lists the supported third-party embedding models.
Model name | API model_id | Provider | Billing class | Maximum input tokens | Number of dimensions | More information |
---|---|---|---|---|---|---|
all-minilm-l12-v2 | sentence-transformers/all-minilm-l12-v2 | Open source natural language processing (NLP) and computer vision (CV) community | Class C1 | 256 | 384 | Model card |
bge-large-en-v1.5 | baai/bge-large-en-v1.5 | Beijing Academy of AI | Class C1 | 256 | 1024 | Model card |
multilingual-e5-large | intfloat/multilingual-e5-large | Microsoft | Class C1 | 256 | 1024 | Model card, Research paper |
- For a list of which models are provided in each regional data center, see Regional availability of foundation models.
- For information about billing classes, see Watson Machine Learning plans.
Embedding model details
You can use the watsonx.ai Python library or REST API to submit sentences or passages to one of the supported embedding models.
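As a minimal sketch of the REST path, the request body pairs a model_id from the tables above with a list of input texts and a project ID. The endpoint URL and project ID below are placeholders, and the exact request shape should be confirmed against the watsonx.ai API reference:

```python
import json

# Placeholder values -- substitute your own region endpoint and project ID.
WATSONX_EMBEDDINGS_URL = "https://us-south.ml.cloud.ibm.com/ml/v1/text/embeddings"
PROJECT_ID = "your-project-id"

def build_embedding_payload(texts, model_id, project_id):
    """Build the JSON body for a text embeddings request."""
    return {
        "model_id": model_id,
        "inputs": texts,
        "project_id": project_id,
    }

payload = build_embedding_payload(
    ["What is a text embedding?"],
    model_id="ibm/slate-30m-english-rtrvr",
    project_id=PROJECT_ID,
)
print(json.dumps(payload, indent=2))
# Submit by sending an authenticated POST of this body to the embeddings
# endpoint; the response contains one vector per input text.
```

The same payload works for any model_id in the tables above; only the token limits and the length of the returned vectors differ.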
all-minilm-l12-v2
The all-minilm-l12-v2 embedding model is built by the open source natural language processing (NLP) and computer vision (CV) community and provided by Hugging Face. Use the model as a sentence and short paragraph encoder. Given an input text, it outputs a vector that captures the semantic information in the text.
Usage: Use the sentence vectors that are generated by the all-minilm-l12-v2 embedding model for tasks such as information retrieval, clustering, and for detecting sentence similarity.
Cost: Class C1. For pricing details, see Watson Machine Learning plans.
Number of dimensions: 384
Input token limits: 256
Supported natural languages: English
Fine-tuning information: This embedding model is a version of the pretrained MiniLM-L12-H384-uncased model from Microsoft that is fine-tuned with sentence pairs from more than 1 billion sentences.
Model architecture: Encoder-only
License: Apache 2.0 license
Learn more
bge-large-en-v1.5
The bge-large-en-v1.5 embedding model is built by the Beijing Academy of AI (BAAI) and provided by Hugging Face.
Usage: The English version of the BAAI general embedding (bge) model is designed to convert English sentences and passages into text embeddings.
Cost: Class C1. For pricing details, see Watson Machine Learning plans.
Number of dimensions: 1024
Input token limits: 256
Supported natural languages: English
Fine-tuning information: The bge-large-en-v1.5 model was trained on large-scale pair data by using contrastive learning to address common issues with similarity distribution and enhance its ability to retrieve text when no instructions are provided.
Model architecture: Encoder-only
License: MIT License
Learn more
multilingual-e5-large
The multilingual-e5-large embedding model is built by Microsoft and provided by Hugging Face.
The embedding model architecture has 24 layers that are used sequentially to process data.
Usage: Use to generate text embeddings for text in a language other than English. When you submit input to the model, follow these guidelines:
- Prefix the inputs with `query: ` and `passage: ` respectively for tasks such as passage or information retrieval.
- Prefix the input text with `query: ` for tasks such as semantic similarity, bitext mining, and paraphrase retrieval.
- Prefix the input text with `query: ` if you want to use embeddings as features, such as in linear probing classification or for clustering.
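The prefixing guidelines above can be sketched as a small helper; the German example text is illustrative only:

```python
def prefix_inputs(query, passages):
    """Apply the e5 prefixing convention: 'query: ' for the question,
    'passage: ' for each candidate passage."""
    return ["query: " + query] + ["passage: " + p for p in passages]

inputs = prefix_inputs(
    "Wie hoch ist die Zugspitze?",
    ["Die Zugspitze ist mit 2962 m der hoechste Berg Deutschlands."],
)
for text in inputs:
    print(text)
```

Pass the prefixed strings, not the raw text, as the inputs to the multilingual-e5-large model.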
Cost: Class C1. For pricing details, see Watson Machine Learning plans.
Number of dimensions: 1024
Input token limits: 256
Supported natural languages: Up to 100 languages. See the model card for details.
Fine-tuning information: This embedding model is a version of the XLM-RoBERTa model, which is a multilingual version of RoBERTa that is pretrained on 2.5TB of filtered CommonCrawl data. This embedding model was continually trained on a mixture of multilingual datasets.
Model architecture: Encoder-only
License: Microsoft Open Source Code of Conduct
Learn more
slate-125m-english-rtrvr
The slate-125m-english-rtrvr foundation model is provided by IBM. The slate-125m-english-rtrvr foundation model generates embeddings for various inputs such as queries, passages, or documents. The training objective is to maximize cosine similarity between a query and a passage. This process yields two sentence embeddings, one that represents the question and one that represents the passage, allowing for comparison of the two through cosine similarity.
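The cosine-similarity comparison described above can be sketched in a few lines. The toy 4-dimensional vectors stand in for the real 768-dimensional embeddings that the model returns:

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means identical
    direction, 0.0 means unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy vectors standing in for the query and passage embeddings.
query_vec = [0.1, 0.3, -0.2, 0.5]
passage_vec = [0.1, 0.2, -0.1, 0.6]
score = cosine_similarity(query_vec, passage_vec)
print(round(score, 3))
```

In a retrieval workflow, you compute this score between the query embedding and each passage embedding, then rank the passages by score.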
Usage: Two to three times slower but performs slightly better than the slate-30m-english-rtrvr model.
Cost: Class C1. For pricing details, see Watson Machine Learning plans.
Number of dimensions: 768
Input token limits: 512
Supported natural languages: English
Fine-tuning information: This version of the model was fine-tuned to be better at sentence retrieval-based tasks.
Model architecture: Encoder-only
License: Terms of use
Learn more
slate-30m-english-rtrvr
The slate-30m-english-rtrvr foundation model is a distilled version of the slate-125m-english-rtrvr model; both are provided by IBM. The slate-30m-english-rtrvr embedding model is trained to maximize the cosine similarity between two text inputs so that embeddings can be evaluated based on similarity later.
The embedding model architecture has 6 layers that are used sequentially to process data.
Usage: Two to three times faster and has slightly lower performance scores than the slate-125m-english-rtrvr model.
Cost: Class C1. For pricing details, see Watson Machine Learning plans.
Try it out: Using text embeddings to ground prompts in factual information
Number of dimensions: 384
Input token limits: 512
Supported natural languages: English
Fine-tuning information: This version of the model was fine-tuned to be better at sentence retrieval-based tasks.
Model architecture: Encoder-only
License: Terms of use
Learn more
Parent topic: Text embedding generation