Supported embedding models available with watsonx.ai
Use embedding models that are deployed in IBM watsonx.ai to help with semantic search and document comparison tasks.
Embedding models are encoder-only foundation models that create text embeddings. A text embedding encodes the meaning of a sentence or passage in an array of numbers known as a vector. For more information, see Text embedding generation.
The following embedding models are available in watsonx.ai:
- slate-30m-english-rtrvr
- slate-125m-english-rtrvr
- all-minilm-l12-v2
- bge-large-en-v1.5
- multilingual-e5-large
For more information about generative foundation models, see Supported foundation models.
IBM embedding models
The following table lists the supported embedding models that IBM provides.
Model name | API model_id | Billing class | Maximum input tokens | Number of dimensions | More information |
---|---|---|---|---|---|
slate-125m-english-rtrvr | ibm/slate-125m-english-rtrvr | Class C1 | 512 | 768 | Model card |
slate-30m-english-rtrvr | ibm/slate-30m-english-rtrvr | Class C1 | 512 | 384 | Model card |
Third-party embedding models
The following table lists the supported third-party embedding models.
Model name | API model_id | Provider | Billing class | Maximum input tokens | Number of dimensions | More information |
---|---|---|---|---|---|---|
all-minilm-l12-v2 | sentence-transformers/all-minilm-l12-v2 | Open source natural language processing (NLP) and computer vision (CV) community | Class C1 | 256 | 384 | Model card |
bge-large-en-v1.5 | baai/bge-large-en-v1.5 | Beijing Academy of AI | Class C1 | 256 | 1024 | Model card |
multilingual-e5-large | intfloat/multilingual-e5-large | Microsoft | Class C1 | 256 | 1024 | Model card, Research paper |
- For a list of which models are provided in each regional data center, see Regional availability of foundation models.
- For information about billing classes, see Watson Machine Learning plans.
Embedding model details
You can use the watsonx.ai Python library or REST API to submit sentences or passages to one of the supported embedding models.
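As a minimal sketch of the REST path, the request body pairs a model_id from the tables above with a list of input texts and a project ID. The endpoint URL and project ID below are placeholders, and the exact request shape should be confirmed against the watsonx.ai API reference:

```python
import json

# Placeholder values -- substitute your own region endpoint and project ID.
WATSONX_EMBEDDINGS_URL = "https://us-south.ml.cloud.ibm.com/ml/v1/text/embeddings"
PROJECT_ID = "your-project-id"

def build_embedding_payload(texts, model_id, project_id):
    """Build the JSON body for a text embeddings request."""
    return {
        "model_id": model_id,
        "inputs": texts,
        "project_id": project_id,
    }

payload = build_embedding_payload(
    ["What is a text embedding?"],
    model_id="ibm/slate-30m-english-rtrvr",
    project_id=PROJECT_ID,
)
print(json.dumps(payload, indent=2))
# Submit by sending an authenticated POST of this body to the embeddings
# endpoint; the response contains one vector per input text.
```

The same payload works for any model_id in the tables above; only the token limits and the length of the returned vectors differ.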
all-minilm-l12-v2
The all-minilm-l12-v2 embedding model is built by the open source natural language processing (NLP) and computer vision (CV) community and provided by Hugging Face. Use the model as a sentence and short paragraph encoder. Given an input text, it outputs a vector that captures the semantic information in the text.
Usage: Use the sentence vectors that are generated by the all-minilm-l12-v2 embedding model for tasks such as information retrieval, clustering, and for detecting sentence similarity.
Cost: Class C1. For pricing details, see Watson Machine Learning plans.
Number of dimensions: 384
Input token limits: 256
Supported natural languages: English
Fine-tuning information: This embedding model is a version of the pretrained MiniLM-L12-H384-uncased model from Microsoft that is fine-tuned with sentence pairs from more than 1 billion sentences.
Model architecture: Encoder-only
License: Apache 2.0 license
Learn more
bge-large-en-v1.5
The bge-large-en-v1.5 embedding model is built by the Beijing Academy of AI (BAAI) and provided by Hugging Face.
Usage: The English version of the BAAI general embedding (bge) model is designed to convert English sentences and passages into text embeddings.
Cost: Class C1. For pricing details, see Watson Machine Learning plans.
Number of dimensions: 1024
Input token limits: 256
Supported natural languages: English
Fine-tuning information: The bge-large-en-v1.5 model was trained on large-scale pair data by using contrastive learning to address common issues with similarity distribution and enhance its ability to retrieve text when no instructions are provided.
Model architecture: Encoder-only
License: MIT License
Learn more
multilingual-e5-large
The multilingual-e5-large embedding model is built by Microsoft and provided by Hugging Face.
The embedding model architecture has 24 layers that are used sequentially to process data.
Usage: Use to generate text embeddings for text in a language other than English. When you submit input to the model, follow these guidelines:
- Prefix the inputs with `query: ` and `passage: ` respectively for tasks such as passage or information retrieval.
- Prefix the input text with `query: ` for tasks such as semantic similarity, bitext mining, and paraphrase retrieval.
- Prefix the input text with `query: ` if you want to use embeddings as features, such as in linear probing classification or for clustering.
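The prefixing guidelines above can be sketched as a small helper; the German example text is illustrative only:

```python
def prefix_inputs(query, passages):
    """Apply the e5 prefixing convention: 'query: ' for the question,
    'passage: ' for each candidate passage."""
    return ["query: " + query] + ["passage: " + p for p in passages]

inputs = prefix_inputs(
    "Wie hoch ist die Zugspitze?",
    ["Die Zugspitze ist mit 2962 m der hoechste Berg Deutschlands."],
)
for text in inputs:
    print(text)
```

Pass the prefixed strings, not the raw text, as the inputs to the multilingual-e5-large model.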
Cost: Class C1. For pricing details, see Watson Machine Learning plans.
Number of dimensions: 1024
Input token limits: 256
Supported natural languages: Up to 100 languages. See the model card for details.
Fine-tuning information: This embedding model is a version of the XLM-RoBERTa model, which is a multilingual version of RoBERTa that is pretrained on 2.5TB of filtered CommonCrawl data. This embedding model was continually trained on a mixture of multilingual datasets.
Model architecture: Encoder-only
License: Microsoft Open Source Code of Conduct
Learn more
slate-125m-english-rtrvr
The slate-125m-english-rtrvr foundation model is provided by IBM. The slate-125m-english-rtrvr foundation model generates embeddings for various inputs such as queries, passages, or documents. The training objective is to maximize cosine similarity between a query and a passage. This process yields two sentence embeddings, one that represents the question and one that represents the passage, allowing for comparison of the two through cosine similarity.
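The cosine-similarity comparison described above can be sketched in a few lines. The toy 4-dimensional vectors stand in for the real 768-dimensional embeddings that the model returns:

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means identical
    direction, 0.0 means unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy vectors standing in for the query and passage embeddings.
query_vec = [0.1, 0.3, -0.2, 0.5]
passage_vec = [0.1, 0.2, -0.1, 0.6]
score = cosine_similarity(query_vec, passage_vec)
print(round(score, 3))
```

In a retrieval workflow, you compute this score between the query embedding and each passage embedding, then rank the passages by score.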
Usage: Two to three times slower but performs slightly better than the slate-30m-english-rtrvr model.
Cost: Class C1. For pricing details, see Watson Machine Learning plans.
Number of dimensions: 768
Input token limits: 512
Supported natural languages: English
Fine-tuning information: This version of the model was fine-tuned to be better at sentence retrieval-based tasks.
Model architecture: Encoder-only
License: Terms of use
Learn more
slate-30m-english-rtrvr
The slate-30m-english-rtrvr foundation model is a distilled version of the slate-125m-english-rtrvr model; both are provided by IBM. The slate-30m-english-rtrvr embedding model is trained to maximize the cosine similarity between two text inputs so that embeddings can be evaluated based on similarity later.
The embedding model architecture has 6 layers that are used sequentially to process data.
Usage: Two to three times faster and has slightly lower performance scores than the slate-125m-english-rtrvr model.
Cost: Class C1. For pricing details, see Watson Machine Learning plans.
Try it out: Using text embeddings to ground prompts in factual information
Number of dimensions: 384
Input token limits: 512
Supported natural languages: English
Fine-tuning information: This version of the model was fine-tuned to be better at sentence retrieval-based tasks.
Model architecture: Encoder-only
License: Terms of use
Learn more
Parent topic: Text embedding generation