Vectorizing text by using the API

Last updated: Nov 27, 2024

Use the embedding models and embeddings API that are available from watsonx.ai to create text embeddings that capture the meaning of sentences or passages for use in your generative AI applications.

Converting text into text embeddings, or vectorizing text, helps with document comparison, question answering, and retrieval-augmented generation (RAG) tasks, where you need to retrieve relevant content quickly.

You can also use IBM embedding models from third-party platforms.

What are text embeddings?

A text embedding is a numerical representation of a sentence or passage as a vector of real-valued numbers. When sentences are converted to numeric vectors, operations on sentences become more like math operations, which computers can do quickly and well.

When an embedding model creates a vector representation of a sentence, the embedding model assigns values that capture the semantic meaning of the sentence. The embedding model also positions the vector within a multidimensional space based on its assigned values. The number of dimensions varies by model, which means the exact vector values also vary. However, all models position the vectors such that sentences with similar meanings are nearer to one another.

Most embedding models generate vectors with hundreds to thousands of dimensions, which is far too many to visualize. If an embedding model were to generate 3-dimensional vectors, they might look as follows. The vector values shown in the image are fictional, but are included to help illustrate this hypothetical scenario.

A 3-dimensional cube with three data points that represent three sentence embeddings

The image shows that sentences with shared keywords and with shared subjects have vectors with similar values, which places them nearer to each other within the three-dimensional space. The following sentences are positioned based on their vector values:

The Degas reproduction is hanging in the den
Jan bought a painting of dogs playing cards
I took my dogs for a walk

The first two sentences, which are both about artwork, and the last two sentences, which share the keyword dogs, are nearer to one another than the first and third sentences, which share no common words or meanings.
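
The relationship among the three sentences can be sketched in a few lines of Python. This is a minimal illustration only: the 3-dimensional vectors are invented for this example (they are not the values from the image or from any model), and cosine similarity is used as one common way to measure how near two vectors are. The text does not say which measure a given model or vector store uses.

```python
import math

def cosine_similarity(a, b):
    """Return the cosine of the angle between vectors a and b (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Fictional 3-dimensional vectors, made up for illustration only.
degas = [0.9, 0.1, 0.2]     # "The Degas reproduction is hanging in the den"
painting = [0.7, 0.6, 0.2]  # "Jan bought a painting of dogs playing cards"
walk = [0.1, 0.9, 0.3]      # "I took my dogs for a walk"

sim_first_second = cosine_similarity(degas, painting)
sim_second_third = cosine_similarity(painting, walk)
sim_first_third = cosine_similarity(degas, walk)

print(f"first vs. second: {sim_first_second:.3f}")
print(f"second vs. third: {sim_second_third:.3f}")
print(f"first vs. third:  {sim_first_third:.3f}")
```

With these made-up values, the first and third sentences score lowest, which mirrors the layout described for the image.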

You can store generated vectors in a vector database. When the same embedding model is used to convert all of the sentences in the database, the vector store can leverage the inherent groupings and relationships that exist among the sentences based on their vector values to return relevant search results quickly.

Unlike traditional indexes that store text and rely on keyword search for information retrieval, vector stores support semantic searches that retrieve information that is similar in meaning. For example, where keyword search checks only whether the keyword is present, semantic search weighs the context in which the keyword is used, which typically produces better search results.
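
To make the idea of a semantic search over stored vectors more concrete, the following Python sketch ranks stored sentences by their similarity to a query vector. It is a simplified stand-in for a vector database, not how any particular product works: the search function is a hypothetical helper, the search is a brute-force scan, and all vector values, including the query vector, are invented for illustration.

```python
import math

def cosine_similarity(a, b):
    """Return the cosine of the angle between vectors a and b."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def search(store, query_vector, top_k=2):
    """Return the top_k stored texts, most similar to query_vector first."""
    ranked = sorted(
        store,
        key=lambda item: cosine_similarity(item[1], query_vector),
        reverse=True,
    )
    return [text for text, _ in ranked[:top_k]]

# A toy in-memory "vector store" of (text, vector) pairs with fictional values.
store = [
    ("The Degas reproduction is hanging in the den", [0.9, 0.1, 0.2]),
    ("Jan bought a painting of dogs playing cards", [0.7, 0.6, 0.2]),
    ("I took my dogs for a walk", [0.1, 0.9, 0.3]),
]

# Fictional vector standing in for an embedded query such as "pets on a stroll".
query_vector = [0.2, 0.8, 0.3]

results = search(store, query_vector, top_k=2)
print(results)
```

The hypothetical query shares no keywords with the stored sentences, so a keyword search would find nothing, whereas the vector comparison still ranks the sentence about walking dogs first. In practice, the query must be embedded with the same model that was used for the stored sentences.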

Vectorizing text

Use the Generate embeddings method in the watsonx.ai API to vectorize text.

To find out which embedding models are available for use programmatically, use the List the available foundation models method in the watsonx.ai as a service API. Specify the filters=function_embedding parameter to return only the available embedding models.

curl -X GET \
  'https://{cluster_url}/ml/v1/foundation_model_specs?version=2024-07-25&filters=function_embedding'
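
The same request can be prepared from Python. The following sketch uses only the standard library and only builds the request object; it does not send it. The build_list_models_request function and the example.invalid host are placeholders invented for this illustration, so substitute the base URL of your own deployment.

```python
from urllib.parse import urlencode
from urllib.request import Request

def build_list_models_request(base_url, version="2024-07-25"):
    """Build a GET request that lists only the available embedding models."""
    query = urlencode({"version": version, "filters": "function_embedding"})
    return Request(f"{base_url}/ml/v1/foundation_model_specs?{query}", method="GET")

# Placeholder host; replace it with the URL of your own watsonx.ai instance.
request = build_list_models_request("https://example.invalid")
print(request.get_method(), request.full_url)
```

To send the request, pass it to urllib.request.urlopen and parse the JSON body that is returned.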

Example

The following code snippet uses the slate-30m-english-rtrvr model to convert the following two lines of text into text embeddings:

A foundation model is a large-scale generative AI model that can be adapted to a wide range of downstream tasks.
Generative AI a class of AI algorithms that can produce various types of content including text, source code, imagery, audio, and synthetic data.

In this example, two lines of text are being submitted for conversion. You can specify up to 1,000 lines. Each line that you submit must conform to the maximum input token limit that is defined by the embedding model.

To address cases where a line might exceed that limit, the truncate_input_tokens parameter is specified to force the line to be truncated. Otherwise, the request might fail. The input_text parameter is included so that the original text is added to the response, making it easier to pair the original text with each set of embedding values.
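
If you have more lines than fit in a single request, one option is to split them into batches before you submit them. The following Python sketch shows the idea. The batch_inputs helper is hypothetical and is not part of the watsonx.ai API; it only enforces the 1,000-line limit that is mentioned earlier and does not check token counts.

```python
def batch_inputs(lines, max_lines=1000):
    """Yield consecutive slices of lines, each holding at most max_lines items."""
    if max_lines < 1:
        raise ValueError("max_lines must be at least 1")
    for start in range(0, len(lines), max_lines):
        yield lines[start:start + max_lines]

# 2,500 placeholder passages that stand in for real text.
lines = [f"passage {i}" for i in range(2500)]

batches = list(batch_inputs(lines))
batch_sizes = [len(batch) for batch in batches]
print(batch_sizes)
```

Each batch can then be sent as the inputs array of its own request.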

You specify the embedding model that you want to use as the model_id in the payload for the embedding method.

curl -X POST \
  'https://{region}.ml.cloud.ibm.com/ml/v1/text/embeddings?version=2024-05-02' \
  --header 'Accept: application/json' \
  --header 'Content-Type: application/json' \
  --header 'Authorization: Bearer eyJraWQiOi...' \
  --data-raw '{
  "inputs": [
    "A foundation model is a large-scale generative AI model that can be adapted to a wide range of downstream tasks.",
    "Generative AI a class of AI algorithms that can produce various types of content including text, source code, imagery, audio, and synthetic data."
  ],
  "parameters":{
    "truncate_input_tokens": 128,
    "return_options":{
      "input_text":true
    }
  },
  "model_id": "ibm/slate-30m-english-rtrvr",
  "project_id": "81966e98-c691-48a2-9bcc-e637a84db410"
}'

The response looks something like this, where the 384 values in each embedding are reduced to 6 values to improve the readability of the example:

{
  "model_id": "ibm/slate-30m-english-rtrvr",
  "created_at": "2024-05-02T16:21:56.771Z",
  "results": [
    {
      "embedding": [
        -0.023104044,
        0.05364946,
        0.062400896,
        ...
        0.008527246,
        -0.08910927,
        0.048190728
      ],
      "input": "A foundation model is a large-scale generative AI model that can be adapted to a wide range of downstream tasks."
    },
    {
      "embedding": [
        -0.024285838,
        0.03582272,
        0.008893765,
        ...
        0.0148864435,
        -0.051656704,
        0.012944954
      ],
      "input": "Generative AI a class of AI algorithms that can produce various types of content including text, source code, imagery, audio, and synthetic data."
    }
  ],
  "input_token_count": 57
}
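
The request payload and the handling of the response can also be sketched in Python. The two helper functions below are hypothetical names that are used only for this illustration, the project ID is a placeholder, and the sample response is a shortened stand-in that keeps just the first three values of each embedding shown earlier. Pairing text with vectors in this way assumes that the input_text return option was requested.

```python
def build_embeddings_payload(inputs, model_id, project_id, truncate_input_tokens=128):
    """Assemble a request body with the same fields as the curl example."""
    return {
        "inputs": list(inputs),
        "parameters": {
            "truncate_input_tokens": truncate_input_tokens,
            "return_options": {"input_text": True},
        },
        "model_id": model_id,
        "project_id": project_id,
    }

def pair_inputs_with_embeddings(response_body):
    """Return (input text, embedding) tuples from a parsed response body."""
    return [(item["input"], item["embedding"]) for item in response_body["results"]]

texts = [
    "A foundation model is a large-scale generative AI model that can be adapted to a wide range of downstream tasks.",
    "Generative AI a class of AI algorithms that can produce various types of content including text, source code, imagery, audio, and synthetic data.",
]

payload = build_embeddings_payload(
    texts,
    model_id="ibm/slate-30m-english-rtrvr",
    project_id="<your-project-id>",  # placeholder, not a real project ID
)

# Shortened stand-in for a parsed response; real embeddings hold many more values.
sample_response = {
    "model_id": "ibm/slate-30m-english-rtrvr",
    "results": [
        {"embedding": [-0.023104044, 0.05364946, 0.062400896], "input": texts[0]},
        {"embedding": [-0.024285838, 0.03582272, 0.008893765], "input": texts[1]},
    ],
    "input_token_count": 57,
}

pairs = pair_inputs_with_embeddings(sample_response)
for text, vector in pairs:
    print(f"{len(vector)} values for: {text[:40]}...")
```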
