Text embedding generation

Use the embedding models and embeddings API that are available from watsonx.ai to create text embeddings that capture the meaning of sentences or passages for use in your generative AI applications.

Converting text into text embeddings helps with document comparison, question answering, and retrieval-augmented generation (RAG) tasks, where you need to retrieve relevant content quickly.

What are text embeddings?

A text embedding is a numerical representation of a sentence or passage as a vector of real-valued numbers. Converting sentences to numeric vectors turns operations on sentences, such as comparing them, into mathematical operations, which computers perform quickly and well.

When an embedding model creates a vector representation of a sentence, the embedding model assigns values that capture the semantic meaning of the sentence. The embedding model also positions the vector within a multidimensional space based on its assigned values. The number of dimensions varies by model, so the exact vector values vary too. However, all models position the vectors such that sentences with similar meanings are nearer to one another.
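To make "nearer to one another" concrete, the following minimal sketch shows two measures that are commonly used to compare vectors: cosine similarity and Euclidean distance. The text above does not name a specific measure, so treat the choice of these two as an assumption; the function names are illustrative and are not part of any watsonx.ai API.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors.

    Values near 1.0 mean the vectors point in nearly the same direction.
    """
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of dimensions")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def euclidean_distance(a, b):
    """Return the straight-line distance between two equal-length vectors.

    Smaller values mean the vectors are closer together.
    """
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of dimensions")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
```

Both functions require vectors with the same number of dimensions, which is one reason to compare only vectors that come from the same embedding model.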

Most embedding models generate vectors with so many dimensions, from hundreds to thousands, that the vectors are impossible to visualize. Hypothetically, if an embedding model were to generate 3-dimensional vectors, the result might look as follows.

A 3-dimensional cube with three data points that represent three sentence embeddings

A few things to notice about the image:

  • The position of each sentence within the 3-dimensional space is defined by its 3-value array.
  • The two sentences that share the subject of artwork ("Jan bought a painting of dogs playing cards" and "The Degas reproduction is hanging in the den") are nearest to each other.
  • The third sentence, "I took my dogs for a walk", is nearer to the sentence about the dogs painting because the sentences share the keyword "dogs". However, the shared keyword is not the only factor used to position the sentence. The meaning of the sentence is also an important factor.
  • The values in the arrays in the image, such as 0.021, are fictional. The values that a real embedding model generates typically have more decimal places, for example, 0.020891575.
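The relationships in the list above can be sketched in a few lines of Python. The three vectors below are made up for illustration (they are not the values from the image and were not produced by any model); they are chosen only so that the two artwork sentences sit close together and the dog-walking sentence sits nearer to the dogs painting than to the Degas reproduction.

```python
import math

# Hypothetical 3-value embeddings. Real models return hundreds or
# thousands of values per sentence.
embeddings = {
    "Jan bought a painting of dogs playing cards": [0.021, 0.843, 0.310],
    "The Degas reproduction is hanging in the den": [0.105, 0.902, 0.118],
    "I took my dogs for a walk": [0.312, 0.401, 0.795],
}


def nearest_neighbor(sentence, vectors):
    """Return the other sentence whose vector is closest by Euclidean distance."""
    origin = vectors[sentence]
    others = (s for s in vectors if s != sentence)
    return min(others, key=lambda s: math.dist(origin, vectors[s]))


for sentence in embeddings:
    print(f"{sentence!r} is nearest to {nearest_neighbor(sentence, embeddings)!r}")
```

With these made-up values, each artwork sentence is the other's nearest neighbor, and the dog-walking sentence is nearest to the sentence about the dogs painting.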

You can store generated vectors in a vector database. When the same embedding model is used to convert all of the sentences in the database, the vector store can leverage the inherent groupings and relationships that exist among the sentences based on their vector values to return relevant search results quickly.

Unlike traditional indexes that store text and rely on keyword search for information retrieval, vector stores support semantic searches that retrieve information that is similar in meaning. For example, where keyword search checks only whether the keyword is present, semantic search weighs the context in which the keyword is used, which typically produces better search results.
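The difference between the two search styles can be illustrated with a toy in-memory store. This is a sketch under stated assumptions, not how a vector database is implemented: the stored vectors and the query vector are invented, the function names are hypothetical, and cosine similarity is assumed as the ranking measure. In a real application, the same embedding model would generate both the stored vectors and the query vector.

```python
import math

# Hypothetical store: each entry pairs the original text with a made-up vector.
STORE = [
    ("Jan bought a painting of dogs playing cards", [0.021, 0.843, 0.310]),
    ("The Degas reproduction is hanging in the den", [0.105, 0.902, 0.118]),
    ("I took my dogs for a walk", [0.312, 0.401, 0.795]),
]


def keyword_search(keyword, store):
    """Return texts that contain the keyword, ignoring case."""
    return [text for text, _ in store if keyword.lower() in text.lower()]


def cosine_similarity(a, b):
    """Return the cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


def semantic_search(query_vector, store, top_k=2):
    """Return the top_k texts whose vectors are most similar to the query."""
    ranked = sorted(
        store,
        key=lambda item: cosine_similarity(query_vector, item[1]),
        reverse=True,
    )
    return [text for text, _ in ranked[:top_k]]


# Made-up vector that stands in for an embedded query about artwork.
query_vector = [0.080, 0.880, 0.200]

print(keyword_search("artwork", STORE))      # no stored text contains the word
print(semantic_search(query_vector, STORE))  # ranks by closeness in meaning
```

In this sketch, the keyword search for "artwork" finds nothing because no stored sentence contains that word, while the semantic search returns the two sentences about artwork because their vectors lie closest to the query vector.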
