This notebook is designed for a technical audience that has familiarity with large language models (LLMs). Familiarity with LangChain and the RAG pattern is preferred, but not required.
Retrieval-augmented generation, or RAG, is an architectural pattern for improving the responses generated by large language models (LLMs). It does this by augmenting queries to the LLM with additional context.
A basic LLM generation takes the query as is, or with some prompt engineering (e.g. prompt template, meta-prompting, chain-of-thought prompting, etc).
With RAG, the query is augmented with relevant knowledge from a knowledge base of some sort. This additional relevant knowledge (context) helps the LLM generate a response that is grounded in the supplied material rather than relying solely on what it learned during training.
Furthermore, the additional relevant knowledge provides some explainability for the LLM output: the retrieved text is a cross-referenceable set of information you can examine to see what the LLM is basing its answer on.
For the interested reader, here is the original RAG paper: Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks
A common implementation of the knowledge base is a vector store (or vector database). A vector store can efficiently index and search across vectors. With this implementation, a corpus of knowledge (as text) is stored as vectors known as embeddings. Embeddings are not just arbitrary numerical representations of text; instead, the representation captures the semantic meaning of what is being embedded. In other words, pieces of text that are semantically similar to each other will also be mathematically close to each other in vector space. An embedding model converts a chunk of text into its embedding representation.
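To make this more concrete, the short sketch below (optional, and not part of this notebook's RAG pipeline) embeds a few illustrative sentences with the sentence-transformers package installed in the Setup and Configuration section and compares them with cosine similarity; the two semantically related sentences score noticeably closer than the unrelated one. The sentences and the relative scores are purely illustrative.
from sentence_transformers import SentenceTransformer, util

# Illustrative only: embed a few example sentences and compare their similarity
sim_model = SentenceTransformer("sentence-transformers/all-mpnet-base-v2")
sentences = [
    "The president nominated a new Supreme Court justice.",  # illustrative example sentences
    "A new justice was appointed to the highest court.",
    "The weather was sunny and warm today.",
]
embeddings = sim_model.encode(sentences)  # one embedding vector per sentence

# Semantically similar sentences are closer in vector space (higher cosine similarity)
print(util.cos_sim(embeddings[0], embeddings[1]))  # related pair: relatively high
print(util.cos_sim(embeddings[0], embeddings[2]))  # unrelated pair: relatively low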
The following figure is a conceptual illustration of the RAG pattern implemented with a vector store. The encode and decode blocks represent the embedding model and the embedding-to-source-text lookup, respectively.
Here's an outline of this notebook.
Setup and Configuration section. We ensure all the required packages are installed and the configuration information (e.g. credentials) is provided.
Define Query section. We establish the query to be used. This is established up front because we will use the same query both for a basic completion with the LLM and for the RAG pattern.
Initialize Language Model section. We select and configure the large language model (LLM).
Perform Basic Completion section. We perform a basic completion with our query and LLM.
Get Data for Documents section. We get and preprocess (e.g. split) the data we want to use in our knowledge base.
Initialize Embedding Model section. We select and configure the embedding model we would like to use to encode our data for our knowledge base.
Initialize Vector Store section. We initialize our vector store with our data and embedding model.
Perform Similarity Search section. We use our initialized vector store and perform a similarity search with our query.
Perform RAG Generation section. We perform a completion with a RAG pipeline. In this version, we are explicitly passing the relevant docs (from our similarity search).
Perform RAG Generation with Q&A Chain Section. We perform a completion with a RAG pipeline. In this version, there is no explicit passing of relevant docs.
# Ignore warnings
import warnings
warnings.filterwarnings("ignore")
!pip install langchain -q
!pip install ibm-watson-machine-learning -q
!pip install wget -q
!pip install sentence-transformers -q
langchain: Orchestration framework
ibm-watson-machine-learning: For IBM LLMs
wget: To download knowledge base data
sentence-transformers: For embedding model

!pip install singlestoredb -q
!pip install sqlalchemy-singlestoredb -q
import os
import getpass
try:
wxa_url = os.environ["WXA_URL"]
except KeyError:
wxa_url = getpass.getpass("Please enter your watsonx.ai URL domain (hit enter): ")
try:
wxa_api_key = os.environ["WXA_API_KEY"]
except KeyError:
wxa_api_key = getpass.getpass("Please enter your watsonx.ai API key (hit enter): ")
try:
wxa_project_id = os.environ["WXA_PROJECT_ID"]
except KeyError:
wxa_project_id = getpass.getpass("Please enter your watsonx.ai Project ID (hit enter): ")
If you do not have a SingleStoreDB instance, you can start today with a free trial. The cells below collect the connection details (username, password, host, port, database, and table name) used to build the connection string:
try:
connection_user = os.environ["SINGLESTORE_USER"]
except KeyError:
connection_user = getpass.getpass("Please enter your SingleStore username (hit enter): ")
try:
connection_password = os.environ["SINGLESTORE_PASS"]
except KeyError:
connection_password = getpass.getpass("Please enter your SingleStore password (hit enter): ")
try:
connection_port = os.environ["SINGLESTORE_PORT"]
except KeyError:
connection_port = input("Please enter your SingleStore port (hit enter): ")
try:
connection_host = os.environ["SINGLESTORE_HOST"]
except KeyError:
connection_host = input("Please enter your SingleStore host (hit enter): ")
try:
database_name = os.environ["SINGLESTORE_DATABASE"]
except KeyError:
database_name = input("Please enter your SingleStore database name (hit enter): ")
try:
table_name = os.environ["SINGLESTORE_TABLE"]
except KeyError:
table_name = input("Please enter your SingleStore table name (hit enter): ")
query = "What did the president say about Ketanji Brown Jackson?"
For our language model, we will use Granite, an IBM-developed LLM.
from ibm_watson_machine_learning.foundation_models.utils.enums import ModelTypes
from ibm_watson_machine_learning.foundation_models import Model
from ibm_watson_machine_learning.metanames import GenTextParamsMetaNames as GenParams
from ibm_watson_machine_learning.foundation_models.utils.enums import DecodingMethods
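# Generation parameters: greedy decoding, between 1 and 100 new tokens per response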
parameters = {
GenParams.DECODING_METHOD: DecodingMethods.GREEDY,
GenParams.MIN_NEW_TOKENS: 1,
GenParams.MAX_NEW_TOKENS: 100
}
model = Model(
model_id=ModelTypes.GRANITE_13B_CHAT,
params=parameters,
credentials={
"url": wxa_url,
"apikey": wxa_api_key
},
project_id=wxa_project_id
)
from ibm_watson_machine_learning.foundation_models.extensions.langchain import WatsonxLLM
granite_llm_ibm = WatsonxLLM(model=model)
response = granite_llm_ibm(query)
print("Query: " + query)
print("Response: " + response)
Query: What did the president say about Ketanji Brown Jackson?
Response: The president said that Ketanji Brown Jackson is an “incredible judge” and that he is “proud” to have nominated her to the Supreme Court.<|endoftext|>
import wget
filename = './state_of_the_union.txt'
url = 'https://raw.github.com/IBM/watson-machine-learning-samples/master/cloud/data/foundation_models/state_of_the_union.txt'
if not os.path.isfile(filename):
wget.download(url, out=filename)
from langchain.document_loaders import TextLoader
loader = TextLoader(filename)
documents = loader.load()
from langchain.text_splitter import CharacterTextSplitter
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
texts = text_splitter.split_documents(documents)
print("We split our document into " + str(len(texts)) + " chunks.")
We split our document into 42 chunks.
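If you want to see what a chunk looks like (optional), you can inspect the first Document produced by the splitter; each chunk keeps the source file in its metadata.
# Optional: inspect the first chunk produced by the splitter
print(texts[0].metadata)            # e.g. {'source': './state_of_the_union.txt'}
print(texts[0].page_content[:200])  # first 200 characters of the chunk text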
We will be using the default LangChain Hugging Face embedding model, which at the time of this writing is sentence-transformers/all-mpnet-base-v2.
from langchain.embeddings import HuggingFaceEmbeddings
embedding_model = HuggingFaceEmbeddings()
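As an optional sanity check, you can embed the query itself; embed_query returns a plain Python list of floats, which for all-mpnet-base-v2 is 768-dimensional.
# Optional: embed the query and inspect the resulting vector
query_vector = embedding_model.embed_query(query)
print("Embedding dimensions: " + str(len(query_vector)))  # 768 for all-mpnet-base-v2
print(query_vector[:5])  # first few components of the embedding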
from sqlalchemy import create_engine, text
connection_url = f"singlestoredb://{connection_user}:{connection_password}@{connection_host}:{connection_port}"
engine = create_engine(connection_url)
with engine.connect() as conn:
result = conn.execute(text("CREATE DATABASE IF NOT EXISTS " + database_name))
# Verify that the database was created
print("Available databases:")
with engine.connect() as conn:
result = conn.execute(text("SHOW DATABASES"))
for row in result:
print(row)
Available databases:
('cluster',)
('information_schema',)
('memsql',)
('movie_recommender',)
('movie_recommender2',)
('resume_evaluator',)
('tpch_optimized',)
('watsonx_ibm',)
with engine.connect() as conn:
result = conn.execute(text("DROP TABLE IF EXISTS " + database_name + "." + table_name))
# Connection string to use Langchain with SingleStoreDB
os.environ["SINGLESTOREDB_URL"] = f"{connection_user}:{connection_password}@{connection_host}:{connection_port}/{database_name}"
from langchain.vectorstores import SingleStoreDB
vectorstore = SingleStoreDB.from_documents(
texts,
embedding_model,
table_name = table_name
)
with engine.connect() as conn:
result = conn.execute(text("DESCRIBE " + database_name + "." + table_name))
print(database_name + "." + table_name + " table schema:")
for row in result:
print(row)
result = conn.execute(text("SELECT COUNT(vector) FROM " + database_name + "." + table_name))
print("\nNumber of rows in " + database_name + "." + table_name + ": " + str(result.first()[0]))
watsonx_ibm.docs_embeddings table schema:
('content', 'text', 'YES', '', None, '')
('vector', 'blob', 'YES', '', None, '')
('metadata', 'JSON', 'YES', '', None, '')

Number of rows in watsonx_ibm.docs_embeddings: 42
We find the texts most similar (i.e. most relevant) to our query. You can modify the number of results returned with the k parameter in the similarity_search method below.
texts_sim = vectorstore.similarity_search(query, k=5)
print("Number of relevant texts: " + str(len(texts_sim)))
Number of relevant texts: 5
print("First 100 characters of relevant texts.")
for i in range(len(texts_sim)):
print("Text " + str(i+1) + ": " + str(texts_sim[i].page_content[0:100]))
First 100 characters of relevant texts.
Text 1: Tonight. I call on the Senate to: Pass the Freedom to Vote Act. Pass the John Lewis Voting Rights Ac
Text 2: A former top litigator in private practice. A former federal public defender. And from a family of p
Text 3: As Frances Haugen, who is here with us tonight, has shown, we must hold social media platforms accou
Text 4: And I’m taking robust action to make sure the pain of our sanctions is targeted at Russia’s economy
Text 5: But cancer from prolonged exposure to burn pits ravaged Heath’s lungs and body. Danielle says Heat
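If you also want to see how close each retrieved chunk is to the query, the SingleStoreDB integration provides a similarity_search_with_score method that returns (document, score) pairs; a minimal sketch:
# Optional: retrieve the relevant chunks together with their similarity scores
docs_and_scores = vectorstore.similarity_search_with_score(query, k=5)
for doc, score in docs_and_scores:
    print(str(round(score, 4)) + " : " + doc.page_content[:80])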
RAG generation using our model, explicitly passing the relevant knowledge (documents) from our similarity search.
from langchain.chains.question_answering import load_qa_chain
chain = load_qa_chain(granite_llm_ibm, chain_type="stuff")
response = chain.run(input_documents=texts_sim, question=query)
print("Query: " + query)
print("Response:" + response)
Query: What did the president say about Ketanji Brown Jackson?
Response: The president said that Ketanji Brown Jackson is a consensus builder who will continue Justice Breyer's legacy of excellence.<|endoftext|>
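For intuition, the stuff chain essentially concatenates the retrieved chunks into a single prompt for the LLM. Below is a rough, hand-rolled equivalent; the exact prompt wording used by LangChain's built-in chain differs, so treat this as a sketch rather than the chain's actual implementation.
# Rough equivalent of the "stuff" chain: concatenate the retrieved chunks into one prompt
context = "\n\n".join(doc.page_content for doc in texts_sim)
prompt = (
    "Use the following pieces of context to answer the question at the end.\n\n"
    + context
    + "\n\nQuestion: " + query + "\nAnswer:"
)
print(granite_llm_ibm(prompt))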
RAG generation using a chain of our model and vector store. The chain handles getting the relevant knowledge (texts) under the hood.
from langchain.chains import RetrievalQA
qa = RetrievalQA.from_chain_type(llm=granite_llm_ibm, chain_type="stuff", retriever=vectorstore.as_retriever())
response = qa.run(query)
print("Query: " + query)
print("Response:" + response)
Query: What did the president say about Ketanji Brown Jackson?
Response: The president said that Ketanji Brown Jackson is a consensus builder who will continue Justice Breyer's legacy of excellence.<|endoftext|>
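To cross-reference which chunks the chain actually retrieved (the explainability benefit noted in the introduction), RetrievalQA can also return its source documents alongside the answer. A minimal sketch:
# Optional: return the retrieved chunks alongside the generated answer
qa_with_sources = RetrievalQA.from_chain_type(
    llm=granite_llm_ibm,
    chain_type="stuff",
    retriever=vectorstore.as_retriever(),
    return_source_documents=True,
)
result = qa_with_sources({"query": query})
print("Response: " + result["result"])
for doc in result["source_documents"]:
    print("Source chunk: " + doc.page_content[:80])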
Copyright © 2023 IBM. This notebook and its source code are released under the terms of the MIT License.