This notebook contains the steps and code to demonstrate support of Retrieval Augmented Generation in watsonx.ai. It introduces commands for data retrieval, knowledge base building & querying, and model testing.
Some familiarity with Python is helpful. This notebook uses Python 3.10.
Retrieval Augmented Generation (RAG) is a versatile pattern that can unlock a number of use cases requiring factual recall of information, such as querying a knowledge base in natural language.
In its simplest form, RAG requires 3 steps:
- Index knowledge base passages (once).
- Retrieve the relevant passage(s) from the knowledge base for every user query.
- Pass the retrieved passage(s) to a large language model, together with the query, to generate a response.
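The three-step loop can be sketched in a few lines of plain Python. This toy example uses naive word overlap as a stand-in for real embedding-based retrieval, and a placeholder in place of an LLM call; all names here are illustrative only:

```python
def retrieve(query, knowledge_base, k=1):
    """Step 2: rank passages by naive word overlap with the query
    (a real pipeline would rank by embedding similarity)."""
    def score(passage):
        return len(set(query.lower().split()) & set(passage.lower().split()))
    return sorted(knowledge_base, key=score, reverse=True)[:k]

def generate(query, passages):
    """Step 3: in a real pipeline an LLM would answer from the passages."""
    return f"Based on: {passages[0]}"

# Step 1: "index" the knowledge base (here, just a list of passages).
kb = ["The sky is blue.", "Grass is green.", "Water boils at 100 C."]
print(generate("what color is the sky", retrieve("what color is the sky", kb)))
```

The rest of this notebook replaces each toy piece with a production counterpart: Chroma for indexing and retrieval, and a watsonx.ai foundation model for generation.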
This notebook contains the following parts:
Before you use the sample code in this notebook, you must perform the following setup tasks:
!pip install "langchain==0.1.10" | tail -n 1
!pip install "ibm-watsonx-ai>=0.2.6" | tail -n 1
!pip install -U langchain_ibm | tail -n 1
!pip install wget | tail -n 1
!pip install sentence-transformers | tail -n 1
!pip install "chromadb==0.3.26" | tail -n 1
!pip install "pydantic==1.10.0" | tail -n 1
!pip install "sqlalchemy==2.0.1" | tail -n 1
import os, getpass
This cell defines the credentials required to work with the watsonx.ai API for Foundation Model inferencing.
Action: Provide the IBM Cloud user API key. For details, see documentation.
credentials = {
"url": "https://us-south.ml.cloud.ibm.com",
"apikey": getpass.getpass("Please enter your WML api key (hit enter): ")
}
The API requires a project id, which provides the context for the call. We will try to obtain the id from the project in which this notebook runs; otherwise, please provide it manually.
Hint: You can find the project_id as follows. Open the Prompt Lab in watsonx.ai. At the very top of the UI, there will be Projects / <project name> /. Click on the <project name> link. Then get the project_id from the project's Manage tab (Project -> Manage -> General -> Details).
try:
project_id = os.environ["PROJECT_ID"]
except KeyError:
project_id = input("Please enter your project_id (hit enter): ")
import wget
filename = 'state_of_the_union.txt'
url = 'https://raw.github.com/IBM/watson-machine-learning-samples/master/cloud/data/foundation_models/state_of_the_union.txt'
if not os.path.isfile(filename):
wget.download(url, out=filename)
The most common approach in RAG is to create dense vector representations of the knowledge base in order to calculate the semantic similarity to a given user query.
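"Semantic similarity" between dense vectors is typically measured with cosine similarity. A minimal illustration with hand-picked toy vectors (not real embedding output):

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means identical
    direction, 0.0 means orthogonal (unrelated)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

query_vec = [0.1, 0.9, 0.2]
doc_vecs = {"doc_a": [0.1, 0.8, 0.3], "doc_b": [0.9, 0.1, 0.0]}
# Retrieval picks the document whose vector is closest to the query's.
best = max(doc_vecs, key=lambda d: cosine_similarity(query_vec, doc_vecs[d]))
print(best)  # doc_a points in nearly the same direction as the query
```

Vector databases such as Chroma perform this nearest-neighbor search efficiently over many thousands of document vectors.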
In this basic example, we take the State of the Union speech content (filename), split it into chunks, embed it using an open-source embedding model, load it into Chroma, and then query it.
from langchain.document_loaders import TextLoader
from langchain.text_splitter import CharacterTextSplitter
from langchain.vectorstores import Chroma
loader = TextLoader(filename)
documents = loader.load()
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
texts = text_splitter.split_documents(documents)
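To build intuition for what the splitter does, here is a rough plain-Python approximation of character-based chunking with no overlap. This is a simplified sketch, not LangChain's actual implementation (the real CharacterTextSplitter handles separators, overlap, and oversized passages more carefully):

```python
def split_text(text, chunk_size=1000, separator="\n\n"):
    """Greedily pack separator-delimited passages into chunks of at
    most chunk_size characters (roughly chunk_overlap=0 behavior)."""
    chunks, current = [], ""
    for part in text.split(separator):
        candidate = current + separator + part if current else part
        if len(candidate) <= chunk_size:
            current = candidate
        else:
            if current:
                chunks.append(current)
            current = part
    if current:
        chunks.append(current)
    return chunks

passages = "\n\n".join(f"Passage number {i}." for i in range(8))
for chunk in split_text(passages, chunk_size=40):
    print(repr(chunk))
```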
The dataset we are using is already split into self-contained passages that can be ingested by Chroma.
Note that you can feed a custom embedding function to be used by Chroma. The performance of Chroma may differ depending on the embedding model used. In the following example we use the watsonx.ai Embeddings service. We can check the available embedding models using get_embedding_model_specs:
from ibm_watsonx_ai.foundation_models.utils import get_embedding_model_specs
get_embedding_model_specs(credentials.get('url'))
from langchain_ibm import WatsonxEmbeddings
from ibm_watsonx_ai.foundation_models.utils.enums import EmbeddingTypes
embeddings = WatsonxEmbeddings(
model_id=EmbeddingTypes.IBM_SLATE_30M_ENG.value,
url=credentials["url"],
apikey=credentials["apikey"],
project_id=project_id
)
docsearch = Chroma.from_documents(texts, embeddings)
LangChain retrievers use embed_documents and embed_query under the hood to generate embedding vectors for the uploaded documents and the user query, respectively.
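The split between the two methods can be illustrated with a dummy class that follows the same two-method interface. This is a toy stand-in, not the real WatsonxEmbeddings: character-class counts take the place of a learned dense vector:

```python
class DummyEmbeddings:
    """Toy embedder exposing the LangChain embeddings interface."""

    def _embed(self, text):
        # Three crude "features" standing in for a dense embedding.
        return [
            sum(c.isalpha() for c in text),
            sum(c.isdigit() for c in text),
            sum(c.isspace() for c in text),
        ]

    def embed_documents(self, texts):
        # Called once at ingestion time: one vector per document.
        return [self._embed(t) for t in texts]

    def embed_query(self, text):
        # Called at retrieval time, once per user query.
        return self._embed(text)

emb = DummyEmbeddings()
print(emb.embed_documents(["abc 123", "hello"]))  # [[3, 3, 1], [5, 0, 0]]
print(emb.embed_query("hi there"))                # [7, 0, 1]
```

Because both methods map text into the same vector space, query vectors and document vectors are directly comparable at search time.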
help(WatsonxEmbeddings)
watsonx.ai
IBM watsonx foundation models are among the LLMs supported by LangChain. This example shows how to communicate with the Granite model series using LangChain.
You need to specify the model_id that will be used for inferencing:
from ibm_watsonx_ai.foundation_models.utils.enums import ModelTypes
model_id = ModelTypes.GRANITE_13B_CHAT_V2
We need to provide a set of model parameters that will influence the result:
from ibm_watsonx_ai.metanames import GenTextParamsMetaNames as GenParams
from ibm_watsonx_ai.foundation_models.utils.enums import DecodingMethods
parameters = {
GenParams.DECODING_METHOD: DecodingMethods.GREEDY,
GenParams.MIN_NEW_TOKENS: 1,
GenParams.MAX_NEW_TOKENS: 100,
GenParams.STOP_SEQUENCES: ["<|endoftext|>"]
}
Initialize the WatsonxLLM class from LangChain with the defined parameters and ibm/granite-13b-chat-v2.
from langchain_ibm import WatsonxLLM
watsonx_granite = WatsonxLLM(
model_id=model_id.value,
url=credentials.get("url"),
apikey=credentials.get("apikey"),
project_id=project_id,
params=parameters
)
Build the RetrievalQA (question answering chain) to automate the RAG task.
from langchain.chains import RetrievalQA
qa = RetrievalQA.from_chain_type(llm=watsonx_granite, chain_type="stuff", retriever=docsearch.as_retriever())
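The "stuff" chain type simply concatenates ("stuffs") all retrieved chunks into a single prompt for the LLM. A rough sketch of the prompt assembly the chain performs internally (the wording below is illustrative, not LangChain's exact template):

```python
def stuff_prompt(question, retrieved_docs):
    """Combine retrieved chunks and the question into one LLM prompt."""
    context = "\n\n".join(retrieved_docs)
    return (
        "Use the following context to answer the question.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    )

prompt = stuff_prompt(
    "What did the president say about Ketanji Brown Jackson?",
    ["I've nominated Circuit Court of Appeals Judge Ketanji Brown Jackson."],
)
print(prompt)
```

Other chain types (for example "map_reduce" or "refine") handle cases where the retrieved chunks do not all fit into the model's context window at once.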
Ask a sample question about the loaded document:
query = "What did the president say about Ketanji Brown Jackson"
qa.invoke(query)
{'query': 'What did the president say about Ketanji Brown Jackson', 'result': ' The president said, "One of our nation’s top legal minds, who will continue Justice Breyer’s legacy of excellence." This statement was made in reference to Ketanji Brown Jackson, who was nominated by the president to serve on the United States Supreme Court.'}
You successfully completed this notebook!
You learned how to answer questions using RAG with watsonx.ai and LangChain.
Check out our Online Documentation for more samples, tutorials, documentation, how-tos, and blog posts.
Copyright © 2023, 2024 IBM. This notebook and its source code are released under the terms of the MIT License.