Eivind Kjosbakken

Summary

The article outlines methods to enhance a Retrieval-Augmented Generation (RAG) system for more efficient question-answering by improving data chunking, adding context retrieval options, using a more advanced language model, and expanding the context window.

Abstract

The article builds upon a previous discussion on creating a RAG system for email search, focusing on enhancements to make the system more practical for real-world applications. It suggests splitting data into more intuitive chunks, specifically one email per chunk, to facilitate better retrieval. The author also recommends including additional metadata such as sender, date, and subject in the chunks to aid the language model in providing more accurate and up-to-date answers. Furthermore, the article proposes using a more powerful language model like Llama2 13B-chat and expanding the context window to improve the system's ability to answer questions with more relevant information. The author emphasizes the importance of balancing the amount of context to avoid overwhelming the language model with nonrelevant data, which could slow down the system and reduce its effectiveness.

Opinions

  • The author believes that chunking emails individually rather than by character count is more logical for a RAG system, as it aligns with how users typically search for emails.
  • Adding detailed metadata to each chunk is seen as crucial for enabling the language model to deliver better answers.
  • There is a preference for using larger, more capable language models, such as Llama2 13B-chat, to enhance the performance of the RAG system, provided the system has the necessary computational resources.
  • The author acknowledges the trade-off between retrieving more documents for context and the potential for introducing noise, which can negatively impact the language model's responses.
  • The article suggests that increasing the context window is a straightforward upgrade, but it must be done judiciously to maintain the efficiency of the RAG system.
  • The author implies that the improvements discussed are not exhaustive and encourages further enhancements tailored to specific use cases and system capabilities.

How To Improve Your RAG System for More Efficient Question-Answering

Improve your RAG system with tools learned in this article

This article continues my previous article on building a RAG system. Here, I improve on the RAG system developed there by splitting the data more intuitively, giving the RAG system more retrieval options, and using a better LLM.

Improve your RAG system with this article. Image by ChatGPT. “make an image on ‘improving on a RAG system’” prompt. ChatGPT, 4, OpenAI, 23 Mar. 2024. https://chat.openai.com.

Motivation

My motivation for this article is similar to my last article: to create a RAG system that can search emails for me instead of having to find emails myself with a direct word search. If you have not read my last article, I recommend reading that first, as I will build on my code from there. In this article, I will implement several improvements to the RAG system that make it more viable to use in a real-world setting.

Table of Contents

  • Motivation
  • Improving chunking
  • Adding an option for returning info from a specific email
  • Using a better LLM
  • Upgrading the context window
  • Conclusion

Improving chunking

First, you can import all required packages:

# import packages
from langchain_community.document_loaders import WebBaseLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_community.embeddings import GPT4AllEmbeddings
from langchain_community.vectorstores import Chroma
from langchain_community.llms import LlamaCpp
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import PromptTemplate
from langchain_core.documents import Document
from langchain import hub
from langchain_core.runnables import RunnablePassthrough, RunnablePick
import pandas as pd
from langchain.embeddings import HuggingFaceEmbeddings
from langchain_community.llms import GPT4All

The first significant improvement I want to make is the chunking. Previously, I added all the text from all the emails and chunked them based on the number of characters. Instead, I want one email per chunk, which makes more intuitive sense when searching for specific emails.

I have a dataframe containing my information as follows:

An example dataframe containing my information. Image by the author.

I then make chunks manually with the following code:

all_documents = []
# One chunk per email; add more columns here (e.g. the subject) if your dataframe has them
for sender, date, body in df[["From", "Date", "Body"]].to_numpy():
    document_content = f"Sender: {sender}, Date: {date}, Body: {body}"
    document = Document(page_content=document_content, metadata={"source": "local"})
    all_documents.append(document)

The chunks are LangChain Document objects containing information about the emails. In addition to the text of each email, I added information the LLM can use when answering my questions: here, the sender and the date of the email, and you could add more, such as the subject. Giving the RAG system these details allows the LLM to give better and more up-to-date answers (it can, for example, tell which of two emails is more recent), which is essential for the performance of the RAG system. You can add any further information about each email that is relevant to your RAG system.
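The chunk format above can also be produced by a small helper, which makes it easy to include extra fields such as the subject only when your dataframe has them. This is a minimal plain-Python sketch, not the code used in this article: the helper name `email_to_chunk`, the `Subject` column, and copying the short fields into the metadata are illustrative assumptions. Its two return values map onto `Document(page_content=..., metadata=...)`.

```python
import math


def is_missing(value) -> bool:
    """Treat None and NaN (how pandas marks empty cells) as missing."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def email_to_chunk(row: dict, fields=("From", "Date", "Subject", "Body")):
    """Hypothetical helper: turn one email (one dataframe row as a dict) into
    chunk text plus a metadata dict. Fields absent from the row are skipped."""
    labels = {"From": "Sender"}  # show "From" as "Sender", like the chunks above
    present = [f for f in fields if f in row and not is_missing(row[f])]
    content = ", ".join(f"{labels.get(f, f)}: {row[f]}" for f in present)
    # Keep the short fields as metadata too, so they can be displayed or filtered on later
    metadata = {"source": "local", **{f: str(row[f]) for f in present if f != "Body"}}
    return content, metadata


row = {"From": "alice@example.com", "Date": "2024-03-01", "Subject": "Meeting",
       "Body": "Can we meet on Friday?"}
content, metadata = email_to_chunk(row)
print(content)
# Sender: alice@example.com, Date: 2024-03-01, Subject: Meeting, Body: Can we meet on Friday?
```

With a dataframe, you could loop over `df.to_dict("records")` and wrap each returned pair in a `Document`.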

The chunks for the example dataframe shown above will then look like below, where each line is one chunk:

Each line is a chunk in the example dataframe. Each chunk contains information about the sender, the date, and the body (content) of an email. Image by the author.

The all_documents list now contains all the chunks, which you can embed and store in a vector database with:

vectorstore = Chroma.from_documents(documents=all_documents, embedding=GPT4AllEmbeddings())

I am using GPT4AllEmbeddings because they are easy to use here, but if you want to learn more about creating your own embeddings, I have written an extensive article on that topic.

You can now also query the RAG system about information such as the sender and the date. Note, however, that if you search only for a date, the document retriever may struggle to find the correct document. Relevant documents are retrieved with a vector similarity search, which does not work well for dates, since a date is only a tiny part of each embedded email. You can improve on this by searching for dates separately, for example by filtering the emails on their date before running the similarity search. Once the correct documents have been retrieved, the LLM should have no problem processing the data and answering questions about it.
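A separate date search could look like this: filter on the date first, then rank only the remaining emails by similarity, so the date no longer has to survive the embedding. This is a minimal sketch under stated assumptions: each email is a dict with an already parsed `date`, and `word_overlap` is a toy word-overlap score standing in for the embedding similarity a vector store would compute. For exact-match metadata such as the sender, LangChain's `Chroma.similarity_search` also accepts a `filter` argument.

```python
from datetime import date


def word_overlap(a: str, b: str) -> float:
    """Toy stand-in for embedding similarity: Jaccard overlap of lowercase words."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / len(wa | wb) if wa | wb else 0.0


def search_emails(emails, query, start=None, end=None, k=3):
    """Filter on the date first, then rank only the remaining emails by similarity."""
    candidates = [
        e for e in emails
        if (start is None or e["date"] >= start) and (end is None or e["date"] <= end)
    ]
    ranked = sorted(candidates, key=lambda e: word_overlap(query, e["body"]), reverse=True)
    return ranked[:k]


emails = [
    {"sender": "alice", "date": date(2024, 3, 1), "body": "meeting moved to friday"},
    {"sender": "bob", "date": date(2024, 3, 5), "body": "invoice for the meeting room"},
    {"sender": "carol", "date": date(2024, 2, 20), "body": "friday meeting agenda"},
]

hits = search_emails(emails, "meeting on friday", start=date(2024, 3, 1), end=date(2024, 3, 31))
print([e["sender"] for e in hits])  # ['alice', 'bob']; carol's email is outside the date range
```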

Adding an option for returning info from a specific email

Sometimes, it is helpful to see which chunks the RAG system used to produce its response. You can therefore add an option to show the chunks used, pointing you to the email that contains the answer.

In the previous article, I showed you that this can quickly be done with the function:

def get_retrieved_docs(question):
    # Return the chunks most similar to the question (LangChain's default is k=4)
    docs = vectorstore.similarity_search(question)
    return docs

This function returns the documents that are given to the LLM as context for answering your question.

You can then call the RAG system with:

def invoke_RAG(question):
    # qa_chain is the question-answering chain built in the previous article
    res = qa_chain.invoke(question)
    docs = get_retrieved_docs(question)
    return res, docs

This returns the RAG system's response in the res variable and the documents it used in the docs variable. This lets the user of the RAG system look further into the email relevant to their question, and it makes debugging easier, since you can see why the RAG system gives its answer.
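Note that `invoke_RAG` retrieves twice: once inside `qa_chain` and once in `get_retrieved_docs`. If the chain's retriever uses a different `k` than the default of `similarity_search`, the returned documents may not be exactly the ones the LLM saw. One alternative, sketched below, is to retrieve once and reuse the result for both the prompt and the returned sources. The function names and the toy retriever and "LLM" are hypothetical placeholders so the example runs without a model; in practice `retrieve` could wrap `vectorstore.as_retriever().invoke` (joining each document's `page_content`), and `generate` could be your LLM's `invoke`.

```python
from typing import Callable, List, Tuple


def answer_with_sources(
    question: str,
    retrieve: Callable[[str], List[str]],
    generate: Callable[[str], str],
) -> Tuple[str, List[str]]:
    """Retrieve once and use the same documents as LLM context and as returned sources."""
    docs = retrieve(question)
    context = "\n\n".join(docs)
    prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}\nAnswer:"
    return generate(prompt), docs


# Toy retriever and "LLM" so the sketch runs without a vector store or model
def toy_retrieve(question: str) -> List[str]:
    emails = ["Sender: alice, Date: 2024-03-01, Body: Meeting moved to Friday"]
    return [e for e in emails if "meeting" in question.lower()]


def toy_generate(prompt: str) -> str:
    return "Friday" if "Friday" in prompt else "I don't know"


answer, sources = answer_with_sources("When is the meeting?", toy_retrieve, toy_generate)
print(answer)   # Friday
print(sources)  # the single email the answer was based on
```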

Using a better LLM

An excellent way to improve the performance of your RAG system is to use a better LLM. In the last article, I used a quantized version of Llama2 7B-Chat. This LLM is decent, but there are better options if your computer has enough computing power. You could try other open-source language models such as Mistral or Falcon, but the easiest way to improve your language model is to choose a larger one. Instead of Llama2 7B-Chat, I moved on to Llama2 13B-Chat, which should perform better when answering questions about my emails. Be aware, however, that a larger model requires considerably more disk space to store and more RAM/VRAM to run, so make sure your system can handle the LLM's requirements before running it locally. You can read more about setting up Llama2 in my article on downloading and running Llama2.
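A quick back-of-the-envelope calculation shows why the larger model needs more memory: the weights alone take roughly (number of parameters × bits per weight ÷ 8) bytes. The sketch below uses nominal parameter counts; real quantized model files are somewhat larger, and the runtime also needs memory for the context (the KV cache), so treat these numbers as lower bounds rather than exact requirements.

```python
def estimate_weights_gb(n_params: float, bits_per_weight: float) -> float:
    """Rough size of the model weights alone, in gigabytes (1 GB = 1e9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9


for name, n_params in [("Llama2 7B", 7e9), ("Llama2 13B", 13e9)]:
    for bits in (16, 4):
        print(f"{name} at {bits}-bit: ~{estimate_weights_gb(n_params, bits):.1f} GB")
# Llama2 7B at 16-bit: ~14.0 GB
# Llama2 7B at 4-bit: ~3.5 GB
# Llama2 13B at 16-bit: ~26.0 GB
# Llama2 13B at 4-bit: ~6.5 GB
```

Even quantized to 4 bits, the 13B model needs almost twice the memory of the 7B model.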

Upgrading the context window

Another simple upgrade to your RAG system is upgrading the context window. This can be done in two parts. First, you can retrieve more documents than before, giving the LLM more context for answering the question. This can help because you are more likely to retrieve the information the LLM needs to answer. However, more context also has downsides: you may give the LLM more noise (irrelevant data), making it harder for it to answer the question. Retrieving more documents also means the LLM has more text to process, so responses take longer and the RAG system becomes slower.
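Retrieving more documents amounts to raising the number of results `k` taken from the similarity search; in LangChain this is the `k` argument of `similarity_search`, or `search_kwargs={"k": ...}` in `as_retriever`. The sketch below uses toy vectors instead of real embeddings to illustrate the trade-off: a larger `k` pulls in more candidates, and a minimum similarity score (`min_score`, an assumption added here for illustration) is one way to keep the weakest, noisiest matches out of the context.

```python
import math


def cosine(a, b):
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def select_context(query_vec, doc_vecs, k=4, min_score=0.0):
    """Return indices of up to k documents, most similar first, skipping weak matches."""
    scored = sorted(
        ((cosine(query_vec, v), i) for i, v in enumerate(doc_vecs)),
        reverse=True,
    )
    return [i for score, i in scored[:k] if score >= min_score]


query = [1.0, 0.0]
docs = [[1.0, 0.0], [0.8, 0.6], [0.0, 1.0], [-1.0, 0.0]]  # similarities: 1.0, 0.8, 0.0, -1.0

print(select_context(query, docs, k=3))                 # [0, 1, 2]
print(select_context(query, docs, k=3, min_score=0.5))  # [0, 1]
```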

The layout of a RAG prompt, together with an example. Image by the author.

Second, when you increase the number of documents you retrieve, you should also increase the context window of the LLM so that all the relevant information fits inside it. The context window is essentially the working memory of the large language model: anything that does not fit in it is invisible to the model when it answers, so the model can only fall back on what it learned during training. This is why it is important to make sure all the relevant context fits into the LLM's context window.
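Before raising `k`, it helps to check that the retrieved documents actually fit. LangChain's `LlamaCpp` wrapper sets the context window with its `n_ctx` parameter (its default is only 512 tokens, while Llama2 was trained with a 4096-token context). The sketch below greedily adds documents, most relevant first, until a token budget is used up. The 4-characters-per-token estimate and the `reserved` amount are rough assumptions; the model's own tokenizer would give exact counts.

```python
def approx_tokens(text: str) -> int:
    """Very rough token estimate: about 4 characters per token for English text."""
    return max(1, len(text) // 4)


def pack_context(docs, n_ctx=2048, reserved=512):
    """Add documents (assumed sorted by relevance) until the context budget is used up.

    n_ctx: the model's context window in tokens.
    reserved: tokens kept free for the instructions, the question and the answer.
    """
    budget = n_ctx - reserved
    packed, used = [], 0
    for doc in docs:
        cost = approx_tokens(doc)
        if used + cost > budget:
            break  # stop here so the context stays in relevance order
        packed.append(doc)
        used += cost
    return packed, used


docs = ["a" * 2000, "b" * 2000, "c" * 2000]  # about 500 tokens each by the estimate
packed, used = pack_context(docs, n_ctx=1536, reserved=512)
print(len(packed), used)  # 2 1000
```

Stopping at the first document that does not fit, rather than skipping ahead to smaller ones, keeps the most relevant emails in the prompt; if it drops too much, the fix is a larger `n_ctx`, not a smaller `k`.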

Conclusion

In this article, I have discussed a few different approaches you can take to increase the performance of your RAG system. The different improvements I discussed were:

  • Improving chunking
  • Returning the context (emails) the LLM used to answer questions
  • Using a better LLM
  • Upgrading the context window

This is not a complete list, so there are more improvements you can make to your RAG system. Additionally, the effects of these improvements on your system will depend on its specifics and the task you are using it for.

You can also read my articles on WordPress.
