FS Ndzomga

Summary

The author is building a simple Retrieval Augmented Generation (RAG) system with FastAPI and shares their progress and design decisions.

Abstract

The author is building a RAG system with FastAPI and documenting their progress in a blog post. They explain the design of the system, which involves transforming user requests into a suitable format for querying a knowledge source, retrieving relevant information, and generating a response using a large language model. The author discusses different methods for querying a knowledge source, such as keyword matching, semantic search, and combining both. They also mention the limitations of semantic search and the possibility of using a language model to generate a knowledge graph. The author then shares their plan for the FastAPI application, including handling document uploads, storing documents, and parsing document content.

Opinions

  • The author believes that RAG is a good approach to ground the responses of a large language model and reduce hallucinations.
  • The author recommends using a combination of semantic search and traditional keyword search for querying a knowledge source.
  • The author does not recommend using a language model to generate a knowledge graph, as it may result in loss of information.
  • The author plans to use the open-closed principle to design entities that are open for extension but closed for modification in their FastAPI application.
  • The author plans to use pytesseract to extract PDF content using OCR when necessary.

Building A Simple RAG System With FastAPI (1)

I am bored; it is 23:00 local time in France. I just decided to build a simple RAG system with FastAPI, and I am writing this blog post as I go.

First, the design. Retrieval Augmented Generation is a nice way to ground the responses of an LLM and thus reduce hallucinations. It is the basis of the so-called “chat with X” apps (X being any sort of file: PDF, DOCX, videos, etc.). It is the approach I used when I created Discute.

Here is the basic design. The user sends a question or request; the request goes through a system that transforms it into a form suitable for the query-able representation of the knowledge source (embeddings, a relational DB, a knowledge graph, etc.); information relevant to the request is then routed to the LLM; and using in-context learning, the LLM crafts a response and sends it back to the user.
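In code, that loop looks roughly like the sketch below. This is a minimal illustration assuming the openai Python client and an arbitrary model name; retrieve() is a stand-in for whichever query-able representation you end up using, not a real implementation.

```python
from openai import OpenAI

client = OpenAI()  # assumption: OPENAI_API_KEY is set in the environment

def retrieve(question: str) -> str:
    # Stand-in: return chunks relevant to the question from your
    # knowledge source (embeddings, relational DB, knowledge graph...).
    return "...relevant document chunks..."

def answer(question: str) -> str:
    context = retrieve(question)
    prompt = (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )
    # In-context learning: the model grounds its answer in the retrieved text.
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # assumption: any capable chat model works here
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content
```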

There are several ways to query a query-able representation of a knowledge source. If your knowledge source is a bunch of text files, for example, you can query it using traditional keyword matching, and this approach can yield good results for certain types of queries. You can also decide to create embeddings for all your documents, and then use the cosine similarity between user queries and document chunks to retrieve relevant documents or parts of documents. That is what we call semantic search. It is powerful for capturing user intent and the nuances of a request. But it also comes with some limitations. For example, if the user request is short, it might be difficult to extract sufficient meaning out of it to match relevant information in your knowledge source. Find below my article about the limits of semantic search.
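To make that concrete, here is a small self-contained sketch of semantic search with cosine similarity. I assume the sentence-transformers library and its all-MiniLM-L6-v2 model; the toy chunks are mine, purely for illustration.

```python
from sentence_transformers import SentenceTransformer
import numpy as np

model = SentenceTransformer("all-MiniLM-L6-v2")

chunks = [
    "FastAPI is a modern Python web framework.",
    "Cosine similarity measures the angle between two vectors.",
    "Tesseract is an OCR engine for extracting text from images.",
]
chunk_vectors = model.encode(chunks)

def search(query: str, top_k: int = 2) -> list[str]:
    q = model.encode([query])[0]
    # Cosine similarity between the query and every chunk.
    sims = chunk_vectors @ q / (
        np.linalg.norm(chunk_vectors, axis=1) * np.linalg.norm(q)
    )
    return [chunks[i] for i in np.argsort(sims)[::-1][:top_k]]

print(search("How do I compare two embeddings?"))
```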

You can also decide to combine semantic search with traditional keyword search. I tested that, and it works pretty well too.
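A naive way to combine the two, reusing model, chunks, and chunk_vectors from the previous snippet: score each chunk with a weighted sum of its cosine similarity and a simple keyword-overlap score. The 0.7/0.3 weights are an arbitrary assumption worth tuning; a production system would more likely use BM25 plus something like reciprocal rank fusion.

```python
def keyword_score(query: str, chunk: str) -> float:
    # Fraction of query tokens that literally appear in the chunk.
    q_tokens = set(query.lower().split())
    c_tokens = set(chunk.lower().split())
    return len(q_tokens & c_tokens) / max(len(q_tokens), 1)

def hybrid_search(query: str, top_k: int = 2) -> list[str]:
    q = model.encode([query])[0]
    sims = chunk_vectors @ q / (
        np.linalg.norm(chunk_vectors, axis=1) * np.linalg.norm(q)
    )
    # Weighted blend of semantic and keyword signals (weights are arbitrary).
    scores = [
        0.7 * sims[i] + 0.3 * keyword_score(query, chunks[i])
        for i in range(len(chunks))
    ]
    ranked = sorted(range(len(chunks)), key=lambda i: scores[i], reverse=True)
    return [chunks[i] for i in ranked[:top_k]]
```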

You can even decide to use an LLM to generate a knowledge graph from your documents and query that knowledge graph. But I don’t particularly recommend this approach, first of all because using an LLM to generate a knowledge graph means losing some information in the process, without even knowing which information you are losing. It’s like asking a kindergartner to take notes for you during your advanced algebra class at university. Yeah, you will have some notes, but I bet the information lost in translation will make them less useful.

The query-able representation of your knowledge source can also be a relational database. In that case, what you need is to translate user requests into SQL queries that fetch the relevant information from the database. That’s the so-called text-to-SQL pipeline.
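A bare-bones sketch of that pipeline, assuming the same OpenAI-style client as above and a local SQLite database; in a real app you would validate the generated SQL before executing it.

```python
import sqlite3

def text_to_sql(question: str, schema: str) -> str:
    prompt = (
        f"Given this SQLite schema:\n{schema}\n\n"
        f"Write a single SQL query that answers: {question}\n"
        "Return only the SQL, with no explanation."
    )
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # assumption: any capable chat model works here
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content.strip()

def ask_database(question: str, db_path: str = "app.db") -> list[tuple]:
    conn = sqlite3.connect(db_path)
    # Collect the CREATE TABLE statements so the model knows the schema.
    schema = "\n".join(
        row[0]
        for row in conn.execute("SELECT sql FROM sqlite_master WHERE type='table'")
        if row[0]
    )
    rows = conn.execute(text_to_sql(question, schema)).fetchall()
    conn.close()
    return rows
```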

Now let’s get into the code. It is always important to spend a lot of time thinking about the problem at hand before trying to code the solution. Like Einstein, I think that: “If I had an hour to solve a problem I’d spend 55 minutes thinking about the problem and five minutes thinking about solutions.” As a matter of fact, even before writing the first lines of code, we will have to spend enough time thinking about the abstractions we will need and the architecture of our future solution.

What do I need for my FastAPI app? I need a way for the user to upload documents. Documents can be of several types, and I frankly don’t know if I want to handle all sorts of files. Maybe I should use the open-closed principle and design entities that are open for extension but closed for modification. I will thus need a mechanism to recognise the type of file sent. Maybe I should also add a DB mode to handle cases where the user just provides connection details to a database? Doing all that in the same app seems complex, and I don’t want to spend too much time on this simple tutorial. So let’s say I will just handle raw files.

The first step is to give users the ability to upload documents.
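A minimal version of that endpoint could look like this; the route name is my choice, and FastAPI needs the python-multipart package installed for file uploads to work.

```python
from fastapi import FastAPI, UploadFile

app = FastAPI()

@app.post("/documents")
async def upload_document(file: UploadFile):
    # UploadFile streams the file instead of loading it all into memory.
    content = await file.read()
    return {"filename": file.filename, "size": len(content)}
```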

I need to store the documents uploaded by users.
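For this simple tutorial, one easy option is to write each upload to a local uploads/ directory, prefixing the name with a UUID to avoid collisions; a database or object store would work just as well. A sketch, extending the endpoint above:

```python
import uuid
from pathlib import Path

UPLOAD_DIR = Path("uploads")
UPLOAD_DIR.mkdir(exist_ok=True)

def store_document(filename: str, content: bytes) -> Path:
    # UUID prefix avoids collisions when two users upload the same name.
    path = UPLOAD_DIR / f"{uuid.uuid4().hex}_{filename}"
    path.write_bytes(content)
    return path

@app.post("/documents")
async def upload_document(file: UploadFile):
    content = await file.read()
    path = store_document(file.filename or "upload", content)
    return {"filename": file.filename, "stored_at": str(path)}
```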

I need custom logic to parse the content of documents. Here, I need to make sure my classes are open for extension but closed for modification by using a parser factory. I use pytesseract to extract PDF content with OCR when necessary.
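Here is a sketch of what that factory could look like. The class names are mine; I assume pypdf for the PDF text layer and pdf2image plus pytesseract for the OCR fallback (both the Tesseract and Poppler binaries must be installed on the system).

```python
from abc import ABC, abstractmethod
from pathlib import Path

import pytesseract
from pdf2image import convert_from_path
from pypdf import PdfReader

class DocumentParser(ABC):
    @abstractmethod
    def parse(self, path: Path) -> str: ...

class TextParser(DocumentParser):
    def parse(self, path: Path) -> str:
        return path.read_text(encoding="utf-8")

class PdfParser(DocumentParser):
    def parse(self, path: Path) -> str:
        # Try the PDF's embedded text layer first.
        text = "\n".join(page.extract_text() or "" for page in PdfReader(path).pages)
        if text.strip():
            return text
        # Scanned PDF: render pages to images and OCR them with pytesseract.
        images = convert_from_path(str(path))
        return "\n".join(pytesseract.image_to_string(image) for image in images)

# Supporting a new file type means adding an entry here, not editing the
# existing parsers: open for extension, closed for modification.
PARSERS: dict[str, DocumentParser] = {".txt": TextParser(), ".pdf": PdfParser()}

def get_parser(path: Path) -> DocumentParser:
    try:
        return PARSERS[path.suffix.lower()]
    except KeyError:
        raise ValueError(f"Unsupported file type: {path.suffix}")
```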

End of part 1 of the tutorial; I will continue it tomorrow.

LLM
Large Language Models
FastAPI
OpenAI
Artificial Intelligence