LlamaIndex: RAG Overview and High-Level Concepts

In the ever-changing world of artificial intelligence, Large Language Models (LLMs) are making waves in natural language processing. Companies are eager to harness these powerful models for various tasks.

Yet some tasks, particularly those requiring niche knowledge or domain-specific data, demand information beyond what LLMs learn from the publicly available data used in training. The conventional fix is to retrain the LLM on new data, a process that is resource-intensive and must be repeated frequently to stay current.

This is where LlamaIndex comes in: a tool that streamlines the construction of LLM-based applications and tackles this challenge through Retrieval-Augmented Generation (RAG). This method removes the need for extensive retraining, allowing LLMs to be used dynamically across diverse data domains.

Understanding RAG

RAG operates in three steps:

Retrieval: Gathers relevant data from various available sources.
Contextual Addition: Adds this data to the initial query as context.
LLM Interaction: Presents the improved query to the LLM.

In simpler terms, RAG involves furnishing the LLM with the requisite knowledge to answer the query.
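
The three steps above can be sketched in a few lines of plain Python. This is a toy illustration rather than LlamaIndex code: the word-overlap scoring, the `fake_llm` stand-in, and the "Acme X1" sample sentences are all assumptions made up for the example. A real system would typically retrieve with embeddings and call an actual model.

```python
import re

# Hypothetical knowledge base; in practice this would come from your own data sources.
KNOWLEDGE_BASE = [
    "The Acme X1 is a smart kettle launched in 2024.",
    "The Acme X1 boils one litre of water in ninety seconds.",
    "Bananas are rich in potassium.",
]


def tokenize(text):
    """Lowercase the text and split it into a set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(query, documents, top_k=2):
    """Step 1 (Retrieval): rank documents by how many words they share with the query."""
    query_words = tokenize(query)
    scored = [(len(query_words & tokenize(doc)), doc) for doc in documents]
    ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
    # Keep only the best matches that share at least one word with the query.
    return [doc for score, doc in ranked[:top_k] if score > 0]


def augment(query, context):
    """Step 2 (Contextual Addition): prepend the retrieved text to the query."""
    context_block = "\n".join(f"- {passage}" for passage in context)
    return f"Context:\n{context_block}\n\nQuestion: {query}"


def fake_llm(prompt):
    """Step 3 (LLM Interaction): placeholder for a real model call."""
    return f"[model answer based on a {len(prompt)}-character prompt]"


query = "What is the Acme X1?"
context = retrieve(query, KNOWLEDGE_BASE)
prompt = augment(query, context)
print(prompt)
print(fake_llm(prompt))
```

The unrelated sentence about bananas never reaches the prompt, which is the point of the retrieval step: the model only sees the passages judged relevant to the question.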

Illustrative Use Cases:

Case 1: Imagine being tasked with composing an essay about a recently launched product. If turning to ChatGPT for assistance, one might initially encounter a response indicating unfamiliarity with the specific product.

To overcome this using RAG logic:

Retrieve Data: Search for the product on Google to gather relevant information from articles and blogs.
Integrate Retrieved Data into Query: Structure your ChatGPT prompt to include the gathered details. For instance, begin your prompt with "Write an essay about 'The Product,' considering the following information," followed by the retrieved data.
Present Comprehensive Query to ChatGPT: Submit your refined query to ChatGPT. This way, ChatGPT is equipped with context and specifics, increasing the likelihood of generating a more accurate and informed response.

Case 2: Consider building an app that uses an LLM to answer legal professionals' queries from a stack of legal documents. Instead of retraining the LLM on legal material, one could use LlamaIndex to connect the documents to the LLM and employ RAG to overcome this challenge.

Consider this legal query, for example: “Provide insights into recent cases involving intellectual property disputes in the technology sector.” Answering it would follow these steps:

Retrieval: Fetch recent legal documents related to intellectual property disputes in tech from data sources.
Contextual Addition: Integrate this legal data into the query for a more informed prompt.
LLM Interaction: Present the enriched prompt to the LLM, generating detailed insights.

By adopting this approach, myriad use cases for LLM applications become apparent.

With RAG understood, the next section covers the high-level concepts and key stages of RAG. It includes a demonstration with llama-index using the Wikipedia article on Twitter (the notebook is linked in the References). The goal of this example is to convey how the pieces fit together, so prioritize understanding the concepts over the code, which will be the focus of my upcoming articles.

RAG stages

The stages within RAG are Loading, Indexing, Storing, Querying, and Evaluation.

Loading: This stage involves obtaining data from its sources (text files, PDFs, Excel spreadsheets, PowerPoint presentations, and so on), preparing it, and integrating it into your pipeline. To facilitate this, LlamaIndex offers a wide range of connectors for different data formats.

In the cells below, the Wikipedia article will be downloaded and saved as a .txt file. Following this, it will be loaded using SimpleDirectoryReader, which returns a list of Document objects.

# Download the article and save it in the `data` directory.
from pathlib import Path
import requests

response = requests.get(
    "https://en.wikipedia.org/w/api.php",
    params={
        "action": "query",
        "format": "json",
        "titles": "Twitter",
        "prop": "extracts",
        # "exintro": True,  # uncomment to fetch only the introduction
        "explaintext": True,
    },
).json()
page = next(iter(response["query"]["pages"].values()))
wiki_text = page["extract"]

# Create the target directory if it does not exist yet.
data_path = Path("data")
data_path.mkdir(exist_ok=True)

# Write as UTF-8 explicitly; the platform default encoding may not cover every character.
with open(data_path / "Twitter.txt", "w", encoding="utf-8") as fp:
    fp.write(wiki_text)

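# Load the files in the `data` directory.
# Note: in llama-index 0.10 and later, this import lives in `llama_index.core`.
from llama_index import SimpleDirectoryReader

# load_data() returns a list of Document objects, one or more per file.
documents = SimpleDirectoryReader("data").load_data()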
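
To make the idea of "a list of Document objects" more concrete, here is a stdlib-only sketch of what a directory loader does conceptually. The `SimpleDocument` class and the `load_directory` function are hypothetical names invented for this illustration; they are not part of LlamaIndex, whose own Document objects carry more fields.

```python
from dataclasses import dataclass, field
from pathlib import Path
import tempfile


@dataclass
class SimpleDocument:
    """Hypothetical stand-in for a document: raw text plus metadata about its source."""
    text: str
    metadata: dict = field(default_factory=dict)


def load_directory(directory):
    """Read every .txt file in `directory` (non-recursively) into SimpleDocument objects."""
    documents = []
    for path in sorted(Path(directory).glob("*.txt")):
        text = path.read_text(encoding="utf-8")
        documents.append(SimpleDocument(text=text, metadata={"file_name": path.name}))
    return documents


# Demonstrate on a temporary directory so the sketch does not touch real files.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "Twitter.txt").write_text("Placeholder article text.", encoding="utf-8")
    (Path(tmp) / "notes.md").write_text("Ignored: not a .txt file.", encoding="utf-8")
    docs = load_directory(tmp)

print(len(docs), docs[0].metadata)
```

Keeping the source file name in the metadata is a common design choice, since it lets a later stage report where an answer came from.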

Indexing: This stage gives the data a representation that makes it searchable and allows quick access to relevant information. In other words, it creates a data structure that can be queried.

One way to accomplish this with Llama-index is by utilizing VectorStoreIndex, as demonstrated in the code snippet below.

from llama_index import VectorStoreIndex
index = VectorStoreIndex.from_documents(documents)
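
As a rough intuition for what a vector index does, the sketch below turns each text chunk into a word-count vector and ranks chunks by cosine similarity to the query. It is a simplified assumption for illustration only: VectorStoreIndex relies on an embedding model rather than word counts, and `ToyVectorIndex` is a hypothetical class, not a LlamaIndex API.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy 'embedding': a sparse vector of word counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[word] * b[word] for word in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class ToyVectorIndex:
    """Hypothetical in-memory index: stores (chunk, vector) pairs built once up front."""

    def __init__(self, chunks):
        self.entries = [(chunk, embed(chunk)) for chunk in chunks]

    def search(self, query, top_k=1):
        """Return the top_k chunks most similar to the query."""
        query_vector = embed(query)
        ranked = sorted(
            self.entries,
            key=lambda entry: cosine_similarity(query_vector, entry[1]),
            reverse=True,
        )
        return [chunk for chunk, _ in ranked[:top_k]]


toy_index = ToyVectorIndex([
    "Users post short messages on the platform.",
    "The company is headquartered in the United States.",
    "Bananas are rich in potassium.",
])
print(toy_index.search("Where is the company headquartered?"))
```

The vectors are computed once when the index is built, so each query only pays for embedding the question and comparing it against the stored vectors.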

Storing: In cases where the source data is static and does not require updating before each query, the generated indexes should be stored to avoid re-indexing, which could be time-consuming in some cases.

In the code snippet below, the generated indexes have been stored in the data directory and then reloaded for demonstration purposes.

from llama_index import (
    StorageContext,
    load_index_from_storage,
)

index.storage_context.persist('data/index')
storage_context = StorageContext.from_defaults(persist_dir='data/index')
index = load_index_from_storage(storage_context)
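
The underlying pattern is "build once, reload afterwards". The stdlib-only sketch below illustrates it with a JSON file; `build_index`, `load_or_build`, and the file name are hypothetical, and the word-to-chunk mapping is only a stand-in for a real index. With llama-index, the same existence check could guard the `persist` and `load_index_from_storage` calls shown above.

```python
import json
import tempfile
from pathlib import Path


def build_index(chunks):
    """Stand-in for an expensive indexing step: map each word to the chunks containing it."""
    index = {}
    for position, chunk in enumerate(chunks):
        for word in set(chunk.lower().split()):
            index.setdefault(word, []).append(position)
    return index


def load_or_build(chunks, persist_path):
    """Reload the index from disk if it exists; otherwise build it and persist it."""
    persist_path = Path(persist_path)
    if persist_path.exists():
        return json.loads(persist_path.read_text(encoding="utf-8")), "loaded"
    index = build_index(chunks)
    persist_path.parent.mkdir(parents=True, exist_ok=True)
    persist_path.write_text(json.dumps(index), encoding="utf-8")
    return index, "built"


chunks = ["static source data", "more static data"]
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "index" / "toy_index.json"
    first_index, first_status = load_or_build(chunks, path)    # builds and saves
    second_index, second_status = load_or_build(chunks, path)  # reloads from disk

print(first_status, second_status)
```

This only makes sense when the source data is static, as noted above; if the documents change, the persisted index has to be rebuilt or updated.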

Querying: This stage connects the indexed data with the Large Language Model (LLM). Users interact with the data by prompting the model, which then generates responses using information from the indexed data.

In the code snippets below, the OpenAI GPT-3.5-turbo model is used. It is first prompted directly, without the index, and as the output shows it does not recognize X. It is then queried through the generated index and responds accurately.

from llama_index.llms import OpenAI
llm=OpenAI(model='gpt-3.5-turbo')

from llama_index.llms import ChatMessage

messages = [
    ChatMessage(role="user", content="tell me about the social media X"),
]
resp = llm.chat(messages)
print(resp)

Output: “Social media X is a hypothetical social media platform that does not exist in reality. As such, there is no specific information available about its features, purpose, or user base …”

query_engine = index.as_query_engine(llm=llm)
response = query_engine.query("tell me about the social media X")
print(response)

Output: “X is a social media website that was formerly known as Twitter. It is based in the United States and is one of the largest social networks in the world, with over 500 million users. Users can share text messages, images, and videos called “tweets” on the platform…”

Evaluation: The final stage is the essential process of measuring how accurately, faithfully, and swiftly the model responds to queries, which provides insights for improvement.
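
As a deliberately naive illustration of the "faithfully" and "swiftly" parts, the sketch below times a call and measures how many of the answer's words also appear in the retrieved context. Both helper functions and the word-overlap score are assumptions made for this example, and the sample strings are adapted from the output quoted above; practical evaluation usually relies on more robust methods.

```python
import re
import time


def words(text):
    """Lowercase the text and split it into a set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def grounded_fraction(answer, context):
    """Share of distinct answer words that also occur in the context (0.0 to 1.0)."""
    answer_words = words(answer)
    if not answer_words:
        return 0.0
    return len(answer_words & words(context)) / len(answer_words)


def timed(func, *args):
    """Run func(*args) and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = func(*args)
    return result, time.perf_counter() - start


context = "X is a social media website that was formerly known as Twitter."
grounded_answer = "X was formerly known as Twitter."
ungrounded_answer = "X is a hypothetical platform."

score, elapsed = timed(grounded_fraction, grounded_answer, context)
print(score, grounded_fraction(ungrounded_answer, context))
```

A word-overlap score like this rewards copying and misses paraphrases, so it should be read as a rough signal for comparing answers, not as a measure of correctness.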

Conclusion

This article aims to provide a comprehensive overview of the key concepts surrounding LLama-index and Retrieval-Augmented Generation (RAG), showcasing how these methodologies streamline the development of Large Language Model (LLM)-based applications.

Looking ahead, the next article will go into the technical details of each stage with LlamaIndex. From loading data to querying and evaluation, the hands-on walkthrough will offer a more granular understanding of how to implement these concepts in real-world scenarios.

Don’t hesitate to reach out to me or leave a comment if you have any questions or need more details. Your feedback and inquiries are valuable, and I’m here to provide any additional information or clarification you may require. Stay tuned for the next article and thank you for exploring this journey with me!

References

Llama-index documentation: https://docs.llamaindex.ai/en/stable/

Demo notebook: https://github.com/iron8kid/llama-index-serie/