
LANGCHAIN — Benchmarking RAG on Tables
The most technologically efficient machine that man has ever invented is the book. — Northrop Frye
Retrieval-augmented generation (RAG) is a core technique in LLM app development, and it gets harder when documents contain semi-structured data such as tables. In this article, we explore several approaches to RAG on tables and discuss how to benchmark them against one another.
Benchmarking RAG on Tables
To start with, let’s consider the LangChain public benchmark evaluation notebooks:
- Long context LLMs
- Chunk size tuning
- Multi-vector with ensemble
These notebooks provide a detailed exploration of the benchmarking process for RAG on tables.
Approach 1: Long Context LLMs
Passing semi-structured documents, tables included, directly into the context window of a long-context LLM such as GPT-4 128k or Claude 2.1 is the most straightforward approach. Its limits are the context length itself and the model's sensitivity to where a detail sits within the input. Both hurt performance as the dataset grows.
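To make the context-length limit concrete, here is a minimal sketch of packing whole documents into a fixed token budget before prompting. The function names, the 4-characters-per-token estimate, and the prompt wording are all assumptions made for illustration; a real setup would count tokens with the model's own tokenizer.

```python
from typing import List, Tuple


def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Rough token estimate (assumption); a real tokenizer will give different counts."""
    return int(len(text) / chars_per_token) + 1


def build_long_context_prompt(
    documents: List[str],
    question: str,
    context_limit: int = 128_000,
    reserve_for_answer: int = 1_000,
) -> Tuple[str, int]:
    """Concatenate whole documents until the estimated token budget is used up."""
    budget = context_limit - reserve_for_answer - estimate_tokens(question)
    included: List[str] = []
    used = 0
    for doc in documents:
        cost = estimate_tokens(doc)
        if used + cost > budget:
            break  # remaining documents do not fit in the window
        included.append(doc)
        used += cost
    context = "\n\n".join(included)
    prompt = (
        "Answer the question using only the context below.\n\n"
        f"{context}\n\nQuestion: {question}"
    )
    return prompt, len(included)
```

The second return value reports how many documents actually fit, which makes it easy to see when a larger dataset starts to fall outside the window.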
# Sample code for using a long-context LLM for RAG on tables
# (get_context_from_document and long_context_LLM are placeholders, not a real API)
context = get_context_from_document(document)
response = long_context_LLM.generate_response(context)
Approach 2: Targeted Table Extraction
Another approach involves targeted table extraction from documents using specialized models to detect and extract tables. This method may offer high performance but can be complex and may encounter challenges in recognizing diverse table types.
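The extraction step can be illustrated with a deliberately simple stand-in. The sketch below does not use a specialized detection model; it swaps in a toy heuristic that groups consecutive pipe-delimited lines with a matching column count into a table. The function name and the pipe-delimited input format are assumptions, and the heuristic shows in miniature why diverse table layouts are hard: anything not pipe-delimited is missed.

```python
from typing import List


def extract_pipe_tables(text: str, min_rows: int = 2) -> List[List[List[str]]]:
    """Toy heuristic: collect runs of pipe-delimited lines that share a column count."""
    tables: List[List[List[str]]] = []
    current: List[List[str]] = []
    # The trailing empty string is a sentinel that flushes the last table.
    for line in text.splitlines() + [""]:
        cells = [c.strip() for c in line.strip().strip("|").split("|")]
        is_row = "|" in line and len(cells) >= 2
        if is_row and (not current or len(cells) == len(current[0])):
            current.append(cells)
        else:
            if len(current) >= min_rows:
                tables.append(current)
            # A row with a different column count starts a new candidate table.
            current = [cells] if is_row else []
    return tables
```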
# Sample code for table extraction
# (table_extractor is a placeholder for a table-detection model or package)
tables = table_extractor.extract_tables_from_document(document)
Approach 3: Chunking
Chunking documents based on a specified token limit is a simple approach, but selecting the optimal chunk size to preserve tables is a challenge. Chunking along page boundaries can be a reasonable way to preserve tables within chunks, although it may have failure modes such as multi-page tables.
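The difference between the two chunking strategies can be sketched as follows. Both helpers are hypothetical: whitespace-separated words stand in for tokens, and pages are assumed to be separated by a form-feed character, which may not match how a given PDF loader represents page breaks.

```python
from typing import List


def chunk_by_tokens(text: str, chunk_size: int) -> List[str]:
    """Fixed-size chunks; whitespace-separated words approximate tokens (assumption)."""
    words = text.split()
    return [" ".join(words[i:i + chunk_size]) for i in range(0, len(words), chunk_size)]


def chunk_by_page(text: str, page_separator: str = "\f") -> List[str]:
    """One chunk per page, assuming pages are delimited by page_separator."""
    return [page.strip() for page in text.split(page_separator) if page.strip()]


document = (
    "Intro text here.\n"
    "Region | Revenue\n"
    "EMEA | 10\n"
    "APAC | 20\f"
    "Second page notes."
)

token_chunks = chunk_by_tokens(document, chunk_size=5)  # may cut through the table
page_chunks = chunk_by_page(document)                   # keeps each page intact
```

With a small chunk size the fixed-size splitter cuts the table header in half, while the page-based splitter keeps the whole table in one chunk. A table that spans two pages would still be split by the page-based version.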
# Sample code for document chunking
# (chunk_document is a placeholder; chunk_size is the token limit per chunk)
chunks = chunk_document(document, chunk_size)
Conclusion
While long context LLMs offer simplicity, they can face challenges with context length and table placement. Targeted table extraction may have a high performance ceiling, but it requires specific packages and may suffer from failure modes in recognizing diverse table types. Chunking along page boundaries is a simple approach, but selecting the right chunk size is crucial. Additionally, ensembling can prioritize table-derived text chunks to improve performance.
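One way to picture how an ensemble can prioritize table-derived chunks is weighted reciprocal rank fusion over two ranked lists, one from a retriever over raw text chunks and one from a retriever over table-derived chunks. This is a sketch under assumptions: the function name, the document ids, the weights, and the constant k=60 are illustrative, and the source does not state which fusion method the notebooks use.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def weighted_rrf(
    rankings: Sequence[Sequence[str]],
    weights: Sequence[float],
    k: int = 60,
) -> List[str]:
    """Fuse ranked id lists; each list contributes weight / (k + rank) per document."""
    scores: Dict[str, float] = defaultdict(float)
    for ranked_ids, weight in zip(rankings, weights):
        for rank, doc_id in enumerate(ranked_ids, start=1):
            scores[doc_id] += weight / (k + rank)
    # Highest fused score first; ties are broken by id for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


raw_text_ranking = ["p3", "p1", "p2"]   # hypothetical ids from the raw-text retriever
table_ranking = ["p2", "p3"]            # hypothetical ids from the table-chunk retriever

balanced = weighted_rrf([raw_text_ranking, table_ranking], weights=[0.5, 0.5])
table_heavy = weighted_rrf([raw_text_ranking, table_ranking], weights=[0.25, 0.75])
```

Raising the weight on the table-derived ranking moves the chunk that the table retriever ranked first to the top of the fused list.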
In conclusion, benchmarking RAG on tables involves experimenting with different approaches and evaluating their performance based on various metrics. By testing and analyzing these methods, developers can determine the most effective strategy for their specific use case.
