Why RAG (Retrieval-Augmented Generation) Will Become a Cornerstone of System Design Using LLM Capabilities
Recent advances in large language models (LLMs) like GPT-3 [1] have demonstrated powerful few-shot learning abilities — the capacity to adapt to new tasks and domains with just a handful of examples.
This ability is enabling a new paradigm of system design centered around the Retrieval Augmented Generation (RAG) framework [2].
RAG systems combine the broad knowledge of LLMs with scalable retrieval from indexed data, and are likely to become a cornerstone of designing next-generation AI-powered applications.
The limits of training an everything model
Early excitement around LLMs focused on their potential to absorb immense amounts of knowledge during pretraining, ideally capturing everything they would need within their parameters. But the promise of creating a truly “universal” LLM has started to show cracks. As models scale up, researchers are finding they still struggle with various commonsense reasoning tasks [3] and continue to hallucinate incorrect facts [4]. It has become clear that no static model will ever encapsulate all human knowledge or be free of biases.
RAG — scaling knowledge dynamically
The RAG framework offers an alternative paradigm that sidesteps these limitations. Rather than encoding all requisite knowledge statically in an LLM, RAG systems index relevant data in a retrievable format. At runtime, they retrieve the most salient information and pass it alongside user queries to the LLM. The retrievable data acts as a dynamically scalable knowledge source — as new information is added to the indexes, the model gains awareness of it without any retraining.
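The index, retrieve, and prompt steps described above can be sketched in a few lines of Python. This is a minimal illustration, not a production design: `TinyIndex`, `cosine`, and `build_prompt` are hypothetical names, the bag-of-words similarity stands in for whatever embedding model and vector store a real system would use, and the call to the LLM itself is left out.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class TinyIndex:
    """In-memory index (assumption: a stand-in for a real vector store)."""

    def __init__(self):
        self.docs = []  # list of (text, token counts)

    def add(self, text):
        # New knowledge becomes retrievable immediately; no model is retrained.
        self.docs.append((text, Counter(tokenize(text))))

    def retrieve(self, query, k=2):
        q = Counter(tokenize(query))
        scored = [(cosine(q, vec), text) for text, vec in self.docs]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [text for score, text in scored[:k] if score > 0]


def build_prompt(query, passages):
    """Place the retrieved passages alongside the user query."""
    context = "\n".join(f"- {p}" for p in passages)
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}\nAnswer:"
    )


if __name__ == "__main__":
    index = TinyIndex()
    index.add("The billing service retries failed payments three times.")
    index.add("The refund window is 30 days.")
    question = "refund window"
    print(build_prompt(question, index.retrieve(question, k=1)))
```

The resulting prompt string is what would be sent to the LLM. Calling `add` with a new document changes what later queries can retrieve, which is the sense in which the knowledge source scales without retraining.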
This paradigm leans heavily on recent advances that let LLMs learn from a handful of examples, a capability known as few-shot learning [5]. With just a couple of demonstrations, LLMs can adapt to new tasks, terminologies, and domains. The indexed data supplies examples and context that allow the LLM to discuss new concepts immediately, without any change to its weights.
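As a rough sketch of what "a couple of demonstrations" looks like in practice, the function below formats retrieved examples into a few-shot prompt. The function name, the `Input:`/`Output:` layout, and the ticket-routing data are all assumptions made for illustration; real systems use whatever prompt format suits their model.

```python
def build_few_shot_prompt(instruction, demonstrations, query):
    """Format (input, output) demonstration pairs followed by the new query.

    The model is expected to continue the pattern after the final "Output:".
    """
    lines = [instruction, ""]
    for source, target in demonstrations:
        lines += [f"Input: {source}", f"Output: {target}", ""]
    lines += [f"Input: {query}", "Output:"]
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical demonstrations, e.g. pulled from an index of past tickets.
    demos = [
        ("Card was charged twice", "billing"),
        ("Search results are empty", "search"),
    ]
    print(build_few_shot_prompt(
        "Route each support ticket to a team.",
        demos,
        "Refund has not arrived",
    ))
```

Changing the task here means changing the instruction and the demonstrations, not the model.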
Furthermore, a recent paper [3] reports an unexpected finding: LLMs appear able to memorize and learn from single examples during fine-tuning. This challenges the common wisdom that neural networks require many examples to learn.
The authors first noticed odd training loss curves that suggested the LLM was memorizing examples after one pass through the data. They conducted experiments that supported the hypothesis that the models can quickly remember inputs.
The paper explores potential reasons for this phenomenon, including:
- LLMs may have very smooth loss surfaces near minimal loss, allowing large steps during training
- Pre-trained LLMs have rich hierarchies of abstractions that can be readily adapted to new tasks
- Using the Adam optimizer results in increasing dynamic learning rates, enabling large steps
The authors note that this finding may require rethinking how LLMs are trained and used. Challenges include catastrophic forgetting and reduced utility of data augmentation.
Potential mitigations are suggested, like using more dropout or carefully mixing datasets. More research is called for to validate the memorization hypothesis and adjust LLM training appropriately.
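One way to read the Adam point in the list above: Adam divides each update by a running estimate of the gradient's magnitude, so the update size stays close to the base learning rate even when gradients become very small. The sketch below applies the standard Adam update rule to a single scalar parameter with a constant gradient. It is an illustration of that general property, not a reproduction of the authors' experiments; `adam_updates` and `effective_rate` are hypothetical helper names.

```python
import math


def adam_updates(grads, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """Return the update Adam applies at each step for one scalar parameter."""
    m = v = 0.0
    updates = []
    for t, g in enumerate(grads, start=1):
        m = beta1 * m + (1 - beta1) * g        # running mean of gradients
        v = beta2 * v + (1 - beta2) * g * g    # running mean of squared gradients
        m_hat = m / (1 - beta1 ** t)           # bias correction
        v_hat = v / (1 - beta2 ** t)
        updates.append(lr * m_hat / (math.sqrt(v_hat) + eps))
    return updates


def effective_rate(grad, steps=50):
    """Update size divided by gradient size after `steps` constant gradients."""
    return adam_updates([grad] * steps)[-1] / grad


if __name__ == "__main__":
    for grad in (1e-1, 1e-2, 1e-4):
        update = adam_updates([grad] * 50)[-1]
        print(f"gradient={grad:g}  update={update:.6f}  "
              f"effective rate={effective_rate(grad):.3f}")
```

Under these assumptions the update stays near `lr` while the gradient shrinks by orders of magnitude, so the step taken per unit of gradient grows. Plain SGD, by contrast, would take steps proportional to the gradient.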
Benefits of the RAG paradigm
Combining retrieval with contextual learning enables several benefits:
- Grounding — Retrieval grounds the LLM’s responses in available data, reducing unfaithful hallucinations.
- Scaling Knowledge — Indexes act as extendable knowledge, growing as new data is added.
- Transfer Learning — Few-shot adaptation allows fast customization to new domains and data.
- Conversation — Contextual learning supports natural back-and-forth conversation.
These advantages make RAG systems highly generalizable and scalable compared to static models. The RAG paradigm reduces overreliance on model parameters and training data, instead leveraging indexed knowledge and fast adaptation.
A “low-tech” alternative to RAG would be a manual process like:
- Perform keyword search in internal systems.
- Search the broader web with a search engine like Google.
- Manually collect and synthesize the results.
- Use a templating system or a tool like Google Bard to generate a final response.
Compared to this manual approach, automating the RAG pipeline offers a few key advantages:
- Speed — Automated retrieval and synthesis is significantly faster than manual searching and curation. RAG systems can respond within seconds versus minutes/hours.
- Scale — RAG systems can digest corpora of millions of documents, far more than a human could manually analyze. This amplifies the knowledge available.
- Consistency — Automated systems perform reliably over long periods. Manual processes degrade with human fatigue.
- Cost — Once built, the incremental cost per query for an RAG system can be trivial versus ongoing human labor costs.
- Customization — RAG systems can be tailored to specialized domains versus generic web/tool knowledge.
RAG as a new cornerstone of system design
Looking forward, the flexibility of the RAG paradigm makes it an essential new piece of any system involving generative AI components.
In the past, integrating capabilities like summarization or translation required training task-specific models with large datasets. RAG systems powered by general LLMs lower the barrier tremendously — capabilities can be added by simply indexing relevant data and providing a few examples.
As a result, RAG is likely to become a cornerstone of system design in the near future.
The ability to dynamically index knowledge and rapidly adapt to new domains makes RAG systems inherently scalable and customizable.
Adding new conversational capabilities or report generation becomes low-cost and accessible to any organization.
Much like databases expanded what was possible to build with software, the RAG paradigm greatly expands what is possible when leveraging AI as a software component.
Conclusion
Recent results confirm that LLMs still face fundamental limits around knowledge and biases. The RAG paradigm works around these limits by adding retrievable indexes that expand the available knowledge, while using few-shot learning to ease transfer to new domains. Together, these properties make RAG systems highly generalizable and scalable, and position the framework as an essential foundation for building advanced AI applications.
References
[1] Brown, Tom B., et al. “Language models are few-shot learners.” arXiv preprint arXiv:2005.14165 (2020).
[2] Lewis, Patrick, et al. “Retrieval-augmented generation for knowledge-intensive NLP tasks.” arXiv preprint arXiv:2005.11401 (2020).
[3] Bisk, Yonatan, et al. “Experience grounds language.” arXiv preprint arXiv:2004.10151 (2020).
[4] Rae, Jack W., et al. “Scaling language models: Methods, analysis & insights from training Gopher.” arXiv preprint arXiv:2112.11446 (2021).
[5] Brown, Tom B., et al. “Language models are few-shot learners.” arXiv preprint arXiv:2005.14165 (2020).