Summary

The article advocates for an Intelligent Agent Model to enhance ChatGPT's grounding in specific data domains by emulating human research methods, which involves iterative search and refinement, surpassing the limitations of the traditional Retrieval Augmented Generation (RAG) pattern.

Abstract

The Retrieval Augmented Generation (RAG) model, commonly used for grounding ChatGPT in specific data domains, has focused on improving retrieval efficiency rather than intelligence. This article introduces a more sophisticated approach, the Intelligent Agent Model, which mirrors human research behavior by conducting multiple searches, evaluating interim results, and refining queries as needed before providing an answer. This model allows for an iterative, multi-step research process that can adapt and expand upon initial findings, unlike the RAG pattern which is limited to a single retrieval step without feedback or refinement capabilities. The proposed model leverages Azure OpenAI's function-calling capability to autonomously manage the research process, enabling the chatbot to dynamically interact with external search tools and make informed decisions about when to seek further information or clarify the user's intent. The article also outlines the implementation process for this intelligent agent, emphasizing the importance of planning, acting, observing, and adjusting within a looped process that continues until the agent is ready to provide a final answer. This approach is demonstrated to be more effective than the traditional RAG pattern, particularly in complex research tasks that require multi-step execution and adaptive strategies.

Opinions

The RAG pattern is criticized for its inefficiency in managing retrieval, augmentation, and generation as separate processes, which can lead to irrelevant information being provided to the chatbot.
The traditional RAG model's lack of a feedback loop from the generation phase to the retrieval phase is seen as a significant limitation, as it prevents the model from correcting suboptimal search results.
The context provided by the retrieval process in the RAG model is static and cannot be expanded or refined, which is viewed as a drawback for complex queries requiring multi-step research.
The Intelligent Agent Model is presented as a superior alternative to the RAG pattern, offering a more human-like approach to research that includes iterative search and refinement.
The use of Azure OpenAI's function-calling capability is highlighted as a key enabler for the Intelligent Agent Model, simplifying the implementation of autonomous search tool interactions.
The article emphasizes that the intelligent agent should be capable of planning its research, acting by formulating queries, observing the results, and adjusting its strategy as necessary, which is not possible with the RAG pattern.
The conclusion suggests that adopting the Intelligent Agent Model leads to substantial enhancements in grounded ChatGPT solutions, allowing the model to test various strategies and refine its approach based on observed outcomes.

Forget RAG: Embrace agent design for a more intelligent grounded ChatGPT!

The Retrieval Augmented Generation (RAG) design pattern has been commonly used to develop a grounded ChatGPT in a specific data domain. However, the focus has primarily been on improving the efficiency of the retrieval tool such as embedding search, hybrid search, and fine-tuning embedding rather than intelligent search. This article introduces a new approach inspired by human research methods that involve multiple search techniques, observing interim results, refining, and retrying in a multi-step process before providing a response. By utilizing intelligent agent design, this article proposes building a more intelligent and grounded ChatGPT that exceeds the limitations of traditional RAG models.

RAG pattern and limitations

Overview of the standard RAG Pattern implementation:

The process begins with the creation of a query from the user’s question or conversation, typically through a prompted language model (LLM). This is commonly referred to as the query rephrasing step.
This query is then dispatched to a search engine, which returns relevant knowledge (Retrieval).
The retrieved information is then enhanced with a prompt that includes the user’s question and is forwarded to the LLM (Augmentation).
Finally, the LLM responds with an answer to the user’s query (Generation).

Limitations of RAG

In the RAG pattern, Retrieval, Augmentation, and Generation are managed by separate processes. Each process might be facilitated by an LLM with a distinct prompt. However, the Generation LLM, which directly interacts with the user, often knows best what is required to answer the user’s query. The Retrieval LLM might not interpret the user’s intent in the same manner as the Generation LLM, providing it with unnecessary information that could impede its ability to respond.
Retrieval is performed once for each question, without any feedback loop from the Generation LLM. If the retrieval result is irrelevant, due to factors such as a suboptimal search query or search terms, the Generation LLM lacks a mechanism to correct this and may resort to fabricating an answer.
The context from retrieval is unchangeable once provided and cannot be expanded. For instance, if the research result suggests that further investigation is required, such as a retrieved document referring to another document that should be further retrieved, there’s no provision for this.
The RAG pattern does not support multi-step research.

2. Intelligent Agent Model

The Intelligent Agent Model draws inspiration from the human approach to research when answering a question for which immediate knowledge is lacking. In this process, one or multiple searches may be performed to gather useful information before providing a final answer. The result of each search can determine whether further investigation is required and, if so, the direction of the subsequent search. This iterative process continues until we believe we have amassed sufficient knowledge to answer, or conclude that we cannot find enough information to respond. Occasionally, the results from the research can lead to further clarification of the user’s intent and scope of the query.

To replicate this approach, the proposal is to develop an intelligent agent powered by a Language Model (LLM) that manages conversations with a user. The agent autonomously determines when it needs to conduct research using external tools, formulates one or multiple search queries, carries out the research, reviews the results, and decides whether to continue with further research or seek clarification from the user. This process persists until the agent deems itself ready to provide an answer to the user.

3. Implementation

The starting point

With Azure OpenAI’s function-calling capability, it is much simpler to implement an agent that can autonomously use a search tool to locate information needed to assist with user requests. This feature alone streamlines the traditional implementation of the RAG pattern, where query rephrasing, augmentation, and generation are handled separately, as previously described.

The agent interacts with the user using the system-defined persona and objectives, while being aware of the search tool at its disposal. When the agent needs to find knowledge it doesn’t possess, it formulates a search query and signals the search engine to retrieve the required answer.

This process is not only reminiscent of human behavior but also more efficient than the RAG pattern, where knowledge retrieval is a separate process that provides information to the chatbot, irrespective of whether it’s needed or not.

To implement this capability:

Define persona, expected behavior and the tool(s) to use, when to use it.

2. Define function specification in json format with function and parameter description.

Interestingly, the parameter description for “the search query to use to search the knowledge base” plays a crucial role. It guides the LLMs to formulate a suitable search query based on what’s needed to assist the user in the conversation. Furthermore, the search query parameter can be described and constrained to adhere to specific tool formats, such as the Lucene query format. Additional parameters can also be incorporated for tasks such as filtering.

3. Implement function calling flow

At this juncture, we have developed an intelligent agent capable of conducting independent searches. However, to truly create a smart agent capable of undertaking more complex research tasks, such as multi-step and adaptive execution, we need to implement a few additional capabilities. Fortunately, this implementation process can be straightforward and simple.

Enhancements to create intelligent research agent.

Adding ability for the agent to plan, act, observe and adjust in the system message as highlighted:

The added instruction says that the bot should retry and change the question if needed. Also, it says the bot should review the result of the search to guide the next search and employ a multi-step approach if needed. This assumes that there can be multiple invocations of the search tool.

As the LLM cannot repeat this process on its own, we need to manage this using application logic. We can do this by putting the entire process in a loop. The loop exits when the model is ready to give the final answer:

Here is the intelligent agent in action in a demo scenario:

The question is a comparison of a feature between two products. The feature for each product is stored in a separate document. To do this, our agent performs two search queries:

X100 vs Z200 power profile for Radio 0
X100 power profile for Radio 0

The first query is a greedy approach as the agent hoped there was a document containing the comparison. This is not the case as the search query did not return sufficient information on the X100, so it added the second query dedicated to X100.

If this were given to a classic RAG solution, it would have failed to find a good answer as it would stop at the first query.

Conclusion

Implementing the agent model can lead to substantial enhancements in grounded ChatGPT solutions. This is due to the intelligent capability of the model to test various strategies and refine its approach based on observed results.

References

Full code implementation for this article can be found here
Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks arXiv:2005.11401 [cs.CL]
OpenAI’s function calling: Function calling — OpenAI API