Florian June

Advanced RAG 06: Exploring Query Rewriting

A key technique for aligning the semantics of queries and documents

In Retrieval Augmented Generation (RAG), we often encounter problems with users’ original queries, such as inaccurate wording or a lack of semantic information. For instance, if a query like “The NBA champion of 2020 is the Los Angeles Lakers! Tell me what is langchain framework?” is used for retrieval as-is, the LLM may give an incorrect answer or be unable to answer at all.

Consequently, it’s essential to align the semantic space of user queries with that of documents. Query rewriting technology can effectively address this problem. Its role within RAG is depicted in Figure 1:

Figure 1: Query rewriting (red dashed box) in RAG. Image by author.

In terms of position, query rewriting is a pre-retrieval method. Note that this diagram only roughly illustrates where query rewriting sits in RAG; as the following sections show, some algorithms modify this process.

Query rewriting is a key technique for aligning the semantics of queries and documents. For instance:

  • Hypothetical Document Embeddings (HyDE) aligns the semantic space of the query and document through hypothetical documents.
  • Rewrite-Retrieve-Read proposes a framework that departs from the traditional retrieve-then-read order and focuses on query rewriting.
  • Step-Back Prompting allows an LLM to perform abstract reasoning and retrieval based on high-level concepts.
  • Query2Doc creates pseudo-documents by few-shot prompting an LLM. It then merges these with the original query to construct a new query.
  • ITER-RETGEN proposes a method of combining the outcome of the prior generation with the previous query. This is followed by retrieving relevant documents and generating new results. This process is repeated multiple times to achieve the final result.

Let’s delve into the details of these methods.

Hypothetical Document Embeddings (HyDE)

The paper “Precise Zero-Shot Dense Retrieval without Relevance Labels” proposes a method based on Hypothetical Document Embeddings (HyDE). Its main process is depicted in Figure 2.

Figure 2: An illustration of the HyDE model. Document snippets are shown. HyDE serves all types of queries without changing the underlying GPT-3 and Contriever/mContriever models. Source: Precise Zero-Shot Dense Retrieval without Relevance Labels.

The process consists of four steps:

1. Use the LLM to generate N hypothetical documents based on the query. These generated documents may not be factual and could contain errors, but they should resemble a relevant document. The purpose of this step is to interpret the user’s query through the LLM.

2. Feed each generated hypothetical document into an encoder, which maps it to a dense vector f(d_k). Here, d_k denotes the k-th generated document and f denotes the encoder. The authors argue that the encoder acts as a filter, removing noise (such as incorrect details) from the hypothetical document.

3. Compute the average of the N vectors:

$$ v = \frac{1}{N} \sum_{k=1}^{N} f(d_k) $$

We can also treat the original query q as one more hypothesis:

$$ v = \frac{1}{N+1} \left[ \sum_{k=1}^{N} f(d_k) + f(q) \right] $$

4. Use vector v to retrieve documents from the document library. As step 3 shows, this vector carries information from both the user’s query and the expected answer pattern, which can improve recall.
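The four steps can be sketched in plain Python. This is a toy under loud assumptions: toy_encoder is a hypothetical bag-of-words stand-in for the dense encoder f (the paper uses Contriever), the hypothetical documents are hard-coded instead of being generated by an LLM, and the two-document corpus is made up. hyde_query_vector averages the vectors as in step 3, optionally adding f(q) as an extra hypothesis.

```python
import math
import re
from collections import Counter


def tokenize(text):
    return re.findall(r"[a-z]+", text.lower())


def toy_encoder(text, vocab):
    """Hypothetical stand-in for the encoder f: an L2-normalized bag-of-words vector."""
    counts = Counter(tokenize(text))
    vec = [float(counts[w]) for w in vocab]
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


def hyde_query_vector(query, hypothetical_docs, vocab, include_query=True):
    """Step 3: average f(d_1), ..., f(d_N), plus f(q) when include_query is True."""
    vectors = [toy_encoder(d, vocab) for d in hypothetical_docs]
    if include_query:
        vectors.append(toy_encoder(query, vocab))
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query_vector, corpus, vocab):
    """Step 4: return the corpus document closest to v."""
    return max(corpus, key=lambda doc: cosine(query_vector, toy_encoder(doc, vocab)))


corpus = [
    "Paul Graham co-founded Viaweb, a startup that built online stores.",
    "The Lakers won the 2020 NBA championship in the Orlando bubble.",
]
query = "what did paul graham do after going to RISD"
# In HyDE these passages would be generated by an LLM; here they are hard-coded.
hypothetical_docs = ["After RISD, Paul Graham co-founded the startup Viaweb to build online stores."]

vocab = sorted(set(tokenize(" ".join(corpus + hypothetical_docs + [query]))))
v = hyde_query_vector(query, hypothetical_docs, vocab)
print(retrieve(v, corpus, vocab))
```

Averaging keeps v in the same space as the document vectors, so step 4 is an ordinary nearest-neighbor search; no model is trained anywhere in the pipeline.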

My understanding of HyDE is illustrated in Figure 3. The goal of HyDE is to generate hypothetical documents so that the final query vector v aligns as closely as possible with the actual document in the vector space.

Figure 3: From my understanding, the objective of HyDE is to generate hypothetical documents. This way, the final query vector v aligns as closely as possible with the actual document in the vector space. Image by author.

HyDE is implemented in both LlamaIndex and LangChain. The following explanation uses LlamaIndex as an example.

Place the test document in YOUR_DIR_PATH. The test code is as follows (the LlamaIndex version I installed is 0.10.12):

import os
os.environ["OPENAI_API_KEY"] = "YOUR_OPENAI_API_KEY"

from llama_index.core import VectorStoreIndex, SimpleDirectoryReader
from llama_index.core.indices.query.query_transform import HyDEQueryTransform
from llama_index.core.query_engine import TransformQueryEngine

# Load documents, build the VectorStoreIndex
dir_path = "YOUR_DIR_PATH"
documents = SimpleDirectoryReader(dir_path).load_data()
index = VectorStoreIndex.from_documents(documents)


query_str = "what did paul graham do after going to RISD"

# Query without transformation: The same query string is used for embedding lookup and also summarization.
query_engine = index.as_query_engine()
response = query_engine.query(query_str)

print('-' * 100)
print("Base query:")
print(response)


# Query with HyDE transformation
hyde = HyDEQueryTransform(include_original=True)
hyde_query_engine = TransformQueryEngine(query_engine, hyde)
response = hyde_query_engine.query(query_str)

print('-' * 100)
print("After HyDEQueryTransform:")
print(response)

First, take a look at the default HyDE prompt in LlamaIndex:

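from llama_index.core.prompts import PromptTemplate
from llama_index.core.prompts.prompt_type import PromptType

############################################
# HYDE
############################################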

HYDE_TMPL = (
    "Please write a passage to answer the question\n"
    "Try to include as many key details as possible.\n"
    "\n"
    "\n"
    "{context_str}\n"
    "\n"
    "\n"
    'Passage:"""\n'
)

DEFAULT_HYDE_PROMPT = PromptTemplate(HYDE_TMPL, prompt_type=PromptType.SUMMARY)

The code for the HyDEQueryTransform class is as follows.

The _run method generates the hypothetical document. Three debugging statements have been added to _run to print its contents:

class HyDEQueryTransform(BaseQueryTransform):
    """Hypothetical Document Embeddings (HyDE) query transform.

    It uses an LLM to generate hypothetical answer(s) to a given query,
    and use the resulting documents as embedding strings.

    As described in `[Precise Zero-Shot Dense Retrieval without Relevance Labels]
    (https://arxiv.org/abs/2212.10496)`
    """

    def __init__(
        self,
        llm: Optional[LLMPredictorType] = None,
        hyde_prompt: Optional[BasePromptTemplate] = None,
        include_original: bool = True,
    ) -> None:
        """Initialize HyDEQueryTransform.

        Args:
            llm_predictor (Optional[LLM]): LLM for generating
                hypothetical documents
            hyde_prompt (Optional[BasePromptTemplate]): Custom prompt for HyDE
            include_original (bool): Whether to include original query
                string as one of the embedding strings
        """
        super().__init__()

        self._llm = llm or Settings.llm
        self._hyde_prompt = hyde_prompt or DEFAULT_HYDE_PROMPT
        self._include_original = include_original

    def _get_prompts(self) -> PromptDictType:
        """Get prompts."""
        return {"hyde_prompt": self._hyde_prompt}

    def _update_prompts(self, prompts: PromptDictType) -> None:
        """Update prompts."""
        if "hyde_prompt" in prompts:
            self._hyde_prompt = prompts["hyde_prompt"]

    def _run(self, query_bundle: QueryBundle, metadata: Dict) -> QueryBundle:
        """Run query transform."""
        # TODO: support generating multiple hypothetical docs
        query_str = query_bundle.query_str
        hypothetical_doc = self._llm.predict(self._hyde_prompt, context_str=query_str)
        embedding_strs = [hypothetical_doc]
        if self._include_original:
            embedding_strs.extend(query_bundle.embedding_strs)

        # The following three lines contain the added debug statements.
        print('-' * 100)
        print("Hypothetical doc:")
        print(embedding_strs)

        return QueryBundle(
            query_str=query_str,
            custom_embedding_strs=embedding_strs,
        )

Running the test code produces the following output:

(llamaindex_010) Florian:~ Florian$ python /Users/Florian/Documents/test_hyde.py 
----------------------------------------------------------------------------------------------------
Base query:
Paul Graham resumed his old life in New York after attending RISD. He became rich and continued his old patterns, but with new opportunities such as being able to easily hail taxis and dine at charming restaurants. He also started experimenting with a new kind of still life painting technique.
----------------------------------------------------------------------------------------------------
Hypothetical doc:
["After attending the Rhode Island School of Design (RISD), Paul Graham went on to co-found Viaweb, an online store builder that was later acquired by Yahoo for $49 million. Following the success of Viaweb, Graham became an influential figure in the tech industry, co-founding the startup accelerator Y Combinator in 2005. Y Combinator has since become one of the most prestigious and successful startup accelerators in the world, helping launch companies like Dropbox, Airbnb, and Reddit. Graham is also known for his prolific writing on technology, startups, and entrepreneurship, with his essays being widely read and respected in the tech community. Overall, Paul Graham's career after RISD has been marked by innovation, success, and a significant impact on the startup ecosystem.", 'what did paul graham do after going to RISD']
----------------------------------------------------------------------------------------------------
After HyDEQueryTransform:
After going to RISD, Paul Graham resumed his old life in New York, but now he was rich. He continued his old patterns but with new opportunities, such as being able to easily hail taxis and dine at charming restaurants. He also started to focus more on his painting, experimenting with a new technique. Additionally, he began looking for an apartment to buy and contemplated the idea of building a web app for making web apps, which eventually led him to start a new company called Aspra.

embedding_strs is a list containing two elements. The first is the generated hypothetical document, and the second is the original query. They are combined into a list to facilitate vector calculations.

In this example, HyDE improves the output. The hypothetical document imagines what Paul Graham did after RISD, which improves the query embedding, and the final answer additionally covers his idea of building a web app for making web apps and the new company it led to.

Naturally, HyDE also has some failure cases. Interested readers can test these out by visiting this webpage.

HyDE is unsupervised: no model is trained, and both the generative model and the contrastive encoder remain unchanged.

In summary, while HyDE introduces a new approach to query rewriting, it has limitations. Rather than relying on the embedding similarity between the query and documents, it relies on the similarity between one document and another. If the language model is not well-versed in the topic, the hypothetical documents may be misleading, which can lead to more retrieval errors.

Rewrite-Retrieve-Read

The idea comes from the paper “Query Rewriting for Retrieval-Augmented Large Language Models”. The paper argues that the original query, particularly in real-world scenarios, is not always optimal for retrieval by an LLM.

As a result, the paper proposes first using an LLM to rewrite the query and then performing retrieval and answer generation, rather than retrieving content and generating answers directly from the original query, as shown in Figure 4 (b).

Figure 4: From left to right, (a) standard retrieve-then-read method, (b) LLM as a query rewriter for our rewrite-retrieve-read pipeline, and (c) our pipeline with a trainable rewriter. Source: Query Rewriting for Retrieval-Augmented Large Language Models.
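To make the difference between Figure 4 (a) and (b) concrete, here is a framework-free sketch. Everything in it is a stand-in: keyword_retriever is a toy word-overlap search over a made-up two-document corpus, stub_rewriter is a hypothetical heuristic that keeps only the last sentence ending in a question mark (the paper uses an LLM, or a trainable rewriter as in Figure 4 (c)), and stub_reader reports which context it received instead of calling an LLM.

```python
import re


def tokenize(text):
    return set(re.findall(r"[a-z]+", text.lower()))


def keyword_retriever(query, corpus):
    """Toy retriever: return the document that shares the most words with the query."""
    query_words = tokenize(query)
    return max(corpus, key=lambda doc: len(query_words & tokenize(doc)))


def stub_rewriter(question):
    """Hypothetical stand-in for the LLM rewriter: keep the last sentence that asks something."""
    sentences = re.split(r"(?<=[.!?])\s+", question.strip())
    asks = [s for s in sentences if s.endswith("?")]
    return asks[-1] if asks else question


def stub_reader(question, context):
    """Hypothetical stand-in for the LLM reader: shows which context it was given."""
    return f"Q: {question}\nContext used: {context}"


def retrieve_then_read(question, corpus):
    # (a) Standard pipeline: search with the raw question.
    return stub_reader(question, keyword_retriever(question, corpus))


def rewrite_retrieve_read(question, corpus):
    # (b) Rewrite first, search with the rewritten query, then read with the original question.
    search_query = stub_rewriter(question)
    return stub_reader(question, keyword_retriever(search_query, corpus))


corpus = [
    "The Los Angeles Lakers are the 2020 NBA champion after beating the Miami Heat.",
    "LangChain is a framework for building applications with large language models.",
]
query = "The NBA champion of 2020 is the Los Angeles Lakers! Tell me what is langchain framework?"

print(retrieve_then_read(query, corpus))
print(rewrite_retrieve_read(query, corpus))
```

With the raw query, the NBA sentence dominates the word overlap and the wrong document is retrieved; after rewriting, only the LangChain question drives retrieval. Note that the reader still receives the original question, as in the paper's pipeline.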

To illustrate how query rewriting affects context retrieval and prediction performance, consider the query “The NBA champion of 2020 is the Los Angeles Lakers! Tell me what is langchain framework?”, which is answered correctly only after it is rewritten.

The example is implemented with LangChain. The required libraries can be installed as follows:

pip install langchain
pip install openai
pip install langchainhub
pip install duckduckgo-search
pip install langchain_openai

Environment Configuration and Library Import:

import os
os.environ["OPENAI_API_KEY"] = "YOUR_OPEN_AI_KEY"

from langchain_community.utilities import DuckDuckGoSearchAPIWrapper
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough
from langchain_openai import ChatOpenAI

Construct a chain and execute simple queries:

def june_print(msg, res):
    print('-' * 100)
    print(msg)
    print(res)


base_template = """Answer the users question based only on the following context:

<context>
{context}
</context>

Question: {question}
"""

base_prompt = ChatPromptTemplate.from_template(base_template)

model = ChatOpenAI(temperature=0)

search = DuckDuckGoSearchAPIWrapper()


def retriever(query):
    return search.run(query)

chain = (
    {"context": retriever, "question": RunnablePassthrough()}
    | base_prompt
    | model
    | StrOutputParser()
)

query = "The NBA champion of 2020 is the Los Angeles Lakers! Tell me what is langchain framework?"

june_print(
    'The result of query:', 
    chain.invoke(query)
)

june_print(
    'The result of the searched contexts:', 
    retriever(query)
)

The output is as follows:

(langchain) Florian:~ Florian$ python /Users/Florian/Documents/test_rewrite_retrieve_read.py 
----------------------------------------------------------------------------------------------------
The result of query:
I'm sorry, but the context provided does not mention anything about the langchain framework.
----------------------------------------------------------------------------------------------------
The result of the searched contexts:
The Los Angeles Lakers are the 2020 NBA Champions!Watch their championship celebration here!Subscribe to the NBA: https://on.nba.com/2JX5gSN Full Game Highli... Aug 4, 2023. The 2020 Los Angeles Lakers were truly one of the most complete teams over the decade. LeBron James' fourth championship was one of the biggest moments of his career. Only two players from the 2020 team remain on the Lakers. In the storied history of the NBA, few teams have captured the imagination of fans and left a lasting ... James had 28 points, 14 rebounds and 10 assists, and the Lakers beat the Miami Heat 106-93 on Sunday night to win the NBA finals in six games. James was also named Most Valuable Player of the NBA ... Portland Trail Blazers star Damian Lillard recently spoke about the 2020 NBA "bubble" playoffs and had an interesting perspective on the criticism the eventual winners, the Los Angeles Lakers, faced. But perhaps none were more surprising than Adebayo's opinion on the 2020 NBA Finals. The Heat were defeated by LeBron James and the Los Angeles Lakers in six games. Miller asked, "Tell me about ...

The searched contexts contain very little information about “langchain”; they are dominated by the NBA sentence in the query, so the model cannot answer.

Next, build a rewriter to rewrite the search query.

rewrite_template = """Provide a better search query for \
web search engine to answer the given question, end \
the queries with ’**’. Question: \
{x} Answer:"""
rewrite_prompt = ChatPromptTemplate.from_template(rewrite_template)


def _parse(text):
    return text.strip("**")

rewriter = rewrite_prompt | ChatOpenAI(temperature=0) | StrOutputParser() | _parse
june_print(
    'Rewritten query:', 
    rewriter.invoke({"x": query})
)

The result is as follows:

----------------------------------------------------------------------------------------------------
Rewritten query:
What is langchain framework and how does it work?
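The _parse helper relies on Python's str.strip, which treats its argument as a set of characters rather than a literal suffix, so strip("**") behaves like strip("*"). That works for output such as "query**", but it leaves surrounding quotes in place. A slightly more defensive parser might look like the sketch below; parse_rewritten_query is a hypothetical helper, not part of LangChain.

```python
def parse_rewritten_query(text):
    """Hypothetical, more defensive alternative to _parse.

    Splits on the literal "**" marker, trims whitespace and quote characters
    from each piece, and returns the first non-empty piece.
    """
    pieces = (piece.strip().strip("\"'“”‘’").strip() for piece in text.split("**"))
    return next((piece for piece in pieces if piece), "")


# str.strip treats "**" as a set of characters, so inner asterisks survive:
print("*a*b**".strip("**"))
print(parse_rewritten_query("What is langchain framework and how does it work?**"))
print(parse_rewritten_query("**“What is LangChain?”**"))
```

Because it is a plain function, it could be placed in the rewriter pipeline in the same position as _parse.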

Next, build rewrite_retrieve_read_chain, which retrieves context with the rewritten query.

rewrite_retrieve_read_chain = (
    {
        "context": {"x": RunnablePassthrough()} | rewriter | retriever,
        "question": RunnablePassthrough(),
    }
    | base_prompt
    | model
    | StrOutputParser()
)

june_print(
    'The result of the rewrite_retrieve_read_chain:', 
    rewrite_retrieve_read_chain.invoke(query)
)

The output is as follows:

----------------------------------------------------------------------------------------------------
The result of the rewrite_retrieve_read_chain:
LangChain is a Python framework designed to help build AI applications powered by language models, particularly large language models (LLMs). It provides a generic interface to different foundation models, a framework for managing prompts, and a central interface to long-term memory, external data, other LLMs, and more. It simplifies the process of interacting with LLMs and can be used to build a wide range of applications, including chatbots that interact with users naturally.

By rewriting the query, we obtain the correct answer.

STEP-BACK PROMPTING

STEP-BACK PROMPTING is a simple prompting technique that enables LLMs to perform abstraction, deriving high-level concepts and first principles from instances that contain specific details. The idea is to define “step-back questions”: more abstract questions derived from the original question.

For example, when a query contains many details, it is difficult for the LLM to retrieve the facts needed to solve the task. In the first example in Figure 5, for the physics question “What happens to the pressure, P, of an ideal gas if the temperature is increased by a factor of 2 and the volume is increased by a factor of 8?”, the LLM may deviate from the first principle of the ideal gas law when reasoning about the problem directly.
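For reference, once the step-back question surfaces the ideal gas law, the physics question reduces to a one-line derivation, assuming the amount of gas n stays fixed:

$$ PV = nRT \;\Rightarrow\; P = \frac{nRT}{V}, \qquad P' = \frac{nR\,(2T)}{8V} = \frac{1}{4} \cdot \frac{nRT}{V} = \frac{P}{4} $$

So the pressure drops to one quarter of its original value.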

Likewise, the question, “Estella Leopold went to which school between Aug 1954 and Nov 1954?” is challenging to address directly due to the specific time range constraints.

Figure 5: Illustration of STEP-BACK PROMPTING with two steps of Abstraction and Reasoning guided by concepts and principles. Top: an example of MMLU high-school physics where the first principle of Ideal Gas Law is retrieved via abstraction. Bottom: an example from TimeQA where the high-level concept of education history is a result of the abstraction. Left: PaLM-2L fails to answer the original question. Chain-of-Thought prompting ran into errors during intermediate reasoning steps (highlighted as red). Right: PaLM-2L successfully answers the question via STEP-BACK PROMPTING. Source: TAKE A STEP BACK: EVOKING REASONING VIA ABSTRACTION IN LARGE LANGUAGE MODELS.

In both instances, posing a broader question helps the model answer the specific query. Instead of directly asking “Which school did Estella Leopold attend at a specific time?”, we could ask about “Estella Leopold’s education history.”

This broader topic encompasses the original question and provides all the information needed to deduce which school Estella Leopold attended at a specific time. Such broader questions are usually easier to answer than the original, more specific ones.

Reasoning grounded in such abstractions helps avoid the errors in intermediate steps that Chain-of-Thought prompting makes in Figure 5 (left).

In summary, STEP-BACK PROMPTING involves two basic steps:

  • Abstraction: Instead of answering the query directly, we first prompt the LLM to ask a broader question about a high-level concept or principle, and then retrieve relevant facts about that concept or principle.
  • Reasoning: The LLM can deduce the answer to the original question based on these facts about the high-level concept or principle. We refer to this as abstract reasoning.
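The two steps can be written as a small framework-free function that mirrors the structure of the LangChain demo that follows: retrieve with both the original and the step-back question, then answer with both contexts in the prompt. The callables abstract, retrieve, and reason are hypothetical placeholders for the LLM and search calls, and the prompt wording is abbreviated. In the usage example they are stubbed, and reason simply echoes the prompt so we can inspect what would be sent to the model.

```python
def step_back_answer(question, abstract, retrieve, reason):
    """Sketch of STEP-BACK PROMPTING with pluggable (hypothetical) callables.

    abstract(question) -> step-back question (an LLM call in practice)
    retrieve(query)    -> context string     (e.g. a web search)
    reason(prompt)     -> final answer       (an LLM call in practice)
    """
    # Step 1, Abstraction: derive a more generic step-back question.
    step_back_question = abstract(question)
    # Retrieve facts for both the original and the step-back question.
    normal_context = retrieve(question)
    step_back_context = retrieve(step_back_question)
    # Step 2, Reasoning: answer the original question grounded in both contexts.
    prompt = (
        "Answer the question; use the context below if it is relevant.\n\n"
        f"{normal_context}\n{step_back_context}\n\n"
        f"Original Question: {question}\nAnswer:"
    )
    return reason(prompt)


# Stubbed usage: a tiny lookup table stands in for web search,
# and `reason` echoes the prompt instead of calling an LLM.
facts = {"When did ChatGPT become available?": "OpenAI released ChatGPT on November 30, 2022."}
prompt_sent = step_back_answer(
    "was chatgpt around while trump was president?",
    abstract=lambda q: "When did ChatGPT become available?",
    retrieve=lambda q: facts.get(q, "(no relevant results)"),
    reason=lambda prompt: prompt,
)
print(prompt_sent)
```

Keeping the original-question context alongside the step-back context means the abstraction adds evidence rather than replacing it.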

To illustrate how step-back prompting affects context retrieval and prediction performance, here is demonstration code implemented with LangChain.

Environment Configuration and Library Import:

import os
os.environ["OPENAI_API_KEY"] = "YOUR_OPEN_AI_KEY"

from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate, FewShotChatMessagePromptTemplate
from langchain_core.runnables import RunnableLambda
from langchain_openai import ChatOpenAI
from langchain_community.utilities import DuckDuckGoSearchAPIWrapper

Construct a chain and execute original queries:

def june_print(msg, res):
    print('-' * 100)
    print(msg)
    print(res)


question = "was chatgpt around while trump was president?"

base_prompt_template = """You are an expert of world knowledge. I am going to ask you a question. Your response should be comprehensive and not contradicted with the following context if they are relevant. Otherwise, ignore them if they are not relevant.

{normal_context}

Original Question: {question}
Answer:"""

base_prompt = ChatPromptTemplate.from_template(base_prompt_template)

search = DuckDuckGoSearchAPIWrapper(max_results=4)
def retriever(query):
    return search.run(query)

base_chain = (
    {
        # Retrieve context using the normal question (up to max_results=4 search results)
        "normal_context": RunnableLambda(lambda x: x["question"]) | retriever,
        # Pass on the question
        "question": lambda x: x["question"],
    }
    | base_prompt
    | ChatOpenAI(temperature=0)
    | StrOutputParser()
)


june_print('The searched contexts of the original question:', retriever(question))
june_print('The result of base_chain:', base_chain.invoke({"question": question}) )

The result is:

(langchain) Florian:~ Florian$ python /Users/Florian/Documents/test_step_back.py 
----------------------------------------------------------------------------------------------------
The searched contexts of the original question:
While impressive in many respects, ChatGPT also has some major flaws. ... [President's Name]," refused to write a poem about ex-President Trump, but wrote one about President Biden ... The company said GPT-4 recently passed a simulated law school bar exam with a score around the top 10% of test takers. By contrast, the prior version, GPT-3.5, scored around the bottom 10%. The ... These two moments show how Twitter's choices helped former President Trump. ... With ChatGPT, which launched to the public in late November, users can generate essays, stories and song lyrics ... Donald Trump is asked a question—say, whether he regrets his actions on Jan. 6—and he answers with something like this: " Let me tell you, there's nobody who loves this country more than me ...
----------------------------------------------------------------------------------------------------
The result of base_chain:
Yes, ChatGPT was around while Trump was president. ChatGPT is an AI language model developed by OpenAI and was launched to the public in late November. It has the capability to generate essays, stories, and song lyrics. While it may have been used to write a poem about President Biden, it also has the potential to be used in various other contexts, including generating responses from hypothetical scenarios involving former President Trump.

The result is incorrect: ChatGPT was released on November 30, 2022, after Trump's presidency ended in January 2021.

Next, build step_back_question_chain and step_back_chain to obtain the correct result.

# Few Shot Examples
examples = [
    {
        "input": "Could the members of The Police perform lawful arrests?",
        "output": "what can the members of The Police do?",
    },
    {
        "input": "Jan Sindel’s was born in what country?",
        "output": "what is Jan Sindel’s personal history?",
    },
]
# We now transform these to example messages
example_prompt = ChatPromptTemplate.from_messages(
    [
        ("human", "{input}"),
        ("ai", "{output}"),
    ]
)
few_shot_prompt = FewShotChatMessagePromptTemplate(
    example_prompt=example_prompt,
    examples=examples,
)

step_back_prompt = ChatPromptTemplate.from_messages(
    [
        (
            "system",
            """You are an expert at world knowledge. Your task is to step back and paraphrase a question to a more generic step-back question, which is easier to answer. Here are a few examples:""",
        ),
        # Few shot examples
        few_shot_prompt,
        # New question
        ("user", "{question}"),
    ]
)
step_back_question_chain = step_back_prompt | ChatOpenAI(temperature=0) | StrOutputParser()
# Generate the step-back question once and reuse it for printing and retrieval
step_back_question = step_back_question_chain.invoke({"question": question})
june_print('The step-back question:', step_back_question)
june_print('The searched contexts of the step-back question:', retriever(step_back_question))



response_prompt_template = """You are an expert of world knowledge. I am going to ask you a question. Your response should be comprehensive and not contradicted with the following context if they are relevant. Otherwise, ignore them if they are not relevant.

{normal_context}
{step_back_context}

Original Question: {question}
Answer:"""
response_prompt = ChatPromptTemplate.from_template(response_prompt_template)


step_back_chain = (
    {
        # Retrieve context using the normal question
        "normal_context": RunnableLambda(lambda x: x["question"]) | retriever,
        # Retrieve context using the step-back question
        "step_back_context": step_back_question_chain | retriever,
        # Pass on the question
        "question": lambda x: x["question"],
    }
    | response_prompt
    | ChatOpenAI(temperature=0)
    | StrOutputParser()
)

june_print('The result of step_back_chain:', step_back_chain.invoke({"question": question}) )

The result is as follows:

----------------------------------------------------------------------------------------------------
The step-back question:
When did ChatGPT become available?
----------------------------------------------------------------------------------------------------
The searched contexts of the step-back question:
OpenAI released an early demo of ChatGPT on November 30, 2022, and the chatbot quickly went viral on social media as users shared examples of what it could do. Stories and samples included ... March 14, 2023 - Anthropic launched Claude, its ChatGPT alternative. March 20, 2023 - A major ChatGPT outage affects all users for several hours. March 21, 2023 - Google launched Bard, its ... The same basic models had been available on the API for almost a year before ChatGPT came out. In another sense, we made it more aligned with what humans want to do with it. A paid ChatGPT Plus subscription is available. (Image credit: OpenAI) ChatGPT is based on a language model from the GPT-3.5 series, which OpenAI says finished its training in early 2022.
----------------------------------------------------------------------------------------------------
The result of step_back_chain:
No, ChatGPT was not around while Trump was president. ChatGPT was released to the public in late November, after Trump's presidency had ended. The references to ChatGPT in the context provided are all dated after Trump's presidency, such as the release of an early demo on November 30, 2022, and the launch of ChatGPT Plus subscription. Therefore, it is safe to say that ChatGPT was not around during Trump's presidency.

We can see that by “stepping back” the original query to a more abstract problem, and using both the abstracted and original query for retrieval, the LLM improves its ability to follow the correct reasoning path towards a solution.

As Edsger W. Dijkstra put it, “The purpose of abstraction is not to be vague, but to create a new semantic level in which one can be absolutely precise.”

Query2doc

The paper “Query2doc: Query Expansion with Large Language Models” introduces query2doc. It generates pseudo-documents by few-shot prompting an LLM and then combines them with the original query to create a new query, as shown in Figure 6:

Figure 6: Illustration of query2doc few-shot prompting. We omit some in-context examples for space reasons. Source: Query2doc: Query Expansion with Large Language Models.

For dense retrieval, the new query, denoted q+, is a simple concatenation of the original query q and the pseudo-document d’, separated by [SEP]: q+ = concat(q, [SEP], d’).
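A minimal sketch of query2doc's query construction, assuming the few-shot prompt follows the layout in Figure 6 (the exact wording here is an assumption). In the real method an LLM completes the prompt; here the pseudo-document is hard-coded. The sparse variant reflects my understanding that the paper, for BM25-style retrieval, repeats q several times before d’ so the short query is not drowned out; the default n=5 should be treated as a parameter to tune.

```python
def build_query2doc_prompt(query, examples):
    """Few-shot prompt modeled on Figure 6; the exact wording is an assumption."""
    lines = ["Write a passage that answers the given query:", ""]
    for example_query, example_passage in examples:
        lines += [f"Query: {example_query}", f"Passage: {example_passage}", ""]
    lines += [f"Query: {query}", "Passage:"]
    return "\n".join(lines)


def expand_query_dense(query, pseudo_doc, sep="[SEP]"):
    """Dense retrieval: q+ = concat(q, [SEP], d')."""
    return f"{query} {sep} {pseudo_doc}"


def expand_query_sparse(query, pseudo_doc, n=5):
    """Sparse retrieval: repeat q n times before d' to rebalance term weights (n is an assumption)."""
    return " ".join([query] * n + [pseudo_doc])


examples = [
    ("what is rag", "Retrieval-augmented generation (RAG) combines a retriever with a language model."),
]
query = "what is langchain framework"
prompt = build_query2doc_prompt(query, examples)
# In query2doc, an LLM would complete `prompt`; here the pseudo-document is hard-coded.
pseudo_doc = "LangChain is a framework for developing applications powered by language models."

print(prompt)
print(expand_query_dense(query, pseudo_doc))
```

The expanded query q+ is then passed to the retriever unchanged; only the query side is modified.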

The Query2doc authors point out that HyDE implicitly assumes that the ground-truth document and the pseudo-documents express the same semantics in different words, which may not hold for some queries.

Another distinction between Query2doc and HyDE is that Query2doc trains a supervised dense retriever, as outlined in the paper.

At the time of writing, I have not found an implementation of query2doc in LangChain or LlamaIndex.

ITER-RETGEN

The ITER-RETGEN approach uses generated content to guide retrieval. It iteratively implements “retrieval-enhanced generation” and “generation-enhanced retrieval” within a Retrieve-Read-Retrieve-Read flow.

Figure 7: ITER-RETGEN iterates retrieval and generation. In each iteration, ITER-RETGEN leverages the model output from the previous iteration as a specific context to help retrieve more relevant knowledge, which may help improve model generation (e.g., correcting the height of Hesse Hogan in this figure). We only show two iterations in this figure for brevity. Solid arrows connect queries to the retrieved knowledge, and dashed arrows denote retrieval-augmented generation. Source: Enhancing Retrieval-Augmented Large Language Models with Iterative Retrieval-Generation Synergy.

As illustrated in Figure 7, given a question q and a retrieval corpus D = {d}, where d denotes a paragraph, ITER-RETGEN performs T consecutive iterations of retrieval and generation.

In each iteration t, we first concatenate the generation y_{t-1} from the preceding iteration with q and use it to retrieve the top-k paragraphs, denoted D_{y_{t-1}||q}. Next, we prompt the LLM M with both the retrieved paragraphs and q to generate an output y_t. Each iteration can therefore be formulated as:

$$ y_t = M\left(y_t \mid \mathrm{prompt}\left(D_{y_{t-1} \| q},\ q\right)\right), \quad \forall\, 1 \le t \le T $$

The output of the last iteration, y_T, is returned as the final response.
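The loop can be sketched in plain Python. All components are toy assumptions: keyword_retrieve ranks a made-up three-paragraph corpus by word overlap, stub_generate is a hypothetical stand-in for the LLM M that "answers" with its top-ranked paragraph, and y_0 is assumed to be empty so that the first round retrieves with q alone. The point is the retrieval trace: the paragraph about who started Acme Corp is retrieved in the second round only because y_1 mentions Acme Corp.

```python
import re

# Toy corpus and question (all made up for illustration).
CORPUS = [
    "Widget X is made by Acme Corp.",
    "Acme Corp was started by Jane Doe in 1990.",
    "Widget Y is sold by Globex.",
]
QUESTION = "Who founded the company that makes Widget X?"
STOPWORDS = {"the", "is", "by", "that", "who", "was", "a", "of", "in"}


def tokens(text):
    return set(re.findall(r"[a-z0-9]+", text.lower())) - STOPWORDS


def keyword_retrieve(query, k):
    """Toy retriever: rank CORPUS by word overlap with the query and keep the top k."""
    query_words = tokens(query)
    ranked = sorted(CORPUS, key=lambda p: len(query_words & tokens(p)), reverse=True)
    return ranked[:k]


def stub_generate(prompt):
    """Hypothetical stand-in for the LLM M: 'answers' with its top-ranked paragraph."""
    return prompt.split("\n", 1)[0]


def iter_retgen(question, retrieve, generate, iterations=2, top_k=2):
    """Run T = iterations rounds of y_t = M(prompt(D_{y_{t-1}||q}, q))."""
    output = ""  # assumption: y_0 is empty, so the first round retrieves with q alone
    trace = []
    for _ in range(iterations):
        # Generation-augmented retrieval: query with y_{t-1} || q.
        retrieval_query = f"{output} {question}".strip()
        paragraphs = retrieve(retrieval_query, top_k)
        # Retrieval-augmented generation: prompt M with the paragraphs and q.
        prompt = "\n".join(paragraphs) + f"\n\nQuestion: {question}\nAnswer:"
        output = generate(prompt)
        trace.append({"query": retrieval_query, "paragraphs": paragraphs, "output": output})
    return output, trace  # y_T is the final response


final_output, trace = iter_retgen(QUESTION, keyword_retrieve, stub_generate)
for t, step in enumerate(trace, start=1):
    print(f"iteration {t}: retrieved {step['paragraphs']}")
```

With a real LLM in place of stub_generate, the second round's prompt would contain both hops of evidence, which is what lets later iterations correct earlier answers.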

As with Query2doc, I have not found an implementation of ITER-RETGEN in LangChain or LlamaIndex.

Conclusion

This article presents various query rewriting techniques, including code demonstrations for some.

In practice, all of these query rewriting methods are worth trying; which method, or combination of methods, to use depends on how well each performs in your application.

However, every rewriting method involves extra LLM calls, which add latency and cost; these trade-offs need to be considered in actual use.

In addition, some methods, such as query routing and decomposing a query into multiple sub-questions, are not query rewriting but are still pre-retrieval methods. I may introduce them in a future article.

Lastly, if there are inaccuracies or omissions in this article, or if you have any questions, please address them in the comment section.
