Advanced RAG Retrieval Strategies: Sentence Window Retrieval

We have previously covered RAG (Retrieval Augmented Generation) for Large Language Models (LLMs), but as LLM technology evolves, more advanced RAG retrieval methods have emerged. Compared with base RAG retrieval, advanced RAG relies on more in-depth techniques and more complex search strategies to return more accurate, relevant, and comprehensive results. Today, we introduce one of these advanced RAG retrieval strategies: sentence window retrieval.

Introduction to Sentence Window Retrieval

Before we dive into sentence window retrieval, let’s briefly introduce base RAG retrieval. Here is a flowchart for base RAG retrieval:

First, the documents are sliced into equally sized chunks
The sliced chunks are then embedded and saved in a vector database
The question is embedded, and the K chunks whose embeddings are most similar to the question's embedding are retrieved
The question and retrieval results are fed to the LLM to generate an answer
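To make these four steps concrete, here is a minimal, dependency-free sketch. It is only an illustration, not how LlamaIndex or a real vector database works: the "embedding" is a toy bag-of-words vector, the "vector database" is a Python list, the LLM call is omitted, and the function names (chunk_text, embed, cosine, retrieve_top_k) are hypothetical.

```python
import math
import re
from collections import Counter


def chunk_text(text: str, chunk_size: int) -> list[str]:
    """Step 1: split the text into chunks of at most chunk_size words."""
    words = text.split()
    return [" ".join(words[i:i + chunk_size]) for i in range(0, len(words), chunk_size)]


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: lowercase word counts."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve_top_k(question: str, store: list, k: int) -> list[str]:
    """Step 3: return the k chunks whose vectors are most similar to the question's."""
    q = embed(question)
    ranked = sorted(store, key=lambda item: cosine(q, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:k]]


document = (
    "Stark and Banner build Ultron. Ultron turns against the Avengers. "
    "Thor visits Asgard. Vision lifts the hammer."
)
# Step 2: "embed" every chunk and keep (chunk, vector) pairs in a list acting as the vector database.
store = [(chunk, embed(chunk)) for chunk in chunk_text(document, chunk_size=6)]
question = "Who built Ultron?"
context = retrieve_top_k(question, store, k=1)
# Step 4 (not shown): send the question plus `context` to the LLM.
print(context)  # → ['Stark and Banner build Ultron. Ultron']
```

Note how the fixed-size chunk cuts the second sentence in half, one of the weaknesses of splitting by size rather than by sentence.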

The issue with base RAG retrieval is that if the document slices are relatively large, the retrieval results may contain a lot of irrelevant information, leading to inaccurate results generated by the LLM. Now, let’s take a look at the flowchart for sentence window retrieval:

Compared to base RAG retrieval, sentence window retrieval splits documents into smaller units, usually individual sentences
During retrieval, in addition to finding the highest matching sentence, the surrounding context of that sentence is also submitted to the LLM as part of the retrieval results

Matching against individual sentences makes retrieval more precise, while the context window keeps the retrieval results rich enough for the LLM to answer well.

Principle

The principle of sentence window retrieval is quite simple. When the documents are sliced, they are split into individual sentences, which are then embedded and saved in the database. At retrieval time, the most relevant sentences are found, but the retrieval results are not limited to those sentences: the sentences before and after each retrieved sentence are included as well. The number of surrounding sentences can be adjusted through a parameter. Finally, the retrieval results are submitted together to the LLM to generate an answer.

Image source: https://medium.com/@shivansh.kaushik/advanced-text-retrieval-with-elasticsearch-llamaindex-sentence-window-retrieval-cb5ea720aa44
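Before turning to a library, here is a minimal, dependency-free sketch of this principle. It mirrors the idea only, not LlamaIndex's implementation: the sentence splitter is a naive regex, relevance is a toy word-overlap score standing in for embedding similarity, and the helper names (split_sentences, build_windows, score) are hypothetical. The keys window and original_text are borrowed from the LlamaIndex example that follows.

```python
import re


def split_sentences(text: str) -> list[str]:
    """Naive splitter: break after '.', '?' or '!' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.?!])\s+", text.strip()) if s]


def build_windows(sentences: list[str], window_size: int) -> list[dict]:
    """One entry per sentence: the sentence itself plus its surrounding window."""
    entries = []
    for i, sentence in enumerate(sentences):
        start = max(0, i - window_size)
        end = min(len(sentences), i + window_size + 1)
        entries.append({"original_text": sentence, "window": " ".join(sentences[start:end])})
    return entries


def score(question: str, sentence: str) -> int:
    """Toy relevance score: number of shared lowercase words (stand-in for embeddings)."""
    q = set(re.findall(r"[a-z]+", question.lower()))
    s = set(re.findall(r"[a-z]+", sentence.lower()))
    return len(q & s)


text = "Ultron attacks. Stark and Banner created Ultron. Thor leaves. Vision arrives. The city falls."
entries = build_windows(split_sentences(text), window_size=1)
question = "Who created Ultron?"
# Matching uses only the single sentence...
best = max(entries, key=lambda e: score(question, e["original_text"]))
print(best["original_text"])  # → Stark and Banner created Ultron.
# ...but the LLM would receive the whole window.
print(best["window"])  # → Ultron attacks. Stark and Banner created Ultron. Thor leaves.
```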

Let’s understand the principle of sentence window retrieval through example code. Among RAG frameworks, LlamaIndex provides a good implementation of sentence window retrieval, so we use it below to demonstrate how the feature works.

from llama_index.core.node_parser import SentenceWindowNodeParser
from llama_index.core.schema import Document

node_parser = SentenceWindowNodeParser.from_defaults(
    window_size=3,
    window_metadata_key="window",
    original_text_metadata_key="original_text",
)
text = "hello. how are you? I am fine! Thank you. And you? I am fine too. "
nodes = node_parser.get_nodes_from_documents([Document(text=text)])

A document parser, SentenceWindowNodeParser, is created with window_size set to 3. This means the sentence window can contain up to 7 sentences: the 3 sentences before the retrieved sentence, the retrieved sentence itself, and the 3 sentences after it.
The parsed results carry two pieces of metadata: window and original_text.
window_metadata_key is the metadata key under which all sentences in the sentence window are stored, whereas original_text_metadata_key is the key under which the retrieved sentence itself is stored.
Finally, get_nodes_from_documents parses the original document into nodes, one per sentence.

Note: In earlier versions, the sentence window added only 2 sentences after the retrieved sentence, so with the default window_size=3 the window held at most 6 sentences. In newer versions, after the core functionality was moved into the llama-index-core package, the window includes 3 sentences after the retrieved sentence. See the code in the official repository for details.
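The difference described in this note amounts to an off-by-one in where the window ends. The sketch below reproduces the sentence counts discussed in this article with plain Python slicing; window_new and window_old are hypothetical names, and this illustrates the behavior described in the note rather than copying the library source.

```python
def window_new(sentences, i, window_size):
    """Newer behavior: up to window_size sentences before AND after sentence i."""
    start = max(0, i - window_size)
    end = min(len(sentences), i + window_size + 1)
    return sentences[start:end]


def window_old(sentences, i, window_size):
    """Older behavior described in the note: only window_size - 1 sentences after i."""
    start = max(0, i - window_size)
    end = min(len(sentences), i + window_size)
    return sentences[start:end]


article = ["hello.", "how are you?", "I am fine!", "Thank you.", "And you?", "I am fine too."]
print(len(window_new(article, 0, 3)))  # → 4 (first sentence: nothing before it)
print(len(window_new(article, 3, 3)))  # → 6 (fourth sentence: only 2 sentences after it)

# A longer document shows the 7-vs-6 difference away from the boundaries.
long_doc = [f"s{n}." for n in range(10)]
print(len(window_new(long_doc, 5, 3)), len(window_old(long_doc, 5, 3)))  # → 7 6
```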

Let’s inspect the parsed nodes, starting with the first one:

print(nodes[0].metadata)

# Output
{'window': 'hello.  how are you?  I am fine!  Thank you. ', 'original_text': 'hello. '}

When the first sentence is the retrieved sentence, since there are no other sentences before it, the sentence window contains a total of 4 sentences, including the retrieved sentence itself and the following 3 sentences.

print(nodes[3].metadata)

# Output
{'window': 'hello.  how are you?  I am fine!  Thank you.  And you?  I am fine too. ', 'original_text': 'Thank you. '}

When the fourth sentence is the retrieved sentence, the sentence window will include the 3 sentences before the retrieved sentence, the retrieved sentence itself, and the 3 sentences after the retrieved sentence. However, since there are only 2 sentences after, the total is only 6 sentences.

Splitting Chinese Sentences

The sentence window parser's default splitting is built around English sentence-ending punctuation such as ., ?, and !. This does not work for Chinese, which uses full-width punctuation (。？！) and typically no spaces between sentences. We can fix this by passing a custom splitting rule to the document parser:

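import re


def sentence_splitter(text):
    # Split right after each Chinese sentence-ending mark (。, ？, ！);
    # the zero-width lookbehinds keep each mark attached to its sentence
    nodes = re.split("(?<=。)|(?<=？)|(?<=！)", text)
    # Remove empty strings left over from the split
    nodes = [node for node in nodes if node]
    return nodes


# Same parser as before, but with the custom splitter for Chinese text
node_parser = SentenceWindowNodeParser.from_defaults(
    window_size=3,
    window_metadata_key="window",
    original_text_metadata_key="original_text",
    sentence_splitter=sentence_splitter,
)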

We pass our custom function through the sentence_splitter parameter. It splits the text right after each Chinese sentence-ending punctuation mark (。, ？, ！) and drops empty strings.
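To see exactly what the custom splitter hands to the parser, you can run it on the test string by itself, without LlamaIndex. The variant below, with the hypothetical name chinese_sentence_splitter, is an optional refinement rather than the article's code: it uses a single lookbehind character class and also drops whitespace-only pieces. With the original function, the trailing space in the test string survives as a separate whitespace-only "sentence", which is why the second window in the output below ends with extra spaces.

```python
import re


def original_splitter(text):
    # Same logic as the article's sentence_splitter.
    return [s for s in re.split("(?<=。)|(?<=？)|(?<=！)", text) if s]


def chinese_sentence_splitter(text):
    # Hypothetical variant: one lookbehind character class; strip pieces and drop whitespace-only ones.
    return [s.strip() for s in re.split(r"(?<=[。？！])", text) if s.strip()]


text = "你好。你好吗？我很好！谢谢。你呢？我也很好。 "
print(original_splitter(text))          # last element is " "
print(chinese_sentence_splitter(text))  # six sentences, no trailing space
```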

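# "Hello. How are you? I am fine! Thank you. And you? I am fine too."
text = "你好。你好吗？我很好！谢谢。你呢？我也很好。 "
# Re-parse the text with the Chinese-aware parser
nodes = node_parser.get_nodes_from_documents([Document(text=text)])

print(nodes[0].metadata)
print(nodes[3].metadata)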
# Output
{'window': '你好。 你好吗？ 我很好！ 谢谢。', 'original_text': '你好。'}
{'window': '你好。 你好吗？ 我很好！ 谢谢。 你呢？ 我也很好。  ', 'original_text': '谢谢。'}

With the custom splitting rule, the parser splits and windows the Chinese text the same way it handles English.

Using Sentence Windows

Next, let’s see how sentence window retrieval is used in an actual RAG project. For the document data, we will again use the plot summaries of the Avengers movies from Wikipedia for testing.

Base RAG Retrieval Example

First, let’s see the effect of base RAG retrieval on document splitting and retrieval:

from llama_index.core import SimpleDirectoryReader
from llama_index.core.node_parser import SentenceSplitter
from llama_index.core.settings import Settings
from llama_index.core import VectorStoreIndex
from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.llms.openai import OpenAI

documents = SimpleDirectoryReader("./data").load_data()
text_splitter = SentenceSplitter()
llm = OpenAI(model="gpt-3.5-turbo", temperature=0.1)
embed_model = OpenAIEmbedding()
Settings.llm = llm
Settings.embed_model = embed_model
Settings.node_parser = text_splitter
base_index = VectorStoreIndex.from_documents(
    documents=documents,
)
base_engine = base_index.as_query_engine(
    similarity_top_k=2,
)

We build a base RAG pipeline with LlamaIndex, first loading the documents from the ./data directory.
SentenceSplitter is used to parse the documents. Unlike TokenTextSplitter, SentenceSplitter tries to keep complete sentences together in each chunk instead of cutting them off mid-sentence.
OpenAI’s embedding model and LLM are used for document embedding and answer generation. The latest version of LlamaIndex configures these through the global Settings object instead of the older ServiceContext.
Finally, a query engine is created that retrieves only the 2 most relevant nodes (chunks) as the retrieval results.

Now, let’s see the test results:

question = "Which two members of the Avengers created Ultron?"
response = base_engine.query(question)
print(f"response: {response}")
print(f"len: {len(response.source_nodes)}")

text = response.source_nodes[0].node.text
print("------------------")
print(f"Text: {text}")
text = response.source_nodes[1].node.text
print("------------------")
print(f"Text: {text}")

# Output
response: Tony Stark and Bruce Banner
len: 2
------------------
Text: In the Eastern European country of Sokovia, the Avengers—Tony Stark, Thor, Bruce Banner, Steve Rogers, Natasha Romanoff, and Clint Barton—raid a Hydra facility commanded by Baron Wolfgang von Strucker, who has experimented on humans using the scepter previously wielded by Loki. They meet two of Strucker's test subjects—twins Pietro (who has superhuman speed) and Wanda Maximoff (who has telepathic and telekinetic abilities)—and apprehend Strucker, while Stark retrieves Loki's scepter.

Stark and Banner discover an artificial intelligence within the scepter's gem, and secretly decide to use it to complete Stark's "Ultron" global defense program. The unexpectedly sentient Ultron, believing he must eradicate humanity to save Earth, eliminates Stark's A.I. J.A.R.V.I.S. and attacks the Avengers at their headquarters. Escaping with the scepter, Ultron uses the resources in Strucker's Sokovia base to upgrade his rudimentary body and build an army of robot drones. Having killed Strucker, he recruits the Maximoffs, who hold Stark responsible for their parents' deaths by his company's weapons, and goes to the base of arms dealer Ulysses Klaue in Johannesburg to get vibranium. The Avengers attack Ultron and the Maximoffs, but Wanda subdues them with haunting visions, causing Banner to turn into the Hulk and rampage until Stark stops him with his anti-Hulk armor.[a]

A worldwide backlash over the resulting destruction, and the fears Wanda's hallucinations incited, send the team into hiding at Barton's farmhouse. Thor departs to consult with Dr. Erik Selvig on the apocalyptic future he saw in his hallucination, while Nick Fury arrives and encourages the team to form a plan to stop Ultron. In Seoul, Ultron uses Loki's scepter to enslave the team's friend Helen Cho. They use her synthetic-tissue technology, vibranium, and the scepter's gem to craft a new body. As Ultron uploads himself into the body, Wanda is able to read his mind; discovering his plan for human extinction, the Maximoffs turn against Ultron. Rogers, Romanoff, and Barton fight Ultron and retrieve the synthetic body, but Ultron captures Romanoff. The Avengers fight among themselves when Stark and Banner secretly upload J.A.R.V.I.S.—who is still working after hiding from Ultron inside the Internet—into the synthetic body.

Thor returns to help activate the body, based on his vision that the gem on its brow is the Mind Stone, one of the six Infinity Stones, the most powerful objects in existence. This "Vision" earns their trust by being worthy of lifting Thor's hammer, Mjölnir. Vision and the Maximoffs go with the Avengers to Sokovia, where Ultron has used the remaining vibranium to build a machine to lift a large part of the capital city skyward, intending to crash it into the ground to cause global extinction. Banner rescues Romanoff, who awakens the Hulk for the battle. The Avengers fight Ultron's army while Fury arrives in a Helicarrier with Maria Hill, James Rhodes, and S.H.I.E.L.D. agents to evacuate civilians.

Pietro dies when he shields Barton from gunfire, and a vengeful Wanda abandons her post to destroy Ultron's primary body, which allows one of his drones to activate the machine. The city plummets, but Stark and Thor overload the machine and shatter the landmass. In the aftermath, the Hulk, unwilling to endanger Romanoff by being with her, departs in a Quinjet, while Vision confronts and destroys Ultron's last remaining body. Later, with the Avengers having established a new base run by Fury, Hill, Cho, and Selvig, Thor returns to Asgard to learn more about the forces he suspects have manipulated recent events. As Stark leaves and Barton retires, Rogers and Romanoff prepare to train new Avengers: Rhodes, Vision, Sam Wilson, and Wanda.

In a mid-credits scene, Thanos dons a gauntlet[b] and vows to retrieve the Infinity Stones himself.
------------------
Text: In 2018, twenty-three days after Thanos erased half of all life in the universe,[a] Carol Danvers rescues Tony Stark and Nebula from deep space and they reunite with the remaining Avengers—Bruce Banner, Steve Rogers, Thor, Natasha Romanoff, and James Rhodes—and Rocket on Earth. Locating Thanos on an uninhabited planet, they plan to use the Infinity Stones to reverse his actions, only to find that Thanos has already destroyed them, thus preventing any further use. Enraged, Thor decapitates Thanos.

Five years later, Scott Lang escapes from the Quantum Realm.[b] Reaching the Avengers Compound, he explains that he experienced only five hours while trapped. Theorizing that the Quantum Realm allows time travel, they ask a reluctant Stark to help them retrieve the Stones from the past to reverse the actions of Thanos in the present. Stark, Rocket, and Banner, who has since merged his intelligence with the Hulk's strength, build a time machine. Banner notes that altering the past does not affect their present; any changes create alternate realities. Banner and Rocket travel to Norway, where they visit the Asgardian refugees' settlement New Asgard and recruit an overweight and despondent Thor. In Tokyo, Romanoff recruits Clint Barton, who became a vigilante after his family was erased during the execution of Thanos's plan.[a]

Banner, Lang, Rogers, and Stark time-travel to New York City during Loki's attack in 2012.[c] At the Sanctum Sanctorum, Banner convinces the Ancient One to give him the Time Stone after promising to return the various Stones to their proper points in time. At Stark Tower, Rogers retrieves the Mind Stone from Hydra sleeper agents, but Stark and Lang's attempt to steal the Space Stone fails, allowing 2012-Loki to escape with it. Rogers and Stark travel to Camp Lehigh in 1970, where Stark obtains an earlier version of the Space Stone and encounters his father, Howard. Rogers steals Pym Particles from Hank Pym to return to the present and spies his lost love, Peggy Carter.

Meanwhile, Rocket and Thor travel to Asgard in 2013;[d] Rocket extracts the Reality Stone from Jane Foster, while Thor gets encouragement from his mother, Frigga, and retrieves his old hammer, Mjolnir. Barton, Romanoff, Nebula, and Rhodes travel to 2014; Nebula and Rhodes go to Morag and steal the Power Stone before Peter Quill can,[e] while Barton and Romanoff travel to Vormir. The Soul Stone's keeper, Red Skull, reveals it can only be acquired by sacrificing a loved one. Romanoff sacrifices herself, allowing Barton to get the Stone. Rhodes and Nebula attempt to return to their own time, but Nebula is incapacitated when her cybernetic implants link with her past self, allowing 2014-Thanos to learn of his future self's success and the Avengers' attempt to undo it. 2014-Thanos sends 2014-Nebula forward in time to prepare for his arrival.

Reuniting in the present, the Avengers place the Stones into a gauntlet that Stark, Banner, and Rocket have built. Banner, who has the most resistance to their radiation, uses the gauntlet to undo every one of Thanos's disintegrations. Meanwhile, 2014-Nebula, impersonating her future self, uses the time machine to transport 2014-Thanos and his warship to the present, which he then uses to destroy the Avengers Compound. Present-day Nebula convinces 2014-Gamora to betray Thanos, but is unable to convince 2014-Nebula and kills her. Thanos overpowers Stark, Thor and a Mjolnir-wielding Rogers, and summons his army to retrieve the Stones, intent on using them to destroy the universe and create a new one. A restored Stephen Strange arrives with other sorcerers, the restored Avengers and Guardians of the Galaxy, the Ravagers, and the armies of Wakanda and Asgard to fight Thanos's army. Danvers also arrives and destroys Thanos's warship, but Thanos overpowers her and seizes the gauntlet. Stark steals the Stones and uses them to disintegrate Thanos and his army, sacrificing his life in the process.

Following Stark's funeral, Thor appoints Valkyrie as the new king of New Asgard and joins the Guardians. Rogers returns the Stones and Mjolnir to their proper timelines and remains in the past to live with Carter. In the present, an elderly Rogers passes his shield to Sam Wilson.

The answer from base RAG retrieval is correct because the retrieved chunks contain the relevant content.
Base RAG retrieves 2 chunks, sorted by relevance. Each is an entire plot summary, so most of the text passed to the LLM is unrelated to the question.

Sentence Window Retrieval Example

Now let’s look at sentence window retrieval on the same data:

from llama_index.core.node_parser import SentenceWindowNodeParser
from llama_index.core.postprocessor import MetadataReplacementPostProcessor

node_parser = SentenceWindowNodeParser.from_defaults(
    window_size=3,
    window_metadata_key="window",
    original_text_metadata_key="original_text",
)
documents = SimpleDirectoryReader("./data").load_data()
llm = OpenAI(model="gpt-3.5-turbo", temperature=0.1)
embed_model = OpenAIEmbedding()
Settings.llm = llm
Settings.embed_model = embed_model
Settings.node_parser = node_parser
sentence_index = VectorStoreIndex.from_documents(
    documents=documents,
)
postproc = MetadataReplacementPostProcessor(target_metadata_key="window")
sentence_window_engine = sentence_index.as_query_engine(
    similarity_top_k=2, node_postprocessors=[postproc]
)

The code for sentence window retrieval differs from base RAG retrieval in two ways. The first is the use of SentenceWindowNodeParser as the document parser, which we have already discussed.
The second is the use of MetadataReplacementPostProcessor to post-process the retrieval results: it replaces the text of each retrieved node (the single matched sentence) with the value of its window metadata, so the LLM receives the whole sentence window.
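Conceptually, this post-processing step swaps the matched text (one sentence) for the wider window stored in the node's metadata before the nodes reach the LLM. The plain-Python sketch below illustrates only that idea, using a hypothetical RetrievedNode dataclass and a hypothetical replace_with_window function; it is not LlamaIndex's implementation, which works on its own node classes.

```python
from dataclasses import dataclass, field


@dataclass
class RetrievedNode:
    # Hypothetical stand-in for a retrieved node: its text plus metadata.
    text: str
    metadata: dict = field(default_factory=dict)


def replace_with_window(nodes, target_metadata_key="window"):
    """Replace each node's text with metadata[target_metadata_key] when that key exists."""
    for node in nodes:
        if target_metadata_key in node.metadata:
            node.text = node.metadata[target_metadata_key]
    return nodes


retrieved = [
    RetrievedNode(
        text="Thank you.",
        metadata={
            "original_text": "Thank you.",
            "window": "hello. how are you? I am fine! Thank you. And you? I am fine too.",
        },
    )
]
for node in replace_with_window(retrieved):
    # The LLM now sees the whole window, while original_text still records the matched sentence.
    print(node.text)
```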

Test results are as follows:

response = sentence_window_engine.query(question)
print(f"response: {response}")
print(f"len: {len(response.source_nodes)}")

window = response.source_nodes[0].node.metadata["window"]
sentence = response.source_nodes[0].node.metadata["original_text"]
print("------------------")
print(f"Window: {window}")
print("------------------")
print(f"Original Sentence: {sentence}")
window = response.source_nodes[1].node.metadata["window"]
sentence = response.source_nodes[1].node.metadata["original_text"]
print("------------------")
print(f"Window: {window}")
print("------------------")
print(f"Original Sentence: {sentence}")

# Output
response: Tony Stark and Bruce Banner
len: 2
------------------
Window: In the Eastern European country of Sokovia, the Avengers—Tony Stark, Thor, Bruce Banner, Steve Rogers, Natasha Romanoff, and Clint Barton—raid a Hydra facility commanded by Baron Wolfgang von Strucker, who has experimented on humans using the scepter previously wielded by Loki.  They meet two of Strucker's test subjects—twins Pietro (who has superhuman speed) and Wanda Maximoff (who has telepathic and telekinetic abilities)—and apprehend Strucker, while Stark retrieves Loki's scepter.

 Stark and Banner discover an artificial intelligence within the scepter's gem, and secretly decide to use it to complete Stark's "Ultron" global defense program.  The unexpectedly sentient Ultron, believing he must eradicate humanity to save Earth, eliminates Stark's A.I.  J.A.R.V.I.S.
------------------
Original Sentence: They meet two of Strucker's test subjects—twins Pietro (who has superhuman speed) and Wanda Maximoff (who has telepathic and telekinetic abilities)—and apprehend Strucker, while Stark retrieves Loki's scepter.


------------------
Window: In 2018, twenty-three days after Thanos erased half of all life in the universe,[a] Carol Danvers rescues Tony Stark and Nebula from deep space and they reunite with the remaining Avengers—Bruce Banner, Steve Rogers, Thor, Natasha Romanoff, and James Rhodes—and Rocket on Earth.  Locating Thanos on an uninhabited planet, they plan to use the Infinity Stones to reverse his actions, only to find that Thanos has already destroyed them, thus preventing any further use.  Enraged, Thor decapitates Thanos.

 Five years later, Scott Lang escapes from the Quantum Realm.
------------------
Original Sentence: In 2018, twenty-three days after Thanos erased half of all life in the universe,[a] Carol Danvers rescues Tony Stark and Nebula from deep space and they reunite with the remaining Avengers—Bruce Banner, Steve Rogers, Thor, Natasha Romanoff, and James Rhodes—and Rocket on Earth.

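The answer from sentence window retrieval is also correct, and the retrieved context is much shorter than with base RAG retrieval: a few sentences instead of an entire plot summary.
Each window contains the original sentence plus up to 3 sentences before and after it, as introduced earlier. Here the first window has only 1 sentence before the original sentence and the second has none, because those sentences sit at the start of their documents.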

Retrieval Effect Comparison

With the example code above, both base RAG retrieval and sentence window retrieval produce the correct answer, so it is not obvious which one retrieves better. We can use Trulens, the LLM evaluation tool introduced previously, to compare the two.

from trulens_eval import Tru, Feedback, TruLlama
from trulens_eval.feedback.provider.openai import OpenAI as Trulens_OpenAI
from trulens_eval.feedback import Groundedness

tru = Tru()
openai = Trulens_OpenAI()
def rag_evaluate(query_engine, eval_name):
    grounded = Groundedness(groundedness_provider=openai)
    groundedness = (
        Feedback(grounded.groundedness_measure_with_cot_reasons, name="Groundedness")
        .on(TruLlama.select_source_nodes().node.text)
        .on_output()
        .aggregate(grounded.grounded_statements_aggregator)
    )
    qa_relevance = Feedback(
        openai.relevance_with_cot_reasons, name="Answer Relevance"
    ).on_input_output()
    qs_relevance = (
        Feedback(openai.qs_relevance_with_cot_reasons, name="Context Relevance")
        .on_input()
        .on(TruLlama.select_source_nodes().node.text)
    )
    tru_query_engine_recorder = TruLlama(
        query_engine,
        app_id=eval_name,
        feedbacks=[
            groundedness,
            qa_relevance,
            qs_relevance,
        ],
    )
    with tru_query_engine_recorder as recording:
        query_engine.query(question)

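We define an evaluation function, rag_evaluate, that takes a query engine and an app name (eval_name), wraps the engine with TruLlama, and runs the question defined earlier through it.
Three Trulens feedback functions score the result: groundedness (whether the answer is supported by the retrieved context), qa_relevance (whether the answer is relevant to the question), and qs_relevance (whether the retrieved context is relevant to the question).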

For more information on Trulens, refer to my previous articles. Now, let’s run the evaluation method:

tru.reset_database()
rag_evaluate(base_engine, "base_evaluation")
rag_evaluate(sentence_window_engine, "sentence_window_evaluation")
tru.run_dashboard()

The Trulens dashboard shows that sentence window retrieval does not always score better than base RAG retrieval; sometimes it even scores worse. Getting better results from sentence window retrieval requires further tuning, such as adjusting window_size and other parameters.
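One quick way to reason about window_size before re-running the evaluation is to look at how much context each setting would send to the LLM. The sketch below uses a naive regex splitter and a hypothetical window_for helper (not a LlamaIndex API) on a short passage paraphrasing the Age of Ultron plot text above, and prints the window length for several sizes. Larger windows give the LLM more context, but also more potentially irrelevant text.

```python
import re

passage = (
    "Stark and Banner discover an artificial intelligence within the scepter's gem. "
    "They secretly decide to use it to complete Stark's Ultron program. "
    "The unexpectedly sentient Ultron believes he must eradicate humanity. "
    "He attacks the Avengers at their headquarters. "
    "Escaping with the scepter, Ultron builds an army of robot drones."
)
# Naive split after '.', '?' or '!' followed by whitespace.
sentences = re.split(r"(?<=[.?!])\s+", passage.strip())


def window_for(sentences, i, window_size):
    """Hypothetical helper: sentence i plus up to window_size neighbors on each side."""
    start = max(0, i - window_size)
    end = min(len(sentences), i + window_size + 1)
    return " ".join(sentences[start:end])


matched = 1  # pretend the retriever matched the second sentence
for window_size in (0, 1, 2, 3):
    window = window_for(sentences, matched, window_size)
    print(window_size, len(window), "chars")
```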

Conclusion

While RAG can solve most problems in LLM applications, it is not a silver bullet, and advanced RAG retrieval does not solve every RAG issue either. Choose the retrieval method based on the specific project’s requirements, and keep optimizing the RAG application through parameter tuning, document optimization, and other methods.

Follow me to learn about various artificial intelligence and AIGC new technologies. Feel free to leave comments if you have any questions or thoughts.

This story is published on Generative AI. Connect with us on LinkedIn and follow Zeniteq to stay in the loop with the latest AI stories. Let’s shape the future of AI together!