ties. You can think of a community as a group of people, organizations, locations, or events that are closely related. For example, if you are building the graph from a movie script, the node representing the main character and the node representing her friend might be grouped into one community.</p><p id="98b4">After the communities are created, GraphRAG generates a summary for each community. These summaries describe the relationships or topics within each group of nodes and their relations.</p><p id="14dd">GraphRAG does not stop after creating the first level of communities. Once the first level is built, GraphRAG treats those communities as the nodes for the next level and constructs higher-level communities from them. This produces overviews at different levels of granularity. If your question is high-level (e.g. what is the theme of the story), this hierarchy helps find the answer in a broader context. We will discuss more details in the next section.</p><figure id="4800"><img src="https://cdn-images-1.readmedium.com/v2/resize:fit:800/1*Ceqgh7YqTrOE95AonGln_Q.png"><figcaption>Clustering illustration (source: <a href="https://arxiv.org/pdf/2404.16130">original paper</a>)</figcaption></figure><h2 id="70fb">3. MapReduce approach for information extraction</h2><p id="ef94">Finally, we can explain why the previous techniques help improve the quality of the generated answers. GraphRAG supports two query modes: global search and local search.</p><p id="5d7f"><b>Global search: Community Summary -> Global answer</b></p><p id="d4c8">Global search aims to answer questions that require a higher-level understanding. The solution is to aggregate insights across the community summaries. 
The global search approach is very different from traditional RAG, where the answer is based on semantically similar documents. Here, we first generate an overview of the elements in the documents and then use the summarized results to answer the question.</p><figure id="4477"><img src="https://cdn-images-1.readmedium.com/v2/resize:fit:800/1*YZqEe6E0uA1zRSrnaVdZjw.png"><figcaption>Global search illustration (source: <a href="https://microsoft.github.io/graphrag/posts/query/0-global_search/">Microsoft</a>)</figcaption></figure><p id="662f"><b>Local search: Knowledge Graph -> Local answer</b></p><p id="04a8">Local search, on the other hand, starts from the entities in the question and uses the knowledge graph to find the most relevant information. For example, given an entity in the query, we may first use the information from its connected nodes. In the official implementation, there’s also an option to use graph embedding to find the most relevant nodes in the graph.</p><p id="f38c">Now that we have walked through all the interesting ideas behind GraphRAG, we can discuss what we can learn from it and how to apply it in different scenarios.</p><figure id="543d"><img src="https://cdn-images-1.readmedium.com/v2/resize:fit:800/1*lcq4N_WFNy8XOhPrzeICHQ.png"><figcaption>Local search illustration (source: <a href="https://microsoft.github.io/graphrag/posts/query/1-local_search/">Microsoft</a>)</figcaption></figure><h1 id="e4fc">Implication: what can we learn from GraphRAG</h1><p id="1b50">Although GraphRAG is a powerful tool, there are still some reasons that we
may not want to use it directly:</p><p id="f709"><b>Indexing cost is high</b></p><p id="02af">GraphRAG uses an LLM to generate all the components of the graph, and its system prompts are quite long (e.g. the entity extraction prompt has roughly 1500 tokens). Even if you have only a few documents, the system prompt is still a burden because it increases the number of input tokens. Besides building the knowledge graph itself, generating the community summaries also produces a large number of output tokens.</p><p id="f486"><b>Not suitable for documents without obvious entities or documents that are well-structured</b></p><p id="a20a">Some documents make it hard to construct a knowledge graph, while others are already well organized, so you can leverage their structure directly. In these cases, building a knowledge graph index is not necessary. For example, if you are using API documents as the reference, a knowledge graph could be overkill because the raw documents already describe the relationships clearly. Another example is spreadsheet data, where the relationships are too complicated to express with a graph.</p><p id="f246">Finally, in a recent <a href="https://www.microsoft.com/en-us/research/blog/graphrag-new-tool-for-complex-data-discovery-now-on-github/">blog post</a>, Microsoft provides the following suggestion:</p><blockquote id="81e1"><p>The overall suitability of GraphRAG for any given use case, however, depends on whether the benefits of structured knowledge representations, readymade community summaries, and support for global queries outweigh the upfront costs of graph index construction.</p></blockquote><p id="2545">GraphRAG is not necessarily the go-to solution for all cases. 
Still, we can borrow some ideas from GraphRAG’s implementation even if it does not suit your use case:</p><p id="caf5"><b>Implication 1: Pre-summarize the information at different levels</b></p><p id="af02">We can pre-aggregate the insights across documents and use them to generate the answer. When building the summaries, we can create them with different amounts of detail and store the mapping between the documents and their summaries.</p><p id="aec8">At the query stage, we can first find the most relevant summaries using similarity search. Next, we can either use the mapping to find the corresponding documents or use the summaries themselves to generate the answer, as GraphRAG does.</p><p id="83f9"><b>Implication 2: Entity as the matching field</b></p><p id="a9bd">We can first use an LLM to list the relevant entities in our documents. When a query comes in, we extract the entities from the query and use them to find the related documents directly. For example, we can use an LLM to find the entities in the question and then use full-text search or search filters to find the documents. If the documents do not have obvious entities, we can also try to generate extra metadata or tags.</p><p id="a953"><b>Summary</b></p><p id="bf0d">GraphRAG provides a novel approach to addressing a drawback of traditional RAG: answering questions that require the global context of the documents. In addition, its local search feature provides an alternative to relying only on semantic similarity search. Even if we don’t use the tool itself, we can still use its concepts to improve our RAG implementation.</p></article></body>