Traditional RAG systems work by indexing raw data. The data is simply chunked and stored in a vector database. Whenever a user query arrives, the system searches the stored chunks and retrieves the relevant ones. If you wish to learn the fundamentals of RAG, I have written a comprehensive intro about it.
Because the retrieval step runs for every single user query, it is the most critical bottleneck in naive RAG systems. Would it not be logical to make the retrieval process super efficient? This is the promise of LightRAG.
Before we look at how it works, you may ask, “Wait. Do we not have GraphRAG from Microsoft?” Yes, but GraphRAG seems to have a couple of drawbacks.
So, without further ado, let's dive into LightRAG.
If you are a visual learner like me, why not check this YouTube video explaining the idea of LightRAG:
The two main selling points of LightRAG are graph-based indexing and a dual-level retrieval framework. So, let's look into each of them.
Below are the steps LightRAG follows to incorporate graph-based indexing.
A query to a RAG system can be one of two types: specific or abstract. In the same bee example, a specific query could be “How many queen bees can there be in a hive?” An abstract query could be “What are the implications of climate change on honey bees?” To address this diversity, LightRAG employs two retrieval types:
All this effort pays off: switching to LightRAG does improve execution time. During indexing, the LLM needs to be called just once per chunk to extract entities and their relationships.
Likewise, at query time, we only retrieve the entities and relationships extracted from the chunks, using the same LLM we used for indexing. This greatly reduces the retrieval overhead and hence the computation. So, we have a “light” RAG at last!
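To make the once-per-chunk indexing idea concrete, here is a minimal sketch in Python. It is not LightRAG's actual code: `fake_llm_extract` is a hypothetical stand-in for the real LLM call (it just treats every three-word sentence as a subject, relation, object triple), and `build_index` and `retrieve` are names I made up for illustration.

```python
def fake_llm_extract(chunk):
    """Hypothetical stand-in for the single LLM call made per chunk.

    Toy rule: every "Subject verb Object" sentence becomes one relation.
    A real system would prompt an LLM here instead.
    """
    relations = []
    for sentence in chunk.split("."):
        words = sentence.split()
        if len(words) == 3:
            relations.append((words[0], words[1], words[2]))
    entities = {name for src, _, dst in relations for name in (src, dst)}
    return entities, relations


def build_index(chunks, extract):
    """Build a tiny entity/relation graph with exactly one extract call per chunk."""
    graph = {"entities": set(), "relations": set()}
    llm_calls = 0
    for chunk in chunks:
        entities, relations = extract(chunk)
        llm_calls += 1
        graph["entities"] |= entities
        graph["relations"] |= set(relations)
    return graph, llm_calls


def retrieve(graph, keywords):
    """Return the relations that touch any of the query keywords."""
    wanted = set(keywords)
    return sorted(
        rel for rel in graph["relations"] if rel[0] in wanted or rel[2] in wanted
    )


chunks = [
    "Beekeeper observes Bees. Bees produce Honey.",
    "Queen leads Hive.",
]
graph, llm_calls = build_index(chunks, fake_llm_extract)
print(llm_calls)                    # number of extraction calls made
print(retrieve(graph, ["Honey"]))   # relations connected to "Honey"
```

The point of the sketch is the shape of the loop: the number of extraction calls equals the number of chunks, and a query only touches the small graph of entities and relations rather than the raw chunk text.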
Integrating new knowledge into an existing graph seems to be a seamless exercise. Instead of re-indexing the entire dataset whenever we have new information, we can simply append the new knowledge to the existing graph.
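The append-only update can be pictured as a plain set union of nodes and edges. The sketch below assumes a graph is just a dictionary with an `entities` set and a `relations` set of triples; that representation and the `merge_graphs` name are my own illustration, not LightRAG's API.

```python
def merge_graphs(existing, new):
    """Return a graph holding the union of both node sets and both edge sets.

    Neither input is modified, and duplicate entities or relations
    collapse automatically because both fields are sets.
    """
    return {
        "entities": existing["entities"] | new["entities"],
        "relations": existing["relations"] | new["relations"],
    }


# Graph built from the documents indexed so far.
existing = {
    "entities": {"Bees", "Honey"},
    "relations": {("Bees", "produce", "Honey")},
}

# Graph extracted from a newly arrived document only.
update = {
    "entities": {"Bees", "Pollen"},
    "relations": {("Bees", "collect", "Pollen")},
}

merged = merge_graphs(existing, update)
print(sorted(merged["entities"]))
print(sorted(merged["relations"]))
```

Only the new document goes through extraction. The shared entity (`Bees` here) is what ties the new knowledge to the old graph.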
In their evaluations, the authors compare LightRAG against Naive RAG, RQ-RAG, HyDE, and GraphRAG. To keep the comparison fair, they use GPT-4o-mini as the LLM across the board, with a fixed chunk size of 1200 for all datasets. The answers were evaluated for comprehensiveness, diversity, and effectiveness in answering the user (a.k.a. empowerment in the paper).
As we can see from the underlined results, LightRAG comes out ahead of the baselines it was compared against.
In general, they draw the following conclusions:
Though RAG is a fairly recent technique, we are seeing rapid progress in the area. Techniques like LightRAG, which can bring RAG pipelines to cheap commodity hardware, are most welcome. While the hardware landscape keeps evolving, there is a growing need to run LLMs and RAG pipelines on compute-constrained hardware in real time.
Would you like to see some hands-on study of LightRAG? Please stay tuned…