LeanRAG: Knowledge-Graph-Based Generation with Semantic Aggregation and Hierarchical Retrieval
LeanRAG is a Retrieval-Augmented Generation (RAG) framework designed to overcome two limitations of existing knowledge-graph-based RAG methods: "semantic islands" (cluster-level summaries that lack explicit relations to one another) and structurally unaware retrieval. It combines a semantic aggregation algorithm with a bottom-up, structure-guided hierarchical retrieval strategy. The goal is to give Large Language Models (LLMs) evidence sets that are concise yet contextually comprehensive, improving response quality while reducing retrieval redundancy (by 46% in the reported experiments).
Article Points:
1. Novel semantic aggregation algorithm for superior knowledge condensation.
2. Constructs multi-resolution KG with explicit inter-cluster relations.
3. Bottom-up, structure-aware retrieval minimizes information redundancy.
4. Anchors queries to fine-grained entities, traverses semantic pathways.
5. Achieves state-of-the-art performance on diverse QA benchmarks.
6. Reduces retrieval redundancy by 46% while improving response quality.
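The bottom-up retrieval described in the points above (anchor the query to fine-grained entities, then follow paths upward through the hierarchy) can be illustrated with a lowest-common-ancestor (LCA) walk. The sketch below is only an illustration under stated assumptions: the `PARENT` hierarchy, the entity names, and the pairwise-LCA rule are hypothetical and are not taken from the LeanRAG implementation, which this summary does not describe in detail.

```python
from itertools import combinations

# Hypothetical toy hierarchy: child -> parent (None marks the root).
# In a real system the upper levels would be aggregated summary nodes.
PARENT = {
    "aspirin": "nsaids",
    "ibuprofen": "nsaids",
    "nsaids": "pain_relief",
    "morphine": "opioids",
    "opioids": "pain_relief",
    "pain_relief": None,
}


def path_to_root(node, parent):
    """Return [node, parent, grandparent, ..., root]."""
    path = []
    while node is not None:
        path.append(node)
        node = parent[node]
    return path


def lca(a, b, parent):
    """Lowest common ancestor of a and b in the hierarchy."""
    ancestors_a = set(path_to_root(a, parent))
    for node in path_to_root(b, parent):
        if node in ancestors_a:
            return node
    raise ValueError("nodes do not share a root")


def retrieve(seeds, parent):
    """Collect the nodes on each seed's path up to its pairwise LCAs.

    Assumption: stopping at the LCA (instead of climbing to the root)
    is what keeps the evidence set small in this sketch.
    """
    selected = set(seeds)
    for a, b in combinations(seeds, 2):
        top = lca(a, b, parent)
        for start in (a, b):
            for node in path_to_root(start, parent):
                selected.add(node)
                if node == top:
                    break
    return selected
```

With two closely related seeds such as `"aspirin"` and `"ibuprofen"`, the walk stops at their shared parent and never pulls in the broader `"pain_relief"` node; seeds from different branches climb further until their paths meet.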
Challenges Addressed
  - Semantic Islands
  - Structurally Unaware Retrieval
  - Information Redundancy

Core Innovations
  - Semantic Aggregation
  - Structured Retrieval

Methodology
  - Hierarchical KG Aggregation
  - LCA Path Traversal

Experimental Results
  - State-of-the-Art Performance
  - Reduced Redundancy (46%)
  - Inter-cluster Relations Impact
  - Textual Context Necessity

Key Contributions
  - Novel Aggregation Algorithm
  - Bottom-up Retrieval Strategy
  - SOTA QA Performance

Implementation
  - DeepSeek-V3 LLM
  - BGE-M3 Embeddings
  - Cluster Size & Threshold
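To make the aggregation step and its two hyperparameters more concrete, here is a minimal sketch of how a cluster-size cap and a similarity threshold could drive one level of hierarchical aggregation. Everything in it is an assumption made for illustration: the greedy clustering is a stand-in (this summary does not state which clustering method LeanRAG uses or exactly what the threshold controls), the toy 2-D vectors replace real BGE-M3 embeddings, and the LLM-written cluster summaries are omitted entirely.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def aggregate(embeddings, edges, cluster_size, threshold):
    """Greedy stand-in for one aggregation level.

    embeddings: dict mapping entity name -> vector
    edges: iterable of (entity, entity) relations in the base graph
    cluster_size: maximum members per cluster (assumed meaning)
    threshold: minimum cosine similarity to join a cluster (assumed meaning)
    Returns (clusters, inter_cluster_relations).
    """
    clusters = []  # the first member of each cluster acts as its reference point
    for name, vec in embeddings.items():
        best, best_sim = None, threshold
        for cluster in clusters:
            if len(cluster) >= cluster_size:
                continue  # cluster is full
            sim = cosine(vec, embeddings[cluster[0]])
            if sim >= best_sim:
                best, best_sim = cluster, sim
        if best is None:
            clusters.append([name])
        else:
            best.append(name)

    # Lift entity-level edges to explicit relations between clusters,
    # so higher-level nodes do not end up as isolated "semantic islands".
    owner = {name: i for i, members in enumerate(clusters) for name in members}
    relations = set()
    for a, b in edges:
        if owner[a] != owner[b]:
            relations.add(tuple(sorted((owner[a], owner[b]))))
    return clusters, relations


# Hypothetical toy data.
toy_embeddings = {
    "aspirin": [1.0, 0.0],
    "ibuprofen": [0.9, 0.1],
    "morphine": [0.0, 1.0],
    "codeine": [0.1, 0.9],
}
toy_edges = [("aspirin", "ibuprofen"), ("ibuprofen", "codeine")]
toy_clusters, toy_relations = aggregate(
    toy_embeddings, toy_edges, cluster_size=2, threshold=0.8
)
```

The second half of `aggregate` is the part that corresponds to the "explicit inter-cluster relations" mentioned in the article points: an edge between entities in different clusters becomes a relation between the clusters themselves. Applying the function repeatedly to the resulting clusters would yield a multi-level hierarchy.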