LeanRAG: Knowledge-Graph-Based Generation with Semantic Aggregation and Hierarchical Retrieval

LeanRAG is a Retrieval-Augmented Generation (RAG) framework that targets two limitations of existing knowledge-graph-based RAG methods: "semantic islands", where high-level summaries lack explicit relations to one another, and structurally unaware retrieval, which ignores the graph's hierarchy. It combines a semantic aggregation algorithm with a bottom-up, structure-guided hierarchical retrieval strategy. The goal is to give Large Language Models (LLMs) evidence sets that are concise yet contextually comprehensive, improving response quality while cutting retrieval redundancy by a reported 46%.
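To make the "semantic islands" problem concrete, here is a minimal sketch of one way cluster-level relations could be derived from entity-level ones. It is not the paper's algorithm: the function name `lift_relations`, the `min_links` threshold, and the toy data are all hypothetical, and the cluster assignment is assumed to be given already (for example from clustering entity embeddings).

```python
from collections import defaultdict


def lift_relations(entity_cluster, relations, min_links=1):
    """Derive cluster-level relations from entity-level ones.

    entity_cluster: dict mapping entity -> cluster id (assumed given).
    relations: iterable of (head, tail) entity pairs.
    min_links: hypothetical threshold; two clusters are linked only if
               at least this many entity relations cross between them.
    """
    counts = defaultdict(int)
    for head, tail in relations:
        a, b = entity_cluster[head], entity_cluster[tail]
        if a != b:  # relations inside one cluster do not link clusters
            counts[tuple(sorted((a, b)))] += 1
    return {pair for pair, n in counts.items() if n >= min_links}


# Toy data, invented for illustration only.
entity_cluster = {
    "aspirin": "drugs",
    "ibuprofen": "drugs",
    "headache": "symptoms",
    "fever": "symptoms",
    "WHO": "orgs",
}
relations = [
    ("aspirin", "headache"),
    ("ibuprofen", "fever"),
    ("aspirin", "ibuprofen"),
    ("WHO", "aspirin"),
]
print(lift_relations(entity_cluster, relations, min_links=2))
# {('drugs', 'symptoms')}
```

Without such links, the "drugs" and "symptoms" summaries would sit side by side with no edge between them. Applying the same step to each new layer of clusters would give a multi-level graph in which every level is connected, not only the bottom one.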

Article Points:

Novel semantic aggregation algorithm for superior knowledge condensation.

Constructs multi-resolution KG with explicit inter-cluster relations.

Bottom-up, structure-aware retrieval minimizes information redundancy.

Anchors queries to fine-grained entities, traverses semantic pathways.

Achieves state-of-the-art performance on diverse QA benchmarks.

Reduces retrieval redundancy by 46% while improving response quality.
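The bottom-up retrieval idea in the points above (the outline below calls it LCA path traversal) can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the hierarchy is simplified to a tree stored as a hypothetical child-to-parent map, the anchor entities are assumed to have been matched to the query already, and the helper names are invented.

```python
def path_to_root(node, parent):
    """Return the list of nodes from `node` up to the root of its tree."""
    path = [node]
    while node in parent:
        node = parent[node]
        path.append(node)
    return path


def lca_evidence(anchors, parent):
    """Collect the nodes on each anchor's path up to the lowest common ancestor.

    anchors: non-empty list of fine-grained entities matched to the query.
    parent: dict mapping each node to its parent cluster (hypothetical layout).
    Returns (lca, evidence); lca is None if the anchors share no ancestor,
    in which case the full paths are kept.
    """
    paths = [path_to_root(a, parent) for a in anchors]
    shared = set(paths[0]).intersection(*paths[1:])
    # Walking upward, the first shared node is the lowest common ancestor.
    lca = next((n for n in paths[0] if n in shared), None)
    evidence = []
    for path in paths:
        cut = path.index(lca) + 1 if lca is not None else len(path)
        for node in path[:cut]:
            if node not in evidence:  # skip nodes already collected
                evidence.append(node)
    return lca, evidence


# Toy hierarchy, invented for illustration only.
parent = {
    "aspirin": "painkillers",
    "ibuprofen": "painkillers",
    "painkillers": "drugs",
    "insulin": "hormones",
    "hormones": "drugs",
    "drugs": "medicine",
}
print(lca_evidence(["aspirin", "insulin"], parent))
# ('drugs', ['aspirin', 'painkillers', 'drugs', 'insulin', 'hormones'])
```

Stopping at the lowest common ancestor leaves out "medicine", which sits above every anchor and adds nothing specific to the query. That is the intuition behind bounding the evidence set from below instead of searching all levels flatly.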

Source:

LeanRAG: Knowledge-Graph-Based Generation with Semantic Aggregation and Hierarchical Retrieval

Tags: rag, knowledge graph

Challenges Addressed:

Semantic Islands

Structurally Unaware Retrieval

Information Redundancy

Core Innovations:

Semantic Aggregation

Structured Retrieval

Methodology:

Hierarchical KG Aggregation

LCA Path Traversal

Experimental Results:

State-of-the-Art Performance

Reduced Redundancy (46%)

Inter-cluster Relations Impact

Textual Context Necessity

Key Contributions:

Novel Aggregation Algorithm

Bottom-up Retrieval Strategy

SOTA QA Performance

Implementation:

DeepSeek-V3 LLM

BGE-M3 Embeddings

Cluster Size & Threshold
