RAPTOR: RECURSIVE ABSTRACTIVE PROCESSING FOR TREE-ORGANIZED RETRIEVAL
RAPTOR is a novel retrieval-augmented language model approach that constructs a hierarchical tree by recursively embedding, clustering, and summarizing text chunks. This lets the model integrate information across lengthy documents at different levels of abstraction. Controlled experiments show that RAPTOR significantly outperforms traditional retrieval-augmented LMs on complex question-answering tasks, achieving state-of-the-art results on benchmarks such as QuALITY, QASPER, and NarrativeQA.
Article Points:
1. RAPTOR builds a hierarchical tree via recursive embedding, clustering, and summarization.
2. Integrates information across documents at varying levels of abstraction for holistic understanding.
3. Outperforms traditional retrieval-augmented LMs on several complex QA tasks.
4. Achieves new state-of-the-art results on QuALITY, QASPER, and NarrativeQA datasets.
5. The 'collapsed tree' querying method consistently shows superior performance.
6. Upper-level nodes in the tree are crucial for thematic and multi-hop queries.
Problem Addressed

Limited context in RALMs

Lack of holistic document understanding

Expensive long contexts

Tree Construction Process

Segment text into chunks

Embed chunks using SBERT

Reduce dimensionality with UMAP, then softly cluster with Gaussian Mixture Models (GMMs)

Summarize clusters using LLM

Recursively build tree bottom-up
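The construction steps above can be sketched in Python. This is a structural sketch only: the embedding, clustering, and summarization functions below are toy stand-ins (the paper uses SBERT sentence embeddings, UMAP plus GMM soft clustering, and an LLM summarizer), and the threshold and layer-cap parameters are illustrative assumptions.

```python
import math
from dataclasses import dataclass, field

@dataclass
class Node:
    text: str
    embedding: list                      # unit vector for this node's text
    children: list = field(default_factory=list)

def embed(text):
    # Toy bag-of-letters embedding, L2-normalized; RAPTOR uses SBERT.
    vec = [0.0] * 26
    for ch in text.lower():
        if ch.isalpha():
            vec[ord(ch) - 97] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def cosine(a, b):
    # Embeddings are normalized, so the dot product is cosine similarity.
    return sum(x * y for x, y in zip(a, b))

def cluster(nodes, threshold=0.8):
    # Greedy similarity grouping as a stand-in for UMAP + GMM soft clustering.
    clusters = []
    for node in nodes:
        for c in clusters:
            if cosine(node.embedding, c[0].embedding) >= threshold:
                c.append(node)
                break
        else:
            clusters.append([node])
    return clusters

def summarize(texts):
    # Placeholder: RAPTOR prompts an LLM to abstractively summarize each cluster.
    return " ".join(texts)[:200]

def build_tree(chunks, max_layers=3):
    # Bottom-up recursion: leaves are raw chunks; each new layer holds
    # cluster summaries whose children are the clustered nodes below.
    layer = [Node(text=c, embedding=embed(c)) for c in chunks]
    layers = [layer]
    while len(layer) > 1 and len(layers) < max_layers:
        next_layer = []
        for group in cluster(layer):
            summary = summarize([n.text for n in group])
            next_layer.append(Node(text=summary, embedding=embed(summary), children=group))
        if len(next_layer) == len(layer):  # nothing merged; recursion has converged
            break
        layer = next_layer
        layers.append(layer)
    return layers  # layers[0] = leaf chunks, layers[-1] = most abstract summaries
```

Calling `build_tree(["chunk one", "chunk two", ...])` returns the full stack of layers, so a retriever can later draw from any level of abstraction.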

Querying Mechanisms

Collapsed Tree: evaluates all nodes across all layers simultaneously

Tree Traversal: selects nodes layer by layer, from the root down

Collapsed tree consistently performs better

Key Contributions

Multi-level abstraction for context

Semantic grouping, not just adjacency

Improved retrieval effectiveness

Linear scalability in cost and time

Achieved Performance

Outperforms traditional retrieval methods

New state-of-the-art on QA datasets

Significant accuracy gains with GPT-4