RAPTOR: RECURSIVE ABSTRACTIVE PROCESSING FOR TREE-ORGANIZED RETRIEVAL
RAPTOR is a novel retrieval-augmented language model approach that constructs a hierarchical tree by recursively embedding, clustering, and summarizing text chunks. This method integrates information across lengthy documents at different levels of abstraction. Controlled experiments show that RAPTOR significantly outperforms traditional retrieval-augmented LMs on complex question-answering tasks, achieving state-of-the-art results on benchmarks such as QuALITY, QASPER, and NarrativeQA.
Article Points:
1. RAPTOR builds a hierarchical tree via recursive embedding, clustering, and summarization.
2. Integrates information across documents at varying levels of abstraction for holistic understanding.
3. Outperforms traditional retrieval-augmented LMs on several complex QA tasks.
4. Achieves new state-of-the-art results on the QuALITY, QASPER, and NarrativeQA datasets.
5. The 'collapsed tree' querying method consistently shows superior performance.
6. Upper-level nodes in the tree are crucial for thematic and multi-hop queries.
Problem Addressed
Limited retrieval context in retrieval-augmented language models (RALMs)
Lack of holistic, whole-document understanding
High cost of processing long contexts
Tree Construction Process
Segment text into chunks
Embed chunks using SBERT
Cluster similar chunks with GMM/UMAP
Summarize clusters using LLM
Recursively build the tree bottom-up (see the construction sketch after this list)
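A minimal Python sketch of this bottom-up construction is given below. It assumes the sentence-transformers, umap-learn, and scikit-learn packages; the model name and the summarize() placeholder (standing in for the LLM summarization call) are illustrative assumptions, not the authors' exact configuration.

```python
# A minimal sketch of RAPTOR-style bottom-up tree construction.
# Assumes sentence-transformers, umap-learn, and scikit-learn; model names
# and the summarize() placeholder are illustrative, not the paper's exact setup.
from dataclasses import dataclass, field
import numpy as np
import umap
from sentence_transformers import SentenceTransformer
from sklearn.mixture import GaussianMixture

embedder = SentenceTransformer("all-MiniLM-L6-v2")  # SBERT-style encoder

@dataclass
class Node:
    text: str
    embedding: np.ndarray
    children: list = field(default_factory=list)

def summarize(texts: list[str]) -> str:
    # Placeholder: the paper uses an LLM to abstractively summarize each cluster.
    return " ".join(texts)[:500]

def build_layer(nodes: list[Node], n_clusters: int) -> list[Node]:
    """Cluster the current layer and summarize each cluster into a parent node."""
    X = np.stack([n.embedding for n in nodes])
    # Reduce dimensionality with UMAP before clustering with a Gaussian mixture.
    reduced = umap.UMAP(n_neighbors=min(15, len(nodes) - 1),
                        n_components=2).fit_transform(X)
    labels = GaussianMixture(n_components=n_clusters).fit_predict(reduced)
    parents = []
    for c in range(n_clusters):
        members = [n for n, lab in zip(nodes, labels) if lab == c]
        if members:
            summary = summarize([m.text for m in members])
            parents.append(Node(summary, embedder.encode(summary), members))
    return parents

def build_tree(chunks: list[str], n_clusters: int = 4) -> list[Node]:
    """Recursively build layers bottom-up; return every node from every layer."""
    layer = [Node(c, embedder.encode(c)) for c in chunks]
    tree = list(layer)
    while len(layer) > n_clusters:
        layer = build_layer(layer, n_clusters)
        tree.extend(layer)
    return tree
```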
Querying Mechanisms
Collapsed Tree: Evaluates all nodes simultaneously
Tree Traversal: Layer-by-layer selection
Collapsed Tree consistently performs better (sketched below)
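The collapsed-tree strategy can be sketched as follows, reusing the Node objects and embedder from the construction sketch above. The token budget and the count_tokens() helper are illustrative assumptions; a real system would use the downstream LLM's tokenizer.

```python
# A minimal sketch of collapsed-tree retrieval, reusing Node and embedder
# from the construction sketch above. The token budget and count_tokens()
# helper are illustrative assumptions.
import numpy as np

def count_tokens(text: str) -> int:
    # Rough estimate; a real system would use the downstream LLM's tokenizer.
    return len(text.split())

def collapsed_tree_retrieve(query: str, tree: list[Node],
                            token_budget: int = 2000) -> list[Node]:
    """Rank all nodes from all layers at once by cosine similarity to the
    query, then add nodes in order until the token budget is exhausted."""
    q = embedder.encode(query)
    scored = []
    for node in tree:
        sim = float(np.dot(q, node.embedding) /
                    (np.linalg.norm(q) * np.linalg.norm(node.embedding)))
        scored.append((sim, node))
    scored.sort(key=lambda pair: pair[0], reverse=True)

    selected, used = [], 0
    for _, node in scored:
        cost = count_tokens(node.text)
        if used + cost > token_budget:
            break
        selected.append(node)
        used += cost
    return selected
```

Because every layer competes in the same ranking, the retriever can mix high-level summaries with leaf chunks as the query demands, which is why upper-level nodes end up carrying thematic and multi-hop questions.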
Key Contributions
Multi-level abstraction for context
Semantic grouping, not just adjacency
Improved retrieval effectiveness
Linear scalability in cost and time
Achieved Performance