Atom-Searcher: Enhancing Agentic Deep Research via Fine-Grained Atomic Thought Reward
Atom-Searcher is a reinforcement learning (RL) framework for agentic deep research that addresses two limitations: the static knowledge of LLMs and the rigid workflows of traditional retrieval-augmented generation (RAG). It introduces "Atomic Thought," which decomposes reasoning into fine-grained functional units. Reasoning Reward Models (RRMs) score these units to produce Atomic Thought Rewards (ATR). A curriculum-inspired reward schedule prioritizes ATR early in training and then transitions to outcome rewards, which mitigates the gradient conflicts and reward sparsity of outcome-based RL. The framework reports state-of-the-art performance on seven in-domain and out-of-domain benchmarks.
Article Points:
1. Atomic Thought: Decomposes LLM reasoning into fine-grained functional units.
2. Atomic Thought Reward (ATR): Fine-grained guidance from Reasoning Reward Models.
3. Curriculum-inspired reward schedule: Prioritizes ATR early, transitions to outcome rewards.
4. Atom-Searcher: Novel RL framework integrating Atomic Thought and ATR for deep research.
5. SOTA performance: Achieves significant gains on seven in-domain and out-of-domain benchmarks.
6. Interpretable reasoning: Exhibits more human-like and deeper reasoning patterns.
Problem Addressed

LLM static knowledge limits complex tasks

RAG rigid workflows limit multi-hop reasoning

Outcome-based RL has gradient conflicts & reward sparsity

Core Concepts

Atomic Thought: Fine-grained reasoning units

Reasoning Reward Models: Score Atomic Thoughts

Atomic Thought Reward: Fine-grained guidance

Framework Components

Phase 1: SFT for Atomic Thought generation

Phase 2: RL with hybrid reward

Policy Optimization: GRPO algorithm
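
GRPO (Group Relative Policy Optimization) scores each sampled rollout relative to the other rollouts drawn for the same prompt, rather than against a learned value baseline. The sketch below shows only that group-relative advantage step, under stated assumptions: the function name `group_relative_advantages`, the `eps` term, and the use of population standard deviation are illustrative choices, not details taken from Atom-Searcher itself.

```python
from statistics import mean, pstdev

def group_relative_advantages(rewards, eps=1e-6):
    """Normalize each rollout's reward against its own sampling group.

    rewards: scalar rewards for rollouts sampled from the same prompt.
    eps: small constant (an assumption) that avoids division by zero
         when every rollout in the group receives the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; some implementations use sample std
    return [(r - mu) / (sigma + eps) for r in rewards]

# Four hypothetical rollouts for one question.
advantages = group_relative_advantages([0.2, 0.8, 0.5, 0.5])
```

Rollouts that score above the group mean receive a positive advantage and those below receive a negative one. When all rewards in a group are identical, every advantage is zero, which is one way sparse outcome rewards can leave a policy with no learning signal.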

Reward Mechanism

Fine-grained ATR from RRMs

Curriculum-based aggregation strategy

Mitigates gradient conflicts

Alleviates reward sparsity
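
The curriculum-based aggregation can be sketched as a weighted sum whose ATR weight shrinks as training progresses. This is a minimal illustration, not the paper's exact formula: the linear decay, the starting weight `w_start=0.5`, the averaging of per-thought RRM scores, and the names `atr_weight` and `hybrid_reward` are all assumptions made for the example.

```python
def atr_weight(step, total_steps, w_start=0.5):
    """Weight on the Atomic Thought Reward at a given training step.

    Assumed schedule: decays linearly from w_start at step 0 to 0 at the
    final step, so process-level guidance dominates early and the outcome
    reward dominates late.
    """
    progress = min(max(step / total_steps, 0.0), 1.0)
    return w_start * (1.0 - progress)

def hybrid_reward(atom_scores, outcome_reward, step, total_steps):
    """Combine RRM scores for atomic thoughts with the outcome reward.

    atom_scores: hypothetical RRM scores in [0, 1], one per atomic thought.
    outcome_reward: final-answer reward in [0, 1] (for example an F1 score).
    """
    atr = sum(atom_scores) / len(atom_scores) if atom_scores else 0.0
    w = atr_weight(step, total_steps)
    return w * atr + (1.0 - w) * outcome_reward

# Early in training, a wrong answer with partly sound reasoning still earns reward.
early = hybrid_reward([1.0, 0.0], outcome_reward=0.0, step=0, total_steps=100)
# At the end of training, only the outcome reward counts.
late = hybrid_reward([1.0, 0.0], outcome_reward=1.0, step=100, total_steps=100)
```

Under this schedule a trajectory with a wrong final answer is not scored as zero early on, which is how a fine-grained reward can ease sparsity. Fading the ATR term later keeps the policy from optimizing RRM scores at the expense of answer correctness.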

Key Advantages

Scales computation at test-time

Atomic Thoughts provide RRM anchors

Exhibits human-like reasoning patterns

Performance

Achieves SOTA on 7 benchmarks

Significant in-domain improvements

Strong out-of-domain generalization