Atom-Searcher: Enhancing Agentic Deep Research via Fine-Grained Atomic Thought Reward
Atom-Searcher is a reinforcement learning (RL) framework for agentic deep research that addresses the limitations of static LLM knowledge and rigid RAG workflows. It introduces "Atomic Thought," which decomposes reasoning into fine-grained functional units. Reasoning Reward Models (RRMs) score these units to produce Atomic Thought Rewards (ATR). A curriculum-inspired reward schedule mitigates the gradient conflicts and reward sparsity of outcome-based RL, and the framework reports state-of-the-art performance on seven in-domain and out-of-domain benchmarks.
Article Points:
1. Atomic Thought: Decomposes LLM reasoning into fine-grained functional units.
2. Atomic Thought Reward (ATR): Fine-grained guidance from Reasoning Reward Models.
3. Curriculum-inspired reward schedule: Prioritizes ATR early, transitions to outcome rewards.
4. Atom-Searcher: Novel RL framework integrating Atomic Thought and ATR for deep research.
5. SOTA performance: Achieves significant gains on seven in-domain and out-of-domain benchmarks.
6. Interpretable reasoning: Exhibits more human-like and deeper reasoning patterns.
Problem Addressed
LLM static knowledge limits complex tasks
RAG rigid workflows limit multi-hop reasoning
Outcome-based RL has gradient conflicts & reward sparsity
Core Concepts
Atomic Thought: Fine-grained reasoning units
Reasoning Reward Models: Score Atomic Thoughts
Atomic Thought Reward: Fine-grained guidance
Framework Components
Phase 1: SFT for Atomic Thought generation
Phase 2: RL with hybrid reward
Policy Optimization: GRPO algorithm
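The card names GRPO but does not spell out how it is used. As a rough illustration, the core idea of GRPO is to score each rollout relative to the other rollouts sampled for the same prompt, instead of relying on a learned value function. The sketch below shows only that group-relative normalization; the function name, the use of the population standard deviation, and the `eps` stabilizer are assumptions for illustration, not details taken from Atom-Searcher.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize rewards within one group of rollouts for the same prompt.

    Each advantage is (reward - group mean) / (group std + eps).
    `eps` is an assumed stabilizer that avoids division by zero when
    every rollout in the group receives the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four hypothetical rollouts for one question: two score 1.0, two score 0.0.
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
```

Rollouts that beat the group average get a positive advantage and the rest get a negative one, so the policy update depends on how a trajectory compares with its siblings rather than on the absolute reward scale.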
Reward Mechanism
Fine-grained ATR from RRMs
Curriculum-based aggregation strategy
Mitigates gradient conflicts
Alleviates reward sparsity
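One way to picture the curriculum-based aggregation is as a weighted sum whose ATR weight shrinks as training progresses. The sketch below assumes a linear decay and a starting weight of 0.5; the card does not give the actual schedule, so both choices, along with the names `atr_weight` and `hybrid_reward`, are hypothetical.

```python
def atr_weight(step, total_steps, initial_weight=0.5):
    """Assumed linear decay of the ATR weight from `initial_weight` to 0."""
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    # Clamp progress to [0, 1] so out-of-range steps stay well defined.
    progress = min(max(step / total_steps, 0.0), 1.0)
    return initial_weight * (1.0 - progress)


def hybrid_reward(atr, outcome, step, total_steps, initial_weight=0.5):
    """Blend the process-level ATR with the final outcome reward."""
    w = atr_weight(step, total_steps, initial_weight)
    return w * atr + (1.0 - w) * outcome


# Early in training the ATR contributes; by the last step only the outcome counts.
early = hybrid_reward(atr=0.8, outcome=0.0, step=0, total_steps=100)
late = hybrid_reward(atr=0.8, outcome=0.0, step=100, total_steps=100)
```

Under these assumptions, a trajectory with sound reasoning but a wrong final answer still earns a non-zero reward early on, which is the sense in which a dense process signal can ease reward sparsity. As the weight decays, the outcome reward takes over.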
Key Advantages
Scales computation at test-time
Atomic Thoughts provide RRM anchors
Exhibits human-like reasoning patterns
Performance