Atom-Searcher: Enhancing Agentic Deep Research via Fine-Grained Atomic Thought Reward

Tags: reinforcement learning, agent, RAG, fine-tuning

Atom-Searcher is a reinforcement learning (RL) framework for agentic deep research that targets two limitations: the static knowledge of LLMs and the rigid workflows of traditional RAG. It introduces "Atomic Thought," which decomposes reasoning into fine-grained functional units. Reasoning Reward Models (RRMs) score these units to produce Atomic Thought Rewards (ATR). A curriculum-inspired reward schedule prioritizes ATR early in training and shifts toward outcome rewards later, which mitigates the gradient conflicts and reward sparsity of outcome-based RL. The framework reports state-of-the-art performance on seven in-domain and out-of-domain benchmarks.

Article Points:

Atomic Thought: Decomposes LLM reasoning into fine-grained functional units.

Atomic Thought Reward (ATR): Fine-grained guidance from Reasoning Reward Models.

Curriculum-inspired reward schedule: Prioritizes ATR early, transitions to outcome rewards.

Atom-Searcher: Novel RL framework integrating Atomic Thought and ATR for deep research.

SOTA performance: Achieves significant gains on seven in-domain and out-of-domain benchmarks.

Interpretable reasoning: Exhibits more human-like and deeper reasoning patterns.

Source:

Atom-Searcher: Enhancing Agentic Deep Research via Fine-Grained Atomic Thought Reward

Problem Addressed

LLM static knowledge limits complex tasks

RAG rigid workflows limit multi-hop reasoning

Outcome-based RL has gradient conflicts & reward sparsity

Core Concepts

Atomic Thought: Fine-grained reasoning units

Reasoning Reward Models: Score Atomic Thoughts

Atomic Thought Reward: Fine-grained guidance
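The relationship between these three concepts can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the tag names (PLAN, VERIFY), the tag-based trace format, the averaging step, and the toy_rrm scorer are all assumptions made for the example, since the summary does not specify how Atomic Thoughts are delimited or how RRM scores are aggregated.

```python
import re
from statistics import mean
from typing import Callable, List, Tuple

# Assumed format: each Atomic Thought is wrapped in an upper-case tag,
# e.g. <PLAN>...</PLAN>. The actual tag set is not given in this summary.
ATOMIC_TAG = re.compile(r"<(?P<tag>[A-Z_]+)>(?P<body>.*?)</(?P=tag)>", re.DOTALL)


def split_atomic_thoughts(trace: str) -> List[Tuple[str, str]]:
    """Return (tag, text) pairs for every tagged unit found in a reasoning trace."""
    return [(m.group("tag"), m.group("body").strip()) for m in ATOMIC_TAG.finditer(trace)]


def atomic_thought_reward(trace: str, score_fn: Callable[[str, str], float]) -> float:
    """Average the per-unit scores into one reward (averaging is an assumption)."""
    thoughts = split_atomic_thoughts(trace)
    if not thoughts:
        return 0.0  # no tagged units, so there is nothing to reward
    return mean(score_fn(tag, body) for tag, body in thoughts)


def toy_rrm(tag: str, body: str) -> float:
    """Hypothetical stand-in for a Reasoning Reward Model: word count / 10, capped at 1."""
    return min(len(body.split()) / 10.0, 1.0)


trace = "<PLAN>find the author first</PLAN> filler <VERIFY>check the date</VERIFY>"
print(split_atomic_thoughts(trace))
print(atomic_thought_reward(trace, toy_rrm))
```

In a real system the scorer would be a call to a reward model rather than a word count; the point of the sketch is only that tagged units give the scorer well-defined spans to judge.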

Framework Components

Phase 1: SFT for Atomic Thought generation

Phase 2: RL with hybrid reward

Policy Optimization: GRPO algorithm
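For readers unfamiliar with GRPO, its central step is to score each sampled response relative to the other responses drawn for the same prompt, rather than against a learned value function. The sketch below shows only that normalization step. The function name, the eps constant, and the use of the population standard deviation are choices made for this example; implementations differ on those details, and nothing here is taken from the Atom-Searcher code.

```python
from statistics import mean, pstdev
from typing import List


def group_relative_advantages(rewards: List[float], eps: float = 1e-8) -> List[float]:
    """Normalize rewards within one group of responses sampled for the same prompt.

    Each advantage is (reward - group mean) / (group std + eps), so responses
    that beat the group average get a positive advantage and the rest get a
    negative one.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an assumption for this sketch
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four responses to one prompt, each with a scalar reward.
advantages = group_relative_advantages([1.0, 0.0, 0.5, 0.5])
print(advantages)
```

When every response in a group receives the same reward, all advantages are zero and the group contributes no learning signal, which is one way reward sparsity shows up in outcome-based training.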

Reward Mechanism

Fine-grained ATR from RRMs

Curriculum-based aggregation strategy

Mitigates gradient conflicts

Alleviates reward sparsity
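A curriculum-based aggregation of this kind can be illustrated as a weight on ATR that shrinks as training progresses. The sketch below assumes a linear decay and a starting weight of 0.5; both are illustrative choices, and the names atr_weight, hybrid_reward, and w_start are hypothetical. The summary states only that ATR is prioritized early and that training transitions to outcome rewards.

```python
def atr_weight(step: int, total_steps: int, w_start: float = 0.5) -> float:
    """Weight on the Atomic Thought Reward, decaying linearly from w_start to 0."""
    progress = min(max(step / total_steps, 0.0), 1.0)  # clamp to [0, 1]
    return w_start * (1.0 - progress)


def hybrid_reward(atr: float, outcome: float, step: int, total_steps: int,
                  w_start: float = 0.5) -> float:
    """Blend process-level ATR with the outcome reward according to the schedule."""
    w = atr_weight(step, total_steps, w_start)
    return w * atr + (1.0 - w) * outcome


# Early in training a wrong answer with good reasoning still earns some reward;
# by the end only the outcome counts.
print(hybrid_reward(atr=0.8, outcome=0.0, step=0, total_steps=100))
print(hybrid_reward(atr=0.8, outcome=0.0, step=100, total_steps=100))
```

The early non-zero reward for well-reasoned but incorrect trajectories is how a schedule like this addresses reward sparsity, while letting the ATR weight fall to zero keeps the final policy aligned with the outcome metric.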

Key Advantages

Scales computation at test time

Atomic Thoughts provide RRM anchors

Exhibits human-like reasoning patterns

Performance

Achieves SOTA on 7 benchmarks

Significant in-domain improvements

Optimal out-of-domain generalization
