AgentFly: Fine-tuning LLM Agents without Fine-tuning LLMs
AgentFly introduces a novel learning paradigm for adaptive Large Language Model (LLM) agents, enabling low-cost continual adaptation without fine-tuning the underlying LLMs. It formalizes learning as a Memory-augmented Markov Decision Process (M-MDP) with a neural case-selection policy over past experiences stored in an episodic memory. This approach achieves top-1 performance on the GAIA validation set and strong results on DeepResearcher, offering an efficient pathway toward generalist LLM agents capable of real-time learning.
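The paper's episodic memory and neural case-selection policy are not reproduced here; the snippet below is only a minimal sketch of the idea under simplifying assumptions (hypothetical `CaseMemory`, `embed`, and `select` names, and cosine similarity in place of the learned policy):

```python
import numpy as np

def embed(text: str) -> np.ndarray:
    """Placeholder encoder; a real system would use an embedding model here."""
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    v = rng.standard_normal(128)
    return v / np.linalg.norm(v)

class CaseMemory:
    """Episodic store of past (state, plan, reward) cases, grown online."""
    def __init__(self):
        self.cases = []   # list of dicts: {"state", "plan", "reward"}
        self.keys = []    # embedding of each case's state

    def add(self, state: str, plan: str, reward: float) -> None:
        self.cases.append({"state": state, "plan": plan, "reward": reward})
        self.keys.append(embed(state))

    def select(self, query: str, k: int = 4):
        """Non-parametric case selection: top-k stored cases by cosine similarity."""
        if not self.cases:
            return []
        q = embed(query)
        sims = np.array([float(q @ key) for key in self.keys])
        top = np.argsort(-sims)[:k]
        return [self.cases[i] for i in top]

# Retrieved cases are prepended to the planner prompt; the LLM itself is never
# fine-tuned -- only the memory contents (and, in the paper, a learned
# retrieval policy) change over time.
memory = CaseMemory()
memory.add("Find the 2020 population of Lisbon", "search -> extract -> answer", 1.0)
print(memory.select("What was Lisbon's population in 2021?", k=1))
```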
Article Points:
1. Enables LLM agents to adapt continually without fine-tuning the base LLMs.
2. Utilizes memory-based online reinforcement learning for low-cost adaptation.
3. Formalizes learning as a Memory-augmented Markov Decision Process (M-MDP).
4. Employs a planner-executor architecture with Case-Based Reasoning (CBR).
5. Achieves top-1 on GAIA validation (87.88% Pass@3) and strong benchmark results.
6. Offers a scalable and efficient pathway for generalist LLM agents.
Core Idea
- No LLM fine-tuning for adaptation
- Low-cost continual learning
- Memory-based online RL

Methodology
- Memory-augmented MDP (M-MDP)
- Soft Q-Learning for the retrieval policy
- Kernel-based Q-function estimation (one possible form is sketched after this list)
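The soft Q-learning and kernel-based Q-function items suggest a maximum-entropy retrieval policy of roughly the following shape. This is one plausible reading of the outline, not the paper's exact formulation; here α is an entropy temperature, M the case memory, and (s_i, c_i, r_i) previously stored state-case-reward triples.

```latex
% Hedged sketch: softmax case-retrieval policy with a kernel (Nadaraya-Watson
% style) estimate of the case value from stored triples (s_i, c_i, r_i).
\[
  \pi(c \mid s, M) \;=\;
    \frac{\exp\!\big(Q(s, c)/\alpha\big)}
         {\sum_{c' \in M} \exp\!\big(Q(s, c')/\alpha\big)},
  \qquad
  Q(s, c) \;\approx\;
    \frac{\sum_{i} k\big((s, c), (s_i, c_i)\big)\, r_i}
         {\sum_{j} k\big((s, c), (s_j, c_j)\big)}.
\]
```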

Architecture
- Planner-Executor framework (see the sketch after this list)
- Case Memory for episodic traces
- Subtask & Tool Memory
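A minimal sketch of how a planner-executor loop with a case memory could be wired up. `plan_with_cases`, `execute`, `solve`, and the naive tool routing are illustrative assumptions rather than AgentFly's actual interfaces, and `memory` is the CaseMemory-style store sketched earlier:

```python
def plan_with_cases(llm, task: str, cases: list) -> list[dict]:
    """Planner: a frozen LLM drafts subtasks, conditioned on retrieved cases."""
    context = "\n".join(f"- past case: {c['state']} -> {c['plan']}" for c in cases)
    prompt = f"Relevant past cases:\n{context}\n\nTask: {task}\nList the subtasks."
    return [{"subtask": line} for line in llm(prompt).splitlines() if line.strip()]

def execute(subtasks: list[dict], tools: dict) -> str:
    """Executor: routes each subtask to a tool (search, crawl, code, ...)."""
    observations = []
    for step in subtasks:
        tool = tools["search"]            # naive routing, just for the sketch
        observations.append(tool(step["subtask"]))
    return "\n".join(observations)

def solve(llm, task: str, memory, tools: dict) -> str:
    cases = memory.select(task)           # case-based reasoning step
    subtasks = plan_with_cases(llm, task, cases)
    result = execute(subtasks, tools)
    # Store the new trace; this is the only part that "learns" online.
    memory.add(task, " -> ".join(s["subtask"] for s in subtasks), reward=1.0)
    return result
```

Only the memory contents change across tasks; the planner and executor LLMs stay frozen, which is the sense in which the agent is "fine-tuned" without fine-tuning any LLM.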

Key Features
- Case-Based Reasoning (CBR)
- Parametric & Non-Parametric Memory
- Adaptive case selection (contrasted in the sketch after this list)
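To make the parametric vs. non-parametric distinction concrete: non-parametric selection ranks cases purely by similarity (as in the CaseMemory sketch above), while parametric selection learns a scorer from observed rewards. The class below is a hypothetical illustration, not AgentFly's implementation:

```python
import numpy as np

class LearnedCaseScorer:
    """Parametric case selection: a small linear scorer over (query, case)
    embedding pairs, updated online from observed rewards."""
    def __init__(self, dim: int = 128, lr: float = 0.01):
        self.w = np.zeros(2 * dim)   # weights over [query_emb, case_emb]
        self.lr = lr

    def score(self, q: np.ndarray, c: np.ndarray) -> float:
        return float(self.w @ np.concatenate([q, c]))

    def update(self, q: np.ndarray, c: np.ndarray, reward: float) -> None:
        """One online regression step toward the observed reward."""
        x = np.concatenate([q, c])
        err = reward - float(self.w @ x)
        self.w += self.lr * err * x
```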

Performance
- GAIA: Top-1 (87.88% Pass@3)
- DeepResearcher: SOTA (66.6% F1)
- HLE & SimpleQA: Strong results

Tools
- Search engines & web crawlers
- Multimodal data processing
- Code execution & math