AgentFly: Fine-tuning LLM Agents without Fine-tuning LLMs
AgentFly introduces a novel learning paradigm for adaptive Large Language Model (LLM) agents, enabling low-cost continual adaptation without fine-tuning the underlying LLMs. It formalizes learning as a Memory-augmented Markov Decision Process (M-MDP) with a neural case-selection policy over past experiences stored in an episodic memory. This approach achieves top-1 performance on the GAIA validation set and strong results on DeepResearcher, offering an efficient pathway toward generalist LLM agents capable of real-time learning.
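The paper's episodic memory and neural case-selection policy are not reproduced here; the snippet below is only a minimal sketch of the idea under simplifying assumptions (hypothetical `CaseMemory`, `embed`, and `select` names, and cosine similarity in place of the learned policy):

```python
import numpy as np

def embed(text: str) -> np.ndarray:
    """Placeholder encoder; a real system would use an embedding model here."""
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    v = rng.standard_normal(128)
    return v / np.linalg.norm(v)

class CaseMemory:
    """Episodic store of past (state, plan, reward) cases, grown online."""
    def __init__(self):
        self.cases = []   # list of dicts: {"state", "plan", "reward"}
        self.keys = []    # embedding of each case's state

    def add(self, state: str, plan: str, reward: float) -> None:
        self.cases.append({"state": state, "plan": plan, "reward": reward})
        self.keys.append(embed(state))

    def select(self, query: str, k: int = 4):
        """Non-parametric case selection: top-k stored cases by cosine similarity."""
        if not self.cases:
            return []
        q = embed(query)
        sims = np.array([float(q @ key) for key in self.keys])
        top = np.argsort(-sims)[:k]
        return [self.cases[i] for i in top]

# Retrieved cases are prepended to the planner prompt; the LLM itself is never
# fine-tuned -- only the memory contents (and, in the paper, a learned
# retrieval policy) change over time.
memory = CaseMemory()
memory.add("Find the 2020 population of Lisbon", "search -> extract -> answer", 1.0)
print(memory.select("What was Lisbon's population in 2021?", k=1))
```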
Article Points:
1. Enables LLM agents to adapt continually without fine-tuning the base LLMs.
2. Utilizes memory-based online reinforcement learning for low-cost adaptation.
3. Formalizes learning as a Memory-augmented Markov Decision Process (M-MDP).
4. Employs a planner-executor architecture with Case-Based Reasoning (CBR).
5. Achieves top-1 on GAIA validation (87.88% Pass@3) and strong benchmark results.
6. Offers a scalable and efficient pathway for generalist LLM agents.
Core Idea
- No LLM fine-tuning for adaptation
- Low-cost continual learning
- Memory-based online RL

Methodology
- Memory-augmented MDP (M-MDP)
- Soft Q-Learning for the retrieval policy
- Kernel-based Q-function estimation (one possible form is sketched after this list)
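The soft Q-learning and kernel-based Q-function items suggest a maximum-entropy retrieval policy of roughly the following shape. This is one plausible reading of the outline, not the paper's exact formulation; here α is an entropy temperature, M the case memory, and (s_i, c_i, r_i) previously stored state-case-reward triples.

```latex
% Hedged sketch: softmax case-retrieval policy with a kernel (Nadaraya-Watson
% style) estimate of the case value from stored triples (s_i, c_i, r_i).
\[
  \pi(c \mid s, M) \;=\;
    \frac{\exp\!\big(Q(s, c)/\alpha\big)}
         {\sum_{c' \in M} \exp\!\big(Q(s, c')/\alpha\big)},
  \qquad
  Q(s, c) \;\approx\;
    \frac{\sum_{i} k\big((s, c), (s_i, c_i)\big)\, r_i}
         {\sum_{j} k\big((s, c), (s_j, c_j)\big)}.
\]
```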

Architecture
- Planner-Executor framework (see the sketch after this list)
- Case Memory for episodic traces
- Subtask & Tool Memory
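A minimal sketch of how a planner-executor loop with a case memory could be wired up. `plan_with_cases`, `execute`, `solve`, and the naive tool routing are illustrative assumptions rather than AgentFly's actual interfaces, and `memory` is the CaseMemory-style store sketched earlier:

```python
def plan_with_cases(llm, task: str, cases: list) -> list[dict]:
    """Planner: a frozen LLM drafts subtasks, conditioned on retrieved cases."""
    context = "\n".join(f"- past case: {c['state']} -> {c['plan']}" for c in cases)
    prompt = f"Relevant past cases:\n{context}\n\nTask: {task}\nList the subtasks."
    return [{"subtask": line} for line in llm(prompt).splitlines() if line.strip()]

def execute(subtasks: list[dict], tools: dict) -> str:
    """Executor: routes each subtask to a tool (search, crawl, code, ...)."""
    observations = []
    for step in subtasks:
        tool = tools["search"]            # naive routing, just for the sketch
        observations.append(tool(step["subtask"]))
    return "\n".join(observations)

def solve(llm, task: str, memory, tools: dict) -> str:
    cases = memory.select(task)           # case-based reasoning step
    subtasks = plan_with_cases(llm, task, cases)
    result = execute(subtasks, tools)
    # Store the new trace; this is the only part that "learns" online.
    memory.add(task, " -> ".join(s["subtask"] for s in subtasks), reward=1.0)
    return result
```

Only the memory contents change across tasks; the planner and executor LLMs stay frozen, which is the sense in which the agent is "fine-tuned" without fine-tuning any LLM.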

Key Features
- Case-Based Reasoning (CBR)
- Parametric & Non-Parametric Memory
- Adaptive case selection (contrasted in the sketch after this list)
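To make the parametric vs. non-parametric distinction concrete: non-parametric selection ranks cases purely by similarity (as in the CaseMemory sketch above), while parametric selection learns a scorer from observed rewards. The class below is a hypothetical illustration, not AgentFly's implementation:

```python
import numpy as np

class LearnedCaseScorer:
    """Parametric case selection: a small linear scorer over (query, case)
    embedding pairs, updated online from observed rewards."""
    def __init__(self, dim: int = 128, lr: float = 0.01):
        self.w = np.zeros(2 * dim)   # weights over [query_emb, case_emb]
        self.lr = lr

    def score(self, q: np.ndarray, c: np.ndarray) -> float:
        return float(self.w @ np.concatenate([q, c]))

    def update(self, q: np.ndarray, c: np.ndarray, reward: float) -> None:
        """One online regression step toward the observed reward."""
        x = np.concatenate([q, c])
        err = reward - float(self.w @ x)
        self.w += self.lr * err * x
```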

Performance
- GAIA: Top-1 (87.88% Pass@3)
- DeepResearcher: SOTA (66.6% F1)
- HLE & SimpleQA: Strong results

Tools
- Search engines & web crawlers
- Multimodal data processing
- Code execution & math