ScoreFlow: Mastering LLM Agent Workflows via Score-based Preference Optimization


ScoreFlow is a high-performance framework for automatically optimizing large language model (LLM) multi-agent workflows. Existing workflow-optimization methods are inflexible, adapt and scale poorly, and require substantial manual effort. ScoreFlow addresses these limitations by using efficient gradient-based optimization in a continuous space. Its core component, Score-DPO, is a novel variant of direct preference optimization (DPO) that incorporates quantitative evaluation scores as feedback. Across six benchmarks, ScoreFlow achieves an 8.2% improvement over baselines, and it enables smaller models to outperform larger ones at lower inference cost.

Article Points:

ScoreFlow: Automated, adaptive framework for LLM agent workflow generation.

Score-DPO: Novel preference optimization method using quantitative evaluation scores.

Achieves 8.2% improvement over baselines across diverse tasks.

Enables smaller LLMs to outperform larger models with lower inference costs.

Leverages efficient gradient-based optimization in a continuous space.

Uses code as a flexible representation for workflow search space.

Source:

ScoreFlow: Mastering LLM Agent Workflows via Score-based Preference Optimization


Problem Addressed

Inflexible existing methods

Limited adaptability & scalability

High manual effort for workflows

Proposed Solution (ScoreFlow)

Automated, adaptive framework

Gradient-based optimization

Code as workflow representation

Cost-efficient with open-source LLMs
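Representing a workflow as code means the generator model emits an ordinary program that chains LLM calls, rather than choosing from a fixed graph of steps. The sketch below illustrates the idea under stated assumptions: the function name `workflow`, its draft/review/ensemble structure, and the `llm` callable are all hypothetical and are not taken from the paper.

```python
from typing import Callable

# Hypothetical operator type: any function mapping a prompt to a model response.
LLM = Callable[[str], str]


def workflow(problem: str, llm: LLM) -> str:
    """A hypothetical multi-agent workflow written as plain Python.

    Because the workflow is code, a generator model can emit any composition
    of calls, loops, and conditionals, not just a fixed sequence of steps.
    """
    # Step 1: draft three candidate solutions independently.
    drafts = [llm(f"Solve step by step (attempt {i}): {problem}") for i in range(3)]
    # Step 2: a reviewer agent critiques each draft.
    reviews = [llm(f"Critique this solution to '{problem}':\n{d}") for d in drafts]
    # Step 3: an ensemble agent reads all drafts and reviews and gives one answer.
    joined = "\n\n".join(f"Draft:\n{d}\nReview:\n{r}" for d, r in zip(drafts, reviews))
    return llm(f"Given these drafts and reviews, answer '{problem}':\n{joined}")
```

Any callable that maps a prompt string to a response can be passed as `llm`, so the same program can run against a local open-source model or against a stub during testing.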

Key Component (Score-DPO)

Novel DPO variant

Incorporates quantitative scores

Enhances efficiency & stability

Addresses score variance & inaccuracies
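The card describes Score-DPO only as a DPO variant that folds quantitative evaluation scores into preference optimization; its exact loss is not given here. As a rough illustration, the sketch below starts from the standard DPO objective and weights each preference pair by the gap between the two workflows' evaluation scores. That weighting is an assumption chosen for clarity, not the paper's formulation, and `PreferencePair`, `score_weighted_dpo_loss`, and the default `beta = 0.1` are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class PreferencePair:
    # Summed token log-probs of the preferred (w) and rejected (l) workflows
    # under the policy being trained and under the frozen reference model.
    logp_w: float
    logp_l: float
    ref_logp_w: float
    ref_logp_l: float
    # Evaluation scores of the two workflows, e.g. task success rates in [0, 1].
    score_w: float
    score_l: float


def log_sigmoid(x: float) -> float:
    # Numerically stable log(sigmoid(x)) for both signs of x.
    return -math.log1p(math.exp(-x)) if x >= 0 else x - math.log1p(math.exp(x))


def score_weighted_dpo_loss(pairs: list[PreferencePair], beta: float = 0.1) -> float:
    """Standard DPO loss, averaged with per-pair weights equal to the score gap.

    The score-gap weighting is an illustrative assumption, not Score-DPO itself.
    """
    total, weight_sum = 0.0, 0.0
    for p in pairs:
        # DPO implicit-reward margin: beta times the difference of policy/reference log-ratios.
        margin = beta * ((p.logp_w - p.ref_logp_w) - (p.logp_l - p.ref_logp_l))
        # Hypothetical weight: larger score gaps count more; ties contribute nothing.
        weight = max(p.score_w - p.score_l, 0.0)
        total += -weight * log_sigmoid(margin)
        weight_sum += weight
    return total / weight_sum if weight_sum > 0 else 0.0
```

Weighting by score gap is one simple way to let large, reliable score differences dominate training while near-ties, which are most exposed to evaluation noise, contribute little.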

Methodology

Iterative workflow generation

Evaluation score feedback

Fine-tuning generator with Score-DPO
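The methodology loop (generate workflows, score them, fine-tune the generator on the results) can be sketched as follows. The pair-sampling rule, with probability proportional to score gap, is an assumption for illustration, and `generate`, `evaluate`, and `finetune` are hypothetical callables standing in for the workflow generator, the task evaluator, and a Score-DPO training step.

```python
import itertools
import random
from typing import Callable


def build_preference_pairs(
    workflows: list[str], scores: list[float], num_pairs: int, rng: random.Random
) -> list[tuple[str, str]]:
    """Turn scored workflows for one task into (preferred, rejected) pairs.

    Assumed scheme: every pair with a strictly higher score on one side is a
    candidate, sampled with probability proportional to its score gap.
    """
    candidates, gaps = [], []
    for i, j in itertools.combinations(range(len(workflows)), 2):
        if scores[i] == scores[j]:
            continue  # ties carry no preference signal
        w, l = (i, j) if scores[i] > scores[j] else (j, i)
        candidates.append((workflows[w], workflows[l]))
        gaps.append(scores[w] - scores[l])
    if not candidates:
        return []
    return rng.choices(candidates, weights=gaps, k=num_pairs)


def scoreflow_round(
    tasks: list[str],
    generate: Callable[[str], str],
    evaluate: Callable[[str, str], float],
    finetune: Callable[[list[tuple[str, str]]], None],
    k: int,
    rng: random.Random,
) -> None:
    """One hypothetical iteration: sample k workflows per task, score, pair, train."""
    dataset: list[tuple[str, str]] = []
    for task in tasks:
        workflows = [generate(task) for _ in range(k)]
        scores = [evaluate(task, wf) for wf in workflows]
        dataset += build_preference_pairs(workflows, scores, num_pairs=k, rng=rng)
    finetune(dataset)  # e.g. a Score-DPO update of the generator
```

Repeating `scoreflow_round` with the updated generator gives the iterative behavior described above: each round's workflows come from a generator trained on the previous round's score feedback.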

Experimental Results

8.2% improvement over baselines

Outperforms on 6 benchmarks

Smaller models surpass larger ones

Contributions

ScoreFlow framework

Score-DPO optimization method

Extensive evaluations & robustness


