Paper2Code: Automating Code Generation from Scientific Papers in Machine Learning
PaperCoder is a multi-agent Large Language Model (LLM) framework designed to automatically generate functional code repositories from machine learning scientific papers. It operates through a structured three-stage pipeline: planning, analysis, and coding, with each stage handled by specialized LLM agents. The framework aims to enhance reproducibility and accelerate scientific progress by providing high-quality, faithful, and executable implementations for papers whose code is unavailable.
Article Points:
1. Multi-agent LLM framework for ML paper-to-code generation.
2. Transforms scientific papers into functional code repositories.
3. Three-stage pipeline: planning, analysis, and coding.
4. Achieves high-quality, faithful, and executable implementations.
5. Outperforms baselines on Paper2Code and PaperBench benchmarks.
6. Human evaluations confirm practical utility and reproducibility.
Purpose

Transforms ML papers to code repositories

Addresses code unavailability in ML research

Enhances reproducibility & accelerates science

Framework Stages

Planning

- Overall Plan: High-level roadmap
- Architecture Design: System diagrams, file list
- Logic Design: File dependencies, execution order
- Configuration File: Hyperparameters, settings

Analysis: Detailed file-level implementation specifics

Coding: Modular, dependency-aware code generation (see the sketch below)
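
A minimal sketch of how such a planning → analysis → coding pipeline could be orchestrated is shown below. The function names, prompts, and the generic `LLM` callable are illustrative assumptions made for this summary, not PaperCoder's actual implementation.

```python
# Illustrative sketch of a three-stage paper-to-code pipeline.
# The prompts and the `LLM` interface are assumptions, not PaperCoder's real code.
from dataclasses import dataclass, field
from typing import Callable, Dict, List

LLM = Callable[[str], str]  # prompt in, completion out

@dataclass
class Plan:
    overall_plan: str = ""
    file_list: List[str] = field(default_factory=list)
    config: str = ""  # e.g. hyperparameters and settings

def planning_stage(paper_text: str, llm: LLM) -> Plan:
    """Produce a high-level roadmap, an architecture (file list), and a config."""
    roadmap = llm(f"Summarize an implementation roadmap for this paper:\n{paper_text}")
    files = llm(f"List the source files (one per line) needed to implement:\n{roadmap}")
    config = llm(f"Write a config file with the paper's hyperparameters:\n{roadmap}")
    return Plan(roadmap, [f.strip() for f in files.splitlines() if f.strip()], config)

def analysis_stage(paper_text: str, plan: Plan, llm: LLM) -> Dict[str, str]:
    """Derive a detailed, file-level implementation spec for each planned file."""
    return {
        path: llm(f"Given the plan below, specify what {path} must implement.\n"
                  f"Plan:\n{plan.overall_plan}\nPaper:\n{paper_text}")
        for path in plan.file_list
    }

def coding_stage(plan: Plan, specs: Dict[str, str], llm: LLM) -> Dict[str, str]:
    """Generate code file by file, passing previously written files as context."""
    repo: Dict[str, str] = {}
    for path in plan.file_list:  # assumed to be ordered by dependency
        context = "\n\n".join(f"# {p}\n{code}" for p, code in repo.items())
        repo[path] = llm(f"Config:\n{plan.config}\nSpec:\n{specs[path]}\n"
                         f"Already written:\n{context}\nWrite {path}:")
    return repo

def paper_to_repo(paper_text: str, llm: LLM) -> Dict[str, str]:
    plan = planning_stage(paper_text, llm)
    specs = analysis_stage(paper_text, plan, llm)
    return coding_stage(plan, specs, llm)
```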

Evaluation

Benchmarks

- Paper2Code: ICML, NeurIPS, ICLR 2024 papers
- PaperBench Code-Dev: ICML 2024 papers

Metrics

- Model-Based: Reference-based & Reference-free scores
- Human Evaluation: Original paper authors' rankings
- Executability: Minimal modifications needed for a successful run (see the sketch below)
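
One way to quantify an executability-style metric is the fraction of generated lines that had to change before the repository ran. The diff-based counting below is an assumption about how such a number could be computed, not the paper's exact protocol.

```python
# Sketch: percentage of generated lines modified to reach a runnable state.
# Using difflib here is an illustrative choice, not the paper's exact protocol.
import difflib

def percent_lines_modified(generated: str, fixed: str) -> float:
    """Return changed lines (replaced/deleted/inserted) as a % of the generated code."""
    gen_lines = generated.splitlines()
    fix_lines = fixed.splitlines()
    sm = difflib.SequenceMatcher(a=gen_lines, b=fix_lines)
    changed = sum(max(i2 - i1, j2 - j1)
                  for tag, i1, i2, j1, j2 in sm.get_opcodes()
                  if tag != "equal")
    return 100.0 * changed / max(len(gen_lines), 1)

# Example: one line of ten changed -> 10.0
before = "\n".join(f"line {i}" for i in range(10))
after = before.replace("line 3", "line three")
print(percent_lines_modified(before, after))  # 10.0
```
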
Performance

Outperforms Baselines: ChatDev, MetaGPT, naive methods

High Quality: Faithful & comprehensive implementations

Practical Utility: Judged helpful by 85% of human judges

Executable Code: Only 0.48% of lines modified on average for a successful run