Paper2Code: Automating Code Generation from Scientific Papers in Machine Learning
PaperCoder is a multi-agent Large Language Model (LLM) framework that automatically generates functional code repositories from machine learning papers. It runs a structured three-stage pipeline of planning, analysis, and coding, with a specialized LLM agent handling each stage. The framework aims to enhance reproducibility and accelerate scientific progress by providing high-quality, faithful, and executable implementations in a field where the original code is often unavailable.
Article Points:
1. Multi-agent LLM framework for ML paper-to-code generation.
2. Transforms scientific papers into functional code repositories.
3. Three-stage pipeline: planning, analysis, and coding.
4. Achieves high-quality, faithful, and executable implementations.
5. Outperforms baselines on Paper2Code and PaperBench benchmarks.
6. Human evaluations confirm practical utility and reproducibility.
Purpose
Transforms ML papers into code repositories
Addresses code unavailability in ML research
Enhances reproducibility & accelerates science
Framework Stages
Planning
- Overall Plan: High-level roadmap
- Architecture Design: System diagrams, file list
- Logic Design: File dependencies, execution order
- Configuration File: Hyperparameters, settings
Analysis: Detailed file-level implementation specifics
Coding: Modular, dependency-aware code generation
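The dependency-aware ordering described under Logic Design and Coding can be illustrated with a minimal sketch. This is not PaperCoder's actual implementation: the file names, the `dependencies` map, and the `generate_file` placeholder (which stands in for an LLM call) are all hypothetical, and passing every previously generated file as context is an assumption made for illustration. The sketch only shows how a file-dependency map from the planning stage could determine the order in which files are generated.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical logic-design output: each file maps to the files it depends on.
dependencies = {
    "config.yaml": set(),
    "dataset.py": {"config.yaml"},
    "model.py": {"config.yaml"},
    "trainer.py": {"dataset.py", "model.py"},
    "main.py": {"trainer.py"},
}


def generation_order(deps):
    """Return files so that every dependency precedes the files that use it."""
    return list(TopologicalSorter(deps).static_order())


def generate_file(name, context):
    """Placeholder for an LLM call; returns a stub string, not real code."""
    seen = ", ".join(sorted(context)) or "nothing"
    return f"# {name} (generated with access to: {seen})"


def run_coding_stage(deps):
    """Generate files one by one, passing earlier files as context (assumed)."""
    repo = {}
    for name in generation_order(deps):
        repo[name] = generate_file(name, repo)
    return repo


repo = run_coding_stage(dependencies)
order = list(repo)  # dicts preserve insertion order, so this is the generation order

for name in order:
    print(repo[name])
```

In this sketch the configuration file is always generated first and `main.py` last, because a topological sort never emits a file before the files it depends on. The relative order of `dataset.py` and `model.py` is left unspecified, since neither depends on the other.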
Evaluation
Benchmarks
- Paper2Code: ICML, NeurIPS, ICLR 2024 papers
- PaperBench Code-Dev: ICML 2024 papers
Metrics
- Model-Based: Reference-based & Reference-free scores
- Human Evaluation: Original paper authors' rankings
- Executability: Minimal modifications for successful run
Performance