Less is More: Recursive Reasoning with Tiny Networks
The Hierarchical Reasoning Model (HRM) uses two small neural networks for recursive reasoning and outperforms Large Language Models (LLMs) on hard puzzle tasks. This paper proposes the Tiny Recursive Model (TRM), a simpler approach that uses a single tiny network with only two layers. TRM achieves significantly higher generalization than both HRM and LLMs with less than 0.01% of the parameters of an LLM, demonstrating the effectiveness of simplified recursive reasoning.
Article Points:
1. TRM simplifies recursive reasoning with a single tiny network.
2. TRM outperforms LLMs and HRM on hard puzzle tasks with fewer parameters.
3. TRM eliminates complex fixed-point theorems and biological justifications.
4. TRM's simplified ACT requires only one forward pass during training.
5. "Less is more": using only 2 layers and a single network improves generalization.
6. Exponential Moving Average (EMA) enhances stability and generalization.

Core Idea
- Recursive reasoning with a single tiny network
- Progressively improves the answer over repeated recursion cycles (a minimal sketch follows)
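
A rough illustration of this loop, in PyTorch-like code: one small network is reused both to refine a latent reasoning state z and to update the current answer y, and repeating this cycle progressively improves the answer. All names, dimensions, and the exact update rule here are assumptions for illustration, not the paper's implementation.

```python
import torch
import torch.nn as nn


class TinyRecursiveReasoner(nn.Module):
    """One small network reused for every recursive step (illustrative sketch)."""

    def __init__(self, dim: int = 128):
        super().__init__()
        # "Tiny" core with only two layers; a plain MLP stands in here for
        # whatever small block the paper actually uses.
        self.net = nn.Sequential(
            nn.Linear(3 * dim, dim),
            nn.GELU(),
            nn.Linear(dim, dim),
        )

    def forward(self, x, y, z, n_latent_steps: int = 6, n_cycles: int = 3):
        for _ in range(n_cycles):
            # Refine the latent reasoning state z, conditioned on the
            # question x and the current answer y.
            for _ in range(n_latent_steps):
                z = self.net(torch.cat([x, y, z], dim=-1))
            # Use the refined latent state to improve the current answer y.
            y = self.net(torch.cat([x, y, z], dim=-1))
        return y, z


# Toy usage: embeddings for a batch of 4 puzzles.
x = torch.randn(4, 128)   # question embedding
y = torch.zeros(4, 128)   # initial answer embedding
z = torch.zeros(4, 128)   # initial latent reasoning state
y, z = TinyRecursiveReasoner()(x, y, z)
```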

Advantages over HRM
- Simpler design: no fixed-point theorems or biological justifications
- Higher generalization with fewer parameters
- Simplified Adaptive Computational Time (ACT) that needs only one forward pass during training (sketch below)
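
The simplified ACT can be pictured as a small halting head trained alongside the answer head within the same forward pass. The sketch below is an assumption-based illustration (the head names, shapes, and the choice of halting target are not taken from the paper's code); one plausible halting target is whether the current prediction already matches the ground truth.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

dim, vocab, batch = 128, 10, 4
halt_head = nn.Linear(dim, 1)        # predicts the "stop refining" probability
output_head = nn.Linear(dim, vocab)  # decodes the answer embedding into tokens

y = torch.randn(batch, dim)                  # answer embedding from one forward pass
target = torch.randint(0, vocab, (batch,))   # ground-truth answer tokens

logits = output_head(y)
answer_loss = F.cross_entropy(logits, target)

# Halting target: 1 if the current prediction is already correct, else 0.
is_correct = (logits.argmax(dim=-1) == target).float()
halt_logit = halt_head(y).squeeze(-1)
halt_loss = F.binary_cross_entropy_with_logits(halt_logit, is_correct)

# Single combined objective: no second forward pass, no Q-learning machinery.
loss = answer_loss + halt_loss
```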

Key Design Choices
- A single 2-layer network instead of two networks
- Reinterpretation of the latent features y and z as the current answer and the reasoning state
- Exponential Moving Average (EMA) of the weights (sketch below)
- Attention-free architecture when the context is small and fixed
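
A minimal sketch of the EMA of the weights: a slowly updated copy of the parameters is maintained and used for evaluation, which tends to stabilize training and improve generalization. The helper names and the decay value are assumptions, not taken from the paper.

```python
import copy
import torch


def make_ema(model: torch.nn.Module) -> torch.nn.Module:
    """Create a frozen copy of the model to hold the averaged weights."""
    ema = copy.deepcopy(model)
    for p in ema.parameters():
        p.requires_grad_(False)
    return ema


@torch.no_grad()
def update_ema(ema: torch.nn.Module, model: torch.nn.Module, decay: float = 0.999):
    """Parameter-wise update: ema <- decay * ema + (1 - decay) * model."""
    for p_ema, p in zip(ema.parameters(), model.parameters()):
        p_ema.mul_(decay).add_(p, alpha=1.0 - decay)


# In a training loop (sketch): after each optimizer.step(), call
# update_ema(ema_model, model) and evaluate with ema_model rather than model.
```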

Performance
- Beats LLMs on Sudoku, Maze, and ARC-AGI
- Achieves state-of-the-art results on these puzzle benchmarks

Limitations & Future Work
- Not generative; produces only deterministic answers
- Scaling laws needed to determine optimal parameters
- Why recursion helps remains unexplained

Failed Ideas
- Mixture-of-Experts (MoEs)
- Partial backpropagation
- Removing ACT entirely
- Weight tying of input/output
- TorchDEQ for fixed-point iteration