Titans: Learning to Memorize at Test Time
This paper introduces Titans, a new family of neural architectures featuring a novel long-term neural memory module. This module learns to memorize historical context at test time, complementing attention's role as short-term memory for current context. Titans demonstrate superior performance over Transformers and modern linear recurrent models, effectively scaling to context windows larger than 2M tokens. ✨
Article Points:
1. Titans introduce a neural long-term memory module.
2. Neural memory learns to memorize historical context at test time.
3. Combines attention (short-term) with neural memory (long-term).
4. Neural memory uses gradient-based surprise, momentum, and forgetting (see the update-rule sketch after this list).
5. Titans outperform Transformers and linear recurrent models.
6. Scales effectively to context windows beyond 2M tokens.
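For point 4, here is a rough sketch of the test-time update described in the paper (notation simplified; $k_t$ and $v_t$ are key/value projections of the input $x_t$, and $\eta_t$, $\theta_t$, $\alpha_t$ are data-dependent gates):

$$
\ell(M_{t-1}; x_t) = \lVert M_{t-1}(k_t) - v_t \rVert_2^2, \qquad
S_t = \eta_t\, S_{t-1} - \theta_t\, \nabla \ell(M_{t-1}; x_t), \qquad
M_t = (1 - \alpha_t)\, M_{t-1} + S_t
$$

The gradient term is the "momentary surprise" (how badly the current memory predicts the new token), $S_t$ accumulates it with momentum $\eta_t$, and $\alpha_t$ adaptively forgets old memory.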
Core Concept
- Neural Long-Term Memory
- Learns at test time (sketched in code after this list)
- Deep non-linear memory
- Surprise metric: gradient
- Momentum & Forgetting
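To make "learns at test time" and "deep non-linear memory" concrete, below is a minimal sketch, not the paper's implementation: the memory is a small MLP whose weights are updated by a gradient step on an associative-recall loss during inference, with weight decay acting as forgetting. The names (DeepMemory, write) are illustrative, and the paper's data-dependent gates and momentum are omitted for brevity.

```python
# Minimal sketch: a deep memory written to by test-time gradient descent.
import torch
import torch.nn as nn

class DeepMemory(nn.Module):
    """Small MLP whose weights serve as the long-term memory."""
    def __init__(self, dim, hidden):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim, hidden), nn.SiLU(), nn.Linear(hidden, dim)
        )

    def forward(self, k):                  # read: map a key to its stored value
        return self.net(k)

def write(memory, k, v, lr=0.1, forget=0.01):
    """One test-time write: gradient step on ||M(k) - v||^2 plus weight decay."""
    loss = (memory(k) - v).pow(2).sum()
    grads = torch.autograd.grad(loss, list(memory.parameters()))
    with torch.no_grad():
        for p, g in zip(memory.parameters(), grads):
            p.mul_(1 - forget)             # forgetting: decay old memory
            p.sub_(lr * g)                 # surprise step: move toward storing (k, v)

# Usage: memorize a (key, value) pair during inference, then read it back.
mem = DeepMemory(dim=64, hidden=128)
k, v = torch.randn(64), torch.randn(64)
write(mem, k, v)
recalled = mem(k)
```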

Architecture Variants
- Memory as a Context (MAC)
- Memory as a Gate (MAG), sketched after this list
- Memory as a Layer (MAL)
- Persistent Memory
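Of the variants, MAC prepends tokens retrieved from the neural memory, together with the persistent (input-independent, learnable) memory tokens, to the attention context; MAL stacks the neural memory as a layer before attention; MAG mixes a sliding-window attention branch with the memory branch through a gate. The sketch below illustrates the MAG combination under the simplifying assumption of a plain sigmoid gate; the paper's exact gating and normalization may differ, and the module names are illustrative.

```python
# Hedged sketch of a Memory-as-a-Gate (MAG) style block.
import torch
import torch.nn as nn

class MAGBlock(nn.Module):
    def __init__(self, dim, attn: nn.Module, neural_memory: nn.Module):
        super().__init__()
        self.attn = attn                    # short-term: sliding-window attention
        self.neural_memory = neural_memory  # long-term: test-time-updated memory
        self.gate = nn.Linear(dim, dim)

    def forward(self, x):                   # x: (batch, seq, dim)
        g = torch.sigmoid(self.gate(x))     # data-dependent mixing gate in (0, 1)
        return g * self.attn(x) + (1 - g) * self.neural_memory(x)
```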

Memory Mechanism
- Short-term: Attention
- Long-term: Neural Memory
- Adaptive forgetting
- Parallelizable training (chunk-wise sketch after this list)
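"Parallelizable training" refers to computing the test-time updates chunk by chunk so that they become matrix multiplications instead of a token-by-token loop. Below is a heavily simplified sketch, assuming a linear memory (read as k @ M) and a single step size and decay per chunk; the paper extends this chunk-wise scheme to deep memories and per-token, data-dependent gates.

```python
# Simplified sketch: within a chunk, per-token gradients are taken with respect
# to the memory state at the chunk start, so one chunk's update is two matmuls.
import torch

def chunk_update(M, K, V, theta=0.1, alpha=0.01):
    """M: (d, d) linear memory; K, V: (chunk_len, d) keys/values of one chunk."""
    grad = K.T @ (K @ M - V)   # sum over the chunk of grads of ||k @ M - v||^2
                               # (the constant factor of 2 is folded into theta)
    return (1 - alpha) * M - theta * grad
```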

Advantages
- Scales to 2M+ context
- Higher accuracy
- Theoretically more expressive
- Fast inference

Experimental Results
- Language Modeling
- Needle in Haystack
- Time Series Forecasting
- DNA Modeling
- Outperforms baselines

Ablation Study
- Weight decay crucial
- Momentum important
- Convolution beneficial
- Persistent memory helps
- Deep memory effective