Titans: Learning to Memorize at Test Time
This paper introduces Titans, a new family of neural architectures built around a novel long-term neural memory module. The module learns to memorize historical context at test time, complementing attention, which serves as short-term memory over the current context. Titans outperform Transformers and modern linear recurrent models and scale effectively to context windows larger than 2M tokens.
Article Points:
1. Titans introduce a neural long-term memory module.
2. The neural memory learns to memorize historical context at test time.
3. Titans combine attention (short-term memory) with neural memory (long-term memory).
4. The neural memory is updated with gradient-based surprise, momentum, and forgetting (see the sketch after this list).
5. Titans outperform Transformers and modern linear recurrent models.
6. Titans scale effectively to context windows beyond 2M tokens.
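Point 4 is the heart of the method. Below is a minimal sketch of one test-time write, assuming a purely linear memory (a single matrix) for brevity; the paper's memory is a deep MLP, and the step size, momentum, and forgetting rate are data-dependent there, whereas here they are fixed illustrative scalars.

```python
import numpy as np

def memory_write(M, S, k, v, theta=0.1, eta=0.9, alpha=0.01):
    """One test-time write to a linear memory M (shape d x d).

    k, v  : key/value vectors for the current token (shape d)
    S     : running 'surprise' (momentum buffer), same shape as M
    theta : step size on the momentary surprise (the gradient)
    eta   : momentum on past surprise
    alpha : forgetting (weight-decay) rate on the memory
    These are fixed illustrative scalars; in the paper they are data-dependent.
    """
    # Associative-recall loss ||M k - v||^2 and its gradient w.r.t. M
    err = M @ k - v                  # prediction error for this token
    grad = 2.0 * np.outer(err, k)    # "momentary surprise"

    S = eta * S - theta * grad       # accumulate surprise with momentum
    M = (1.0 - alpha) * M + S        # forget a little, then memorize
    return M, S

# Reading the memory is just a forward pass with a query vector: y = M @ q
```

At inference, each token's key and value come from learned projections of the input, and the memory is read with a query projection after the write.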
Core Concept
- Neural long-term memory module
- Learns at test time
- Deep, non-linear memory (see the PyTorch sketch below)
- Surprise metric: the gradient of the memory's recall loss
- Momentum & forgetting
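The memory above was kept linear only for brevity; in Titans it is a deep, non-linear network, and "learns at test time" is literal: the recall loss is differentiated through the memory's own weights on every write. A rough PyTorch sketch of that idea, with the two-layer MLP, SiLU activation, and fixed hyper-parameters as illustrative assumptions rather than the paper's configuration:

```python
import torch
import torch.nn as nn

class DeepMemory(nn.Module):
    """Illustrative deep memory: a small MLP mapping keys to values."""
    def __init__(self, dim: int, hidden: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim, hidden), nn.SiLU(), nn.Linear(hidden, dim)
        )

    def forward(self, k: torch.Tensor) -> torch.Tensor:
        return self.net(k)

def test_time_write(memory, surprise, k, v, theta=0.1, eta=0.9, alpha=0.01):
    """Gradient of the recall loss drives the write; no outer-loop training here."""
    loss = ((memory(k) - v) ** 2).sum()                    # associative-recall loss
    grads = torch.autograd.grad(loss, list(memory.parameters()))
    with torch.no_grad():
        for p, g, s in zip(memory.parameters(), grads, surprise):
            s.mul_(eta).add_(g, alpha=-theta)              # momentum over surprise
            p.mul_(1.0 - alpha).add_(s)                    # forget, then memorize
    return surprise

# Usage sketch (hypothetical sizes):
# memory = DeepMemory(dim=64, hidden=128)
# surprise = [torch.zeros_like(p) for p in memory.parameters()]
# surprise = test_time_write(memory, surprise, k, v)   # k, v: (64,) tensors
```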
Architecture Variants (wiring sketched below)
- Memory as a Context (MAC)
- Memory as a Gate (MAG)
- Memory as a Layer (MAL)
- Persistent Memory (input-independent learnable tokens)
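The three hybrid variants differ mainly in where the memory's output meets attention, and persistent memory is a small set of input-independent learnable tokens available to all of them. The sketch below shows only the wiring: attention, neural_memory, and the gate weights are hypothetical placeholders, not the real modules.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Hypothetical stand-ins; real Titans use sliding-window attention and the
# test-time-learned memory module. Shapes: x is (seq_len, dim).
def attention(x):      return x
def neural_memory(x):  return x
persistent_tokens = np.zeros((4, 16))      # learnable, input-independent memory

def memory_as_context(x):
    """MAC: retrieved memory and persistent tokens are prepended to the
    segment, so attention decides what to use from them."""
    return attention(np.concatenate([persistent_tokens, neural_memory(x), x], axis=0))

def memory_as_gate(x, w_gate=np.zeros(16)):
    """MAG: attention and memory branches are fused with an element-wise gate."""
    g = sigmoid(x @ w_gate)[:, None]
    return g * attention(x) + (1.0 - g) * neural_memory(x)

def memory_as_layer(x):
    """MAL: the memory module is simply stacked as a layer before attention."""
    return attention(neural_memory(x))
```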
Memory Mechanism
- Short-term memory: attention over the current context
- Long-term memory: the neural memory module
- Adaptive forgetting
- Parallelizable training (chunk-wise sketch below)
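"Parallelizable training" means a chunk of tokens can be written into the memory with matrix multiplications rather than a strictly sequential loop. The sketch below makes two simplifications that are assumptions of this illustration, not claims about the paper's exact formulation: every gradient in the chunk is taken with respect to the chunk-initial memory state (mini-batch rather than per-token gradient descent), and momentum is omitted.

```python
import numpy as np

def chunk_write(M, K, V, theta=0.1, alpha=0.01):
    """Write a whole chunk of b tokens into a linear memory at once.

    K, V : (b, d) key/value matrices for the chunk.
    Approximation: all gradients use the chunk-initial memory M, so the
    summed gradient is a single matmul instead of b sequential steps.
    """
    err = K @ M.T - V          # (b, d) prediction errors for the chunk
    grad = 2.0 * err.T @ K     # summed gradient of ||M k_i - v_i||^2 over i
    return (1.0 - alpha) * M - theta * grad
```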
Advantages
- Scales to 2M+ context
- Higher accuracy
- Theoretically more expressive
- Fast inference
Experimental Results
- Language Modeling
- Needle in a Haystack
- Time Series Forecasting
- DNA Modeling
- Outperforms baselines
- Ablation Study