Learning without training: The implicit dynamics of in-context learning
This work explores how Large Language Models (LLMs) achieve in-context learning (ICL) at inference time without explicit weight updates. The authors propose that a transformer block, a self-attention layer followed by an MLP, implicitly modifies the MLP's weights as a function of the context. They provide theoretical and experimental evidence that this mechanism, which amounts to a low-rank weight update, explains LLMs' ability to learn from examples in the prompt.
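In rough terms, the claimed identity can be sketched as follows (the notation and exact form below are an illustrative reconstruction, not a quotation of the paper's theorem). Write a(x) for the attention output on the query alone, a(C, x) for the attention output when the context C is prepended, and W for the first weight matrix of the MLP. Because the context reaches the MLP only through the attention output, its effect can be folded into a rank-1 correction of W:

```latex
% Illustrative sketch only; the symbols a(x), a(C,x), M_W, T_W are assumptions of this note.
T_W(C, x) = M_{W + \Delta W(C)}\big(a(x)\big),
\qquad
\Delta W(C) = \frac{W\big(a(C,x) - a(x)\big)\, a(x)^{\top}}{\lVert a(x) \rVert^{2}} .
```

A one-line check gives \(\Delta W(C)\, a(x) = W\big(a(C,x) - a(x)\big)\), so the updated weights applied to the context-free input reproduce the original weights applied to the context-aware input; the outer-product form is what makes the update low-rank.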
Article Points:
1. LLMs learn in-context without explicit weight updates.
2. Transformer blocks implicitly modify MLP weights via the context.
3. Contextual blocks generalize transformer blocks for ICL.
4. The context implicitly updates the network's weights through a low-rank matrix (see the numerical sketch after this list).
5. Token consumption drives implicit gradient-descent learning dynamics.
6. The implicit learning dynamics resemble online gradient descent.
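Point 4 can be checked numerically in a stripped-down setting. The sketch below is plain NumPy with arbitrary shapes and a stand-in attention function (none of this is the paper's code); it verifies that the original weights applied to the context-aware attention output coincide with the rank-1-updated weights applied to the context-free one.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8  # embedding dimension (arbitrary for this sketch)

# Stand-in "contextual layer": any map that mixes the query with the context.
# Here, a toy softmax attention over [context tokens; query]; the details don't
# matter for the identity, only that the context enters through this vector.
def attention_output(context, query):
    tokens = np.vstack([context, query])      # (n+1, d)
    scores = tokens @ query / np.sqrt(d)      # (n+1,)
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    return weights @ tokens                   # (d,)

W = rng.normal(size=(d, d))                   # first MLP weight matrix
context = rng.normal(size=(4, d))             # 4 context tokens
query = rng.normal(size=(d,))

a_plain = attention_output(np.zeros((0, d)), query)  # attention output without context
a_ctx = attention_output(context, query)             # attention output with context

# Rank-1 implicit update: delta_W @ a_plain == W @ (a_ctx - a_plain)
delta_W = np.outer(W @ (a_ctx - a_plain), a_plain) / np.dot(a_plain, a_plain)

lhs = W @ a_ctx                    # original weights, context-aware input
rhs = (W + delta_W) @ a_plain      # updated weights, context-free input
print(np.allclose(lhs, rhs))       # True: the context acts as a rank-1 weight update
```

Only the fact that the context reaches the MLP through a single vector is used here, which is why the same algebra goes through for any contextual layer, not just softmax attention.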
Outline:
- Mechanism
  - LLMs learn without training
  - Self-attention + MLP interaction
  - Implicit weight modification
  - Low-rank weight update
- Implicit Dynamics (see the token-by-token sketch after this outline)
  - Token consumption drives dynamics
  - Resembles gradient descent
  - Online gradient updates
- Main Contributions
  - Contextual block concept
  - Explicit weight update formula
  - Implicit GD learning dynamics
- Experiments
  - Verify Theorem 2.2
  - Convergence of ΔW
  - Compare with finetuning
- Limitations
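The implicit-dynamics items above (token consumption, online-gradient-descent-like updates) can be made concrete with the same toy setup. The sketch below is a hypothetical illustration, not the paper's construction: it consumes context tokens one at a time, recomputes the implicit rank-1 update for each prefix, and reports how far the effective weights move at each step, yielding a sequence of implicit iterates analogous to an online update rule.

```python
import numpy as np

rng = np.random.default_rng(1)
d = 8

def attention_output(context, query):
    # Same stand-in contextual layer as in the earlier sketch.
    tokens = np.vstack([context, query])
    scores = tokens @ query / np.sqrt(d)
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    return weights @ tokens

W = rng.normal(size=(d, d))
context = rng.normal(size=(16, d))   # 16 context tokens, consumed one by one
query = rng.normal(size=(d,))

a_plain = attention_output(np.zeros((0, d)), query)
prev = np.zeros_like(W)              # implicit update after consuming 0 tokens

# After each new token, recompute the implicit rank-1 update for the prefix
# consumed so far and measure how much the effective weights moved.
for t in range(1, len(context) + 1):
    a_ctx = attention_output(context[:t], query)
    delta_W = np.outer(W @ (a_ctx - a_plain), a_plain) / np.dot(a_plain, a_plain)
    step = np.linalg.norm(delta_W - prev)   # size of the "implicit step" for token t
    print(f"token {t:2d}: |W_eff change| = {step:.4f}")
    prev = delta_W
```

This mirrors the "Convergence of ΔW" item under Experiments, where the size of the implicit update is presumably tracked as more of the prompt is consumed.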