Learning without training: The implicit dynamics of in-context learning
This work examines how large language models (LLMs) achieve in-context learning (ICL) at inference time without any explicit weight update. The authors propose that a transformer block, i.e. a self-attention layer stacked on top of an MLP, implicitly modifies the MLP's weights as a function of the context: the block's output on a query with context equals its output on the query alone after a low-rank update to the MLP weights. They provide theoretical and experimental evidence that this mechanism accounts for LLMs' ability to learn from examples given in the prompt.
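As a sketch of the structure of the paper's central identity (Theorem 2.2), with notation approximated here rather than quoted: for a block consisting of a contextual layer $A$ (e.g. self-attention) followed by an MLP whose first weight matrix is $W$, the output on a query $x$ accompanied by a context $C$ equals the output of an implicitly updated MLP on the query alone,

$$
M_W\big(A(C, x)\big) \;=\; M_{W + \Delta W(C,x)}\big(A(x)\big),
\qquad
\Delta W(C, x) \;=\; \frac{\big(W\, \Delta A(C, x)\big)\, A(x)^{\top}}{\lVert A(x) \rVert^{2}},
$$

where $\Delta A(C, x) = A(C, x) - A(x)$ is the change the context induces in the attention output at the query position. Because $\Delta W$ is an outer product of two vectors, it is a rank-1 (hence low-rank) matrix.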
Article Points:
1. LLMs learn in-context without explicit weight updates.
2. Transformer blocks implicitly modify MLP weights via the context.
3. Contextual blocks generalize transformer blocks for analyzing ICL.
4. The context implicitly updates the network's weights with a low-rank matrix.
5. Token-by-token consumption of the context drives implicit learning dynamics.
6. These implicit dynamics resemble online gradient descent.
Mechanism
- LLMs learn without training
- Self-attention + MLP interaction
- Implicit weight modification
- Low-rank weight update (toy numerical check below)
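A minimal numerical sketch of this mechanism, assuming a toy single-head softmax attention, a small ReLU MLP, and NumPy (all names and shapes here are hypothetical, not the paper's setup): it checks that adding the rank-1 update to the MLP's first-layer weights reproduces, on the query alone, the output the block produces when the query is processed together with its context.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n_ctx = 8, 5   # toy embedding dimension and number of context tokens

# Hypothetical single-head self-attention (no masking, no multiple heads).
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))

def attention_query_output(context, query):
    """Attention output at the query position, with or without context tokens."""
    seq = query[None, :] if context is None else np.vstack([context, query])
    q = query @ Wq
    scores = (seq @ Wk) @ q / np.sqrt(d)
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    return weights @ (seq @ Wv)

# Toy two-layer ReLU MLP; W1 is the weight matrix that the context implicitly updates.
W1 = rng.normal(size=(d, d))
W2 = rng.normal(size=(d, d))
mlp = lambda W, a: np.maximum(W @ a, 0.0) @ W2

context = rng.normal(size=(n_ctx, d))
x = rng.normal(size=d)

a_ctx = attention_query_output(context, x)   # attention output given [context, query]
a_solo = attention_query_output(None, x)     # attention output given the query alone
delta_a = a_ctx - a_solo

# Rank-1 implicit update with the structure sketched earlier.
delta_W = np.outer(W1 @ delta_a, a_solo) / np.dot(a_solo, a_solo)

print(np.allclose(mlp(W1, a_ctx), mlp(W1 + delta_W, a_solo)))  # -> True
```

The check passes exactly because the updated first-layer pre-activation on the query alone, $(W + \Delta W)A(x)$, equals $W\,A(C, x)$, so any layers stacked on top see identical inputs.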

Implicit Dynamics
- Token consumption drives the dynamics
- Resembles gradient descent
- Online gradient updates (sketched below)
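Read as a dynamic (a hedged paraphrase rather than the paper's exact statement): if the context tokens $c_1, \dots, c_T$ are consumed one at a time, each newly consumed token contributes its own low-rank correction, so the implicit weights trace a trajectory

$$
W_t \;=\; W_{t-1} + \Delta W_t, \qquad t = 1, \dots, T,
$$

where $\Delta W_t$ depends on the token consumed at step $t$ given the tokens already processed. This accumulation of small, data-dependent, low-rank corrections is what the authors compare to online gradient descent, even though no loss is differentiated and no optimizer is run.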

Main Contributions
- Contextual block concept (definition paraphrased below)
- Explicit weight update formula
- Implicit GD learning dynamics
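The contextual-block abstraction behind these contributions can be summarized as follows (a paraphrase; the paper's exact definition may differ in detail): a contextual block is a composition

$$
T_W(C, x) \;=\; M_W\big(A(C, x)\big),
$$

where $A$ is any contextual layer, i.e. a layer whose output at the query $x$ may also depend on context tokens $C$, and $M_W$ is a neural network parameterized by a weight matrix $W$. A standard transformer block is the special case in which $A$ is self-attention and $M_W$ is the block's MLP, so results proved for contextual blocks apply to transformers as well.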

Experiments
- Verify Theorem 2.2
- Convergence of delta W (illustrative protocol below)
- Compare with finetuning
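A hedged sketch of how the convergence check could be organized (an illustrative protocol reusing the toy pieces above, not the authors' experimental code): compute the implicit update for progressively longer context prefixes and measure how much each additional token still changes it; shrinking increments indicate that the implicit update stabilizes, while comparing the updated weights against weights obtained by actually finetuning on the context examples gives the finetuning comparison.

```python
import numpy as np

def implicit_update(W1, attn_fn, context, x):
    """Rank-1 implicit update induced by `context` for query `x` (formula sketched earlier)."""
    a_solo = attn_fn(None, x)                # attention output on the query alone
    delta_a = attn_fn(context, x) - a_solo   # context-induced change at the query position
    return np.outer(W1 @ delta_a, a_solo) / np.dot(a_solo, a_solo)

def delta_w_increments(W1, attn_fn, context, x):
    """Frobenius norms of the change in the implicit update as each extra token is consumed."""
    increments, prev = [], np.zeros_like(W1)
    for t in range(1, len(context) + 1):
        cur = implicit_update(W1, attn_fn, context[:t], x)
        increments.append(np.linalg.norm(cur - prev))
        prev = cur
    return increments  # decreasing values suggest the implicit update is converging
```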

Limitations
- Analysis covers a single transformer block
- Focuses on the first token of the output