Learning without training: The implicit dynamics of in-context learning
This work examines how large language models (LLMs) achieve in-context learning (ICL) at inference time without any explicit weight update. The authors propose that a transformer block, i.e. a self-attention layer stacked on top of an MLP, implicitly modifies the MLP's weights as a function of the context: the block's output on a query with context equals its output on the query alone after a low-rank update to the MLP weights. They provide theoretical and experimental evidence that this mechanism accounts for LLMs' ability to learn from examples given in the prompt.
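As a sketch of the structure of the paper's central identity (Theorem 2.2), with notation approximated here rather than quoted: for a block consisting of a contextual layer $A$ (e.g. self-attention) followed by an MLP whose first weight matrix is $W$, the output on a query $x$ accompanied by a context $C$ equals the output of an implicitly updated MLP on the query alone,

$$
M_W\big(A(C, x)\big) \;=\; M_{W + \Delta W(C,x)}\big(A(x)\big),
\qquad
\Delta W(C, x) \;=\; \frac{\big(W\, \Delta A(C, x)\big)\, A(x)^{\top}}{\lVert A(x) \rVert^{2}},
$$

where $\Delta A(C, x) = A(C, x) - A(x)$ is the change the context induces in the attention output at the query position. Because $\Delta W$ is an outer product of two vectors, it is a rank-1 (hence low-rank) matrix.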
Article Points:
1. LLMs learn in-context without explicit weight updates.
2. Transformer blocks implicitly modify MLP weights via the context.
3. Contextual blocks generalize transformer blocks for analyzing ICL.
4. The context implicitly updates the network's weights with a low-rank matrix.
5. Token-by-token consumption of the context drives implicit learning dynamics.
6. These implicit dynamics resemble online gradient descent.
Mechanism
- LLMs learn without training
- Self-attention + MLP interaction
- Implicit weight modification
- Low-rank weight update (toy numerical check below)
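A minimal numerical sketch of this mechanism, assuming a toy single-head softmax attention, a small ReLU MLP, and NumPy (all names and shapes here are hypothetical, not the paper's setup): it checks that adding the rank-1 update to the MLP's first-layer weights reproduces, on the query alone, the output the block produces when the query is processed together with its context.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n_ctx = 8, 5   # toy embedding dimension and number of context tokens

# Hypothetical single-head self-attention (no masking, no multiple heads).
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))

def attention_query_output(context, query):
    """Attention output at the query position, with or without context tokens."""
    seq = query[None, :] if context is None else np.vstack([context, query])
    q = query @ Wq
    scores = (seq @ Wk) @ q / np.sqrt(d)
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    return weights @ (seq @ Wv)

# Toy two-layer ReLU MLP; W1 is the weight matrix that the context implicitly updates.
W1 = rng.normal(size=(d, d))
W2 = rng.normal(size=(d, d))
mlp = lambda W, a: np.maximum(W @ a, 0.0) @ W2

context = rng.normal(size=(n_ctx, d))
x = rng.normal(size=d)

a_ctx = attention_query_output(context, x)   # attention output given [context, query]
a_solo = attention_query_output(None, x)     # attention output given the query alone
delta_a = a_ctx - a_solo

# Rank-1 implicit update with the structure sketched earlier.
delta_W = np.outer(W1 @ delta_a, a_solo) / np.dot(a_solo, a_solo)

print(np.allclose(mlp(W1, a_ctx), mlp(W1 + delta_W, a_solo)))  # -> True
```

The check passes exactly because the updated first-layer pre-activation on the query alone, $(W + \Delta W)A(x)$, equals $W\,A(C, x)$, so any layers stacked on top see identical inputs.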

Implicit Dynamics
- Token consumption drives the dynamics
- Resembles gradient descent
- Online gradient updates (sketched below)
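Read as a dynamic (a hedged paraphrase rather than the paper's exact statement): if the context tokens $c_1, \dots, c_T$ are consumed one at a time, each newly consumed token contributes its own low-rank correction, so the implicit weights trace a trajectory

$$
W_t \;=\; W_{t-1} + \Delta W_t, \qquad t = 1, \dots, T,
$$

where $\Delta W_t$ depends on the token consumed at step $t$ given the tokens already processed. This accumulation of small, data-dependent, low-rank corrections is what the authors compare to online gradient descent, even though no loss is differentiated and no optimizer is run.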

Main Contributions
- Contextual block concept (definition paraphrased below)
- Explicit weight update formula
- Implicit GD learning dynamics
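The contextual-block abstraction behind these contributions can be summarized as follows (a paraphrase; the paper's exact definition may differ in detail): a contextual block is a composition

$$
T_W(C, x) \;=\; M_W\big(A(C, x)\big),
$$

where $A$ is any contextual layer, i.e. a layer whose output at the query $x$ may also depend on context tokens $C$, and $M_W$ is a neural network parameterized by a weight matrix $W$. A standard transformer block is the special case in which $A$ is self-attention and $M_W$ is the block's MLP, so results proved for contextual blocks apply to transformers as well.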

Experiments
- Verify Theorem 2.2
- Convergence of delta W (illustrative protocol below)
- Compare with finetuning
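A hedged sketch of how the convergence check could be organized (an illustrative protocol reusing the toy pieces above, not the authors' experimental code): compute the implicit update for progressively longer context prefixes and measure how much each additional token still changes it; shrinking increments indicate that the implicit update stabilizes, while comparing the updated weights against weights obtained by actually finetuning on the context examples gives the finetuning comparison.

```python
import numpy as np

def implicit_update(W1, attn_fn, context, x):
    """Rank-1 implicit update induced by `context` for query `x` (formula sketched earlier)."""
    a_solo = attn_fn(None, x)                # attention output on the query alone
    delta_a = attn_fn(context, x) - a_solo   # context-induced change at the query position
    return np.outer(W1 @ delta_a, a_solo) / np.dot(a_solo, a_solo)

def delta_w_increments(W1, attn_fn, context, x):
    """Frobenius norms of the change in the implicit update as each extra token is consumed."""
    increments, prev = [], np.zeros_like(W1)
    for t in range(1, len(context) + 1):
        cur = implicit_update(W1, attn_fn, context[:t], x)
        increments.append(np.linalg.norm(cur - prev))
        prev = cur
    return increments  # decreasing values suggest the implicit update is converging
```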

Limitations
- Analysis covers a single transformer block
- Focuses on the first token of the output