Lost in the Middle: How Language Models Use Long Contexts
This paper investigates how language models use long input contexts on tasks such as multi-document question answering and key-value retrieval. Performance is highest when relevant information appears at the beginning or end of the input context and degrades significantly when it sits in the middle, producing a U-shaped curve (primacy and recency bias). Current LMs therefore do not robustly use information across long inputs, even when they are explicitly designed for extended contexts.
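A quick way to make the finding concrete: hold the set of retrieved documents fixed and move only the document containing the answer. A minimal sketch, where `answer_with_context` is a hypothetical stand-in for a call to the model under test:

```python
def position_sweep(question, gold_doc, distractor_docs, answer_with_context):
    """Ask the same question while only the gold document's position varies.

    `answer_with_context(question, documents)` is a hypothetical stand-in
    for a call to the model being evaluated, not a real library function.
    """
    results = {}
    for position in range(len(distractor_docs) + 1):
        documents = list(distractor_docs)      # fixed distractor set
        documents.insert(position, gold_doc)   # move only the gold document
        results[position] = answer_with_context(question, documents)
    return results
```

Aggregating accuracy by position over many questions yields the U-shaped curve described above.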
Article Points:
1. LM performance degrades when relevant info is in the middle of long contexts.
2. Models show a U-shaped performance curve: best at start/end, worst in middle.
3. Extended-context models don't necessarily use long contexts more effectively.
4. Encoder-decoder models are more robust within their training sequence length.
5. Query-aware contextualization improves key-value retrieval, but less so for QA (the key-value task is sketched after this list).
6. Instruction fine-tuning doesn't cause the U-shape, but can slightly mitigate bias.
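For reference, the key-value retrieval task mentioned in point 5 is a synthetic probe: the model sees a JSON object of random UUID pairs and must return the value for a given key. A minimal sketch, assuming the paper's general setup but not its exact prompt wording:

```python
import json
import random
import uuid

def make_kv_prompt(num_pairs: int, gold_index: int, seed: int = 0):
    """Build a synthetic key-value retrieval prompt.

    Keys and values are random UUIDs, as in the paper's synthetic task;
    the prompt wording below is an approximation of the paper's template.
    """
    rng = random.Random(seed)
    pairs = [(str(uuid.UUID(int=rng.getrandbits(128))),
              str(uuid.UUID(int=rng.getrandbits(128))))
             for _ in range(num_pairs)]
    gold_key, gold_value = pairs[gold_index]   # the pair the model must find
    data = json.dumps(dict(pairs), indent=1)   # dicts preserve insertion order
    prompt = (
        "Extract the value corresponding to the specified key "
        "in the JSON object below.\n\n"
        f"JSON data:\n{data}\n\n"
        f'Key: "{gold_key}"\nCorresponding value:'
    )
    return prompt, gold_value
```

Sweeping `gold_index` from 0 to `num_pairs - 1` while holding `num_pairs` fixed reproduces the position sweep on this task.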
Performance Degradation
- Significant drop when relevant information is in the middle
- Performance is not robust to the position of relevant information

Positional Biases
- U-shaped performance curve
- Primacy bias (start of context)
- Recency bias (end of context)

Architectural Effects
- Encoder-decoder models are more robust
- Robustness holds within the training-time sequence length
- Decoder-only models struggle across positions

Contextualization
- Query-aware contextualization helps key-value retrieval (sketched below)
- Minimal impact on multi-document QA
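Query-aware contextualization is simple to implement: the query is repeated before the data as well as after it, so a decoder-only model's causal attention can encode the data with the query already in scope. A minimal sketch (the layout is illustrative, not the paper's exact template):

```python
def build_prompt(query: str, context: str, query_aware: bool = True) -> str:
    """Optionally repeat the query before the context.

    Under causal attention, tokens attend only to earlier tokens, so placing
    the query first lets the context be encoded with the query in scope.
    """
    if query_aware:
        return f"{query}\n\n{context}\n\n{query}"
    return f"{context}\n\n{query}"  # baseline: query only after the context
```

The paper reports that this change nearly solves the key-value task for models such as GPT-3.5-Turbo while barely moving multi-document QA performance.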

Fine-tuning Influence
- The U-shape persists with or without instruction fine-tuning
- Instruction fine-tuning slightly mitigates the bias
- Larger models show the U-shape as well

Practical Trade-offs
- More context is not always better
- Reranking retrieved documents (see the reordering sketch below)
- Ranked-list truncation
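Because models favor the two ends of the context, one mitigation suggested by these results is to reorder a relevance-ranked document list so the strongest documents land at the extremes, and to truncate the ranked list rather than fill the middle with marginal documents. A minimal sketch (the function name and interface are ours, not from the paper):

```python
def reorder_for_long_context(ranked_docs, max_docs=None):
    """Interleave a best-first ranking toward both ends of the context.

    `ranked_docs` must be sorted most-relevant-first. Doc 1 goes to the
    front, doc 2 to the back, doc 3 to the front, and so on, so the weakest
    documents end up in the middle, where models attend least.
    """
    if max_docs is not None:
        ranked_docs = ranked_docs[:max_docs]   # ranked-list truncation
    front, back = [], []
    for i, doc in enumerate(ranked_docs):
        (front if i % 2 == 0 else back).append(doc)
    return front + back[::-1]

# Example: ["d1", "d2", "d3", "d4", "d5"] -> ["d1", "d3", "d5", "d4", "d2"]
```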