Why Language Models Hallucinate
This paper argues that large language models hallucinate because their training and evaluation procedures reward guessing over admitting uncertainty. It traces the statistical origins of hallucinations to pretraining, linking them to errors in a related binary classification problem, and attributes their persistence to misaligned post-training benchmarks that penalize abstention. The authors propose modifying the scoring of existing evaluations to penalize overconfident falsehoods and reward expressions of uncertainty, encouraging more trustworthy AI systems.
Article Points:
1. LM hallucinations stem from training and evaluation rewarding guessing over uncertainty.
2. Pretraining errors are statistical, analogous to binary classification misclassifications.
3. Hallucinations persist because post-training evaluations penalize uncertainty and abstention.
4. Modifying existing benchmarks to reward uncertainty is crucial for effective mitigation.
5. Explicit confidence targets in evaluations can foster trustworthy LM behavior.
6. Arbitrary facts, poor models, and GIGO (garbage in, garbage out) contribute to pretraining errors.

Pretraining Origins

Statistical Errors

- Binary Classification Analogy
- Epistemic Uncertainty
- Arbitrary Facts (Singletons)
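
The paper's statistical argument reduces generation to an implicit "Is-It-Valid" (IIV) binary classification problem: a model that cannot reliably separate valid from invalid statements cannot avoid producing some invalid ones. For arbitrary facts with no learnable pattern (such as individual birthdays), the resulting hallucination rate is lower-bounded by roughly the singleton rate, the share of fact mentions whose fact appears exactly once in the training data, in the spirit of the Good-Turing missing-mass estimate. The Python sketch below computes a singleton rate for a toy corpus; the corpus and names are illustrative, not taken from the paper.

```python
from collections import Counter

def singleton_rate(fact_mentions):
    """Good-Turing style singleton rate: the share of mentions whose
    fact appears exactly once in the corpus (N1 / N)."""
    counts = Counter(fact_mentions)
    n_singleton_mentions = sum(1 for c in counts.values() if c == 1)  # each singleton type contributes one mention
    return n_singleton_mentions / len(fact_mentions)

# Toy corpus of birthday facts seen during pretraining (illustrative data).
corpus = [
    ("Ada Lovelace", "born 1815"),
    ("Ada Lovelace", "born 1815"),      # repeated fact -> learnable
    ("Alan Turing", "born 1912"),
    ("Alan Turing", "born 1912"),
    ("Obscure Person A", "born 1903"),  # singleton -> likely hallucinated later
    ("Obscure Person B", "born 1967"),  # singleton -> likely hallucinated later
]

print(f"singleton rate = {singleton_rate(corpus):.2f}")  # 2 singleton mentions / 6 total = 0.33
```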

Poor Models

- Limited Representation
- N-gram Models Example
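
Restricted model families such as classic n-gram models illustrate the "poor models" point: even with clean training data, a model that cannot represent the needed dependencies must spread probability onto false outputs. The sketch below is an illustrative construction rather than an example from the paper: a bigram model trained on two true sentences conditions only on the previous word, so it assigns equal probability to the true and the false completion of "alice was born in".

```python
from collections import Counter, defaultdict

# Two true training sentences (illustrative data, not from the paper).
corpus = [
    "alice was born in paris .".split(),
    "bob was born in rome .".split(),
]

# Count bigram transitions.
bigrams = defaultdict(Counter)
for sentence in corpus:
    for prev, nxt in zip(sentence, sentence[1:]):
        bigrams[prev][nxt] += 1

def next_word_distribution(prev):
    """Maximum-likelihood bigram distribution over the next word."""
    counts = bigrams[prev]
    total = sum(counts.values())
    return {word: c / total for word, c in counts.items()}

# Conditioned only on "in", the model has forgotten whether the subject
# was Alice or Bob, so it splices the two training sentences together.
print(next_word_distribution("in"))
# {'paris': 0.5, 'rome': 0.5} -> "alice was born in rome" gets half the mass
```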

Additional Factors

- Computational Hardness
- Distribution Shift
- GIGO (Garbage In, Garbage Out)
Post-training Persistence

Evaluation Misalignment

- Binary Grading Penalizes Uncertainty
- Guessing Maximizes Score
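
Under the binary right-or-wrong grading used by most benchmarks, abstaining ("I don't know") scores exactly 0, while a guess that is correct with any probability p > 0 has positive expected score, so a test-taking model is always better off guessing. A minimal sketch of that arithmetic, with illustrative confidence values:

```python
def expected_score_binary(p_correct, abstain):
    """Expected score under 0/1 grading: 1 for a correct answer,
    0 for a wrong answer, and 0 for abstaining."""
    return 0.0 if abstain else p_correct  # a guess earns p_correct on average

for p in (0.1, 0.3, 0.9):  # even a 10%-confident guess beats abstaining
    print(f"confidence={p:.1f}  guess={expected_score_binary(p, abstain=False):.2f}"
          f"  abstain={expected_score_binary(p, abstain=True):.2f}")
```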

Socio-technical Problem

- Need for Benchmark Modification
- Existing Hallucination Evals Insufficient
Mitigation Strategies

Explicit Confidence Targets

- Penalties for Incorrect Answers
- Behavioral Calibration
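
The proposed fix is to state an explicit confidence target t in the evaluation instructions: answer only if you are more than t confident, since wrong answers are penalized and "I don't know" scores 0. A penalty of t/(1-t) points per wrong answer makes t exactly the break-even confidence, because p - (1-p)·t/(1-t) > 0 if and only if p > t; acting on that rule is what behavioral calibration asks of the model. The sketch below verifies the crossover numerically; the threshold and confidence values are illustrative.

```python
def expected_score(p_correct, t, answer):
    """Scoring rule with an explicit confidence target t:
    +1 for a correct answer, -t/(1-t) for a wrong answer, 0 for 'I don't know'."""
    if not answer:
        return 0.0
    penalty = t / (1 - t)
    return p_correct * 1.0 - (1 - p_correct) * penalty

t = 0.75  # illustrative confidence target stated in the eval instructions
for p in (0.5, 0.75, 0.9):
    print(f"confidence={p:.2f}  answer={expected_score(p, t, answer=True):+.2f}"
          f"  abstain={expected_score(p, t, answer=False):+.2f}")
# Answering only pays off once confidence exceeds t (break-even at p == t),
# so a calibrated model is incentivized to abstain below the target.
```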

Integrate into Mainstream Evals

- Adjust Scoring of Benchmarks
- Reward Uncertainty Expressions
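
Concretely, mainstream benchmark scoring can be adjusted so that an explicit expression of uncertainty scores 0 rather than being lumped in with wrong answers, while confident errors are penalized. The grader below is a hypothetical sketch: the function name, the abstention phrases, and the penalty value are assumptions for illustration, not any benchmark's actual harness.

```python
ABSTAIN_PHRASES = {"i don't know", "i am not sure", "cannot determine"}  # assumed markers

def grade(prediction, gold, wrong_penalty=1.0):
    """Score one item: +1 correct, 0 for an explicit abstention,
    -wrong_penalty for a confident wrong answer."""
    pred = prediction.strip().lower()
    if pred in ABSTAIN_PHRASES:
        return 0.0
    return 1.0 if pred == gold.strip().lower() else -wrong_penalty

items = [  # illustrative (prediction, gold) pairs
    ("Paris", "Paris"),
    ("I don't know", "Canberra"),
    ("Sydney", "Canberra"),
]
print(sum(grade(pred, gold) for pred, gold in items))  # 1 + 0 - 1 = 0.0
```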