Large Concept Models: Language Modeling in a Sentence Representation Space
This paper introduces Large Concept Models (LCMs), an architecture that processes and generates language at a higher, semantic "concept" level (sentences) rather than token-by-token. Unlike traditional LLMs, LCMs operate in a language- and modality-agnostic embedding space, demonstrating superior zero-shot generalization across many languages and improved scalability for long contexts. The research explores various training approaches, with diffusion-based models showing the most promising results. ✨
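To make the concept-level pipeline concrete, here is a minimal sketch of autoregressive generation in a sentence-embedding space: encode the prompt sentences into embeddings, predict the next "concept" embedding step by step, and decode back to text. The helper names (sonar_encode, sonar_decode, lcm_next_concept) are illustrative placeholders, not the paper's actual API.

```python
from typing import Callable, List
import numpy as np

def generate_concepts(
    prompt_sentences: List[str],
    sonar_encode: Callable[[List[str]], np.ndarray],       # sentences -> (n, d) embeddings
    sonar_decode: Callable[[np.ndarray], List[str]],        # embeddings -> sentences
    lcm_next_concept: Callable[[np.ndarray], np.ndarray],   # (n, d) context -> (d,) next embedding
    max_new_concepts: int = 5,
) -> List[str]:
    """Autoregressively predict next-sentence embeddings ("concepts") in the
    embedding space, then decode the generated embeddings back to text."""
    context = sonar_encode(prompt_sentences)    # work on sentence embeddings, not tokens
    generated = []
    for _ in range(max_new_concepts):
        next_emb = lcm_next_concept(context)    # one model step = one whole sentence
        generated.append(next_emb)
        context = np.vstack([context, next_emb[None, :]])
    return sonar_decode(np.stack(generated))
```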
Article Points:
1. LCMs operate on explicit higher-level semantic representations ("concepts"), typically sentences, unlike token-based LLMs.
2. LCMs leverage the SONAR embedding space for language- and modality-agnostic processing across 200 languages.
3. Diffusion-based LCMs (One-Tower, Two-Tower) outperform the MSE-regression and quantized variants (see the training sketch after this list).
4. LCMs demonstrate impressive zero-shot generalization to many languages, surpassing LLMs of comparable size.
5. LCMs offer better scalability for long contexts by operating on shorter concept sequences.
6. Explicit planning with LPCM significantly improves coherence in long-form text generation.
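Point 3 contrasts plain MSE regression with diffusion-based variants; the sketch below shows what the regression objective amounts to: a causal transformer over concept embeddings trained to predict the next embedding with mean-squared error. Dimensions, layer counts, and module structure are assumptions for illustration, not the paper's exact configuration.

```python
import torch
import torch.nn as nn

class BaseLCMSketch(nn.Module):
    """Sketch of a Base-LCM-style regressor: given the preceding concept
    embeddings, predict the next one; trained with MSE."""
    def __init__(self, dim: int = 1024, heads: int = 8, layers: int = 4):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=heads, batch_first=True)
        self.backbone = nn.TransformerEncoder(layer, num_layers=layers)
        self.head = nn.Linear(dim, dim)

    def forward(self, concepts: torch.Tensor) -> torch.Tensor:
        # concepts: (batch, seq, dim) sentence embeddings; causal mask keeps it autoregressive
        mask = nn.Transformer.generate_square_subsequent_mask(concepts.size(1))
        hidden = self.backbone(concepts, mask=mask)
        return self.head(hidden)   # predicted next-concept embedding at each position

# Illustrative training step: regress position t's output onto embedding t+1.
model = BaseLCMSketch()
x = torch.randn(2, 16, 1024)                    # stand-in for SONAR sentence embeddings
pred = model(x[:, :-1])                         # predict from each prefix
loss = nn.functional.mse_loss(pred, x[:, 1:])   # MSE against the true next concepts
loss.backward()
```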
Core Idea
- Concept-level processing
- Sentence representations as concepts
- Language- and modality-agnostic
- Uses SONAR embeddings

Architectures (a Two-Tower denoising sketch follows this outline)
- Base-LCM (MSE regression)
- Diffusion LCM (One-Tower)
- Diffusion LCM (Two-Tower)
- Quantized LCM

Benefits
- Zero-shot multilingual generalization
- Long-context scalability
- Modularity and extensibility
- Explicit hierarchy

Evaluation
- Pre-training metrics
- Instruction tuning
- Summarization tasks
- Outperforms LLMs on some tasks

Extensions
- Summary expansion
- Explicit planning (LPCM)

Challenges
- Embedding-space limitations
- Concept-granularity issues
- Continuous vs. discrete modeling
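To make the One-Tower vs. Two-Tower distinction above concrete: a Two-Tower model separates a contextualizer over the clean preceding concepts from a denoiser that refines a noised next-concept embedding conditioned on that context. The sketch below is a rough, assumption-laden rendering of that split; the noise schedule, timestep conditioning, and layer layout are placeholders, not the paper's design.

```python
import torch
import torch.nn as nn

class TwoTowerSketch(nn.Module):
    """Rough Two-Tower-style diffusion LCM sketch: a contextualizer encodes the
    clean preceding concepts; a denoiser predicts the clean next concept from a
    noised version, conditioned on that context via cross-attention."""
    def __init__(self, dim: int = 1024, heads: int = 8, layers: int = 4):
        super().__init__()
        enc = nn.TransformerEncoderLayer(d_model=dim, nhead=heads, batch_first=True)
        dec = nn.TransformerDecoderLayer(d_model=dim, nhead=heads, batch_first=True)
        self.contextualizer = nn.TransformerEncoder(enc, num_layers=layers)
        self.denoiser = nn.TransformerDecoder(dec, num_layers=layers)
        self.time_embed = nn.Linear(1, dim)     # placeholder timestep conditioning

    def forward(self, context: torch.Tensor, noisy_next: torch.Tensor,
                t: torch.Tensor) -> torch.Tensor:
        # context: (batch, seq, dim) clean preceding concepts
        # noisy_next: (batch, 1, dim) noised next-concept embedding at diffusion step t
        ctx = self.contextualizer(context)
        query = noisy_next + self.time_embed(t.view(-1, 1, 1).float())
        return self.denoiser(query, ctx)        # estimate of the clean next concept

# Illustrative training step: add Gaussian noise to the target concept and
# regress the denoiser output onto the clean embedding.
model = TwoTowerSketch()
context = torch.randn(2, 16, 1024)
clean_next = torch.randn(2, 1, 1024)
t = torch.randint(0, 100, (2,))
alpha = 1.0 - t.view(-1, 1, 1).float() / 100.0   # toy linear schedule (assumption)
noisy = alpha.sqrt() * clean_next + (1 - alpha).sqrt() * torch.randn_like(clean_next)
pred = model(context, noisy, t)
loss = nn.functional.mse_loss(pred, clean_next)
loss.backward()
```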