Large Concept Models: Language Modeling in a Sentence Representation Space
This paper introduces Large Concept Models (LCMs), an architecture that processes and generates language at a higher, semantic "concept" level (sentences) rather than token-by-token. Unlike traditional LLMs, LCMs operate in a language- and modality-agnostic embedding space, demonstrating superior zero-shot generalization across many languages and improved scalability for long contexts. The research explores various training approaches, with diffusion-based models showing the most promising results. ✨
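To make the concept-level pipeline concrete, here is a minimal sketch of autoregressive generation in a sentence-embedding space: encode the prompt sentences into embeddings, predict the next "concept" embedding step by step, and decode back to text. The helper names (sonar_encode, sonar_decode, lcm_next_concept) are illustrative placeholders, not the paper's actual API.

```python
from typing import Callable, List
import numpy as np

def generate_concepts(
    prompt_sentences: List[str],
    sonar_encode: Callable[[List[str]], np.ndarray],       # sentences -> (n, d) embeddings
    sonar_decode: Callable[[np.ndarray], List[str]],        # embeddings -> sentences
    lcm_next_concept: Callable[[np.ndarray], np.ndarray],   # (n, d) context -> (d,) next embedding
    max_new_concepts: int = 5,
) -> List[str]:
    """Autoregressively predict next-sentence embeddings ("concepts") in the
    embedding space, then decode the generated embeddings back to text."""
    context = sonar_encode(prompt_sentences)    # work on sentence embeddings, not tokens
    generated = []
    for _ in range(max_new_concepts):
        next_emb = lcm_next_concept(context)    # one model step = one whole sentence
        generated.append(next_emb)
        context = np.vstack([context, next_emb[None, :]])
    return sonar_decode(np.stack(generated))
```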
Article Points:
1. LCMs operate on explicit higher-level semantic representations ("concepts"), typically sentences, unlike token-based LLMs.
2. LCMs leverage the SONAR embedding space for language- and modality-agnostic processing across 200 languages.
3. Diffusion-based LCMs (One-Tower, Two-Tower) outperform the MSE-regression and quantized variants (see the training sketch after this list).
4. LCMs demonstrate impressive zero-shot generalization to many languages, surpassing LLMs of comparable size.
5. LCMs offer better scalability for long contexts by operating on shorter concept sequences.
6. Explicit planning with LPCM significantly improves coherence in long-form text generation.
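Point 3 contrasts plain MSE regression with diffusion-based variants; the sketch below shows what the regression objective amounts to: a causal transformer over concept embeddings trained to predict the next embedding with mean-squared error. Dimensions, layer counts, and module structure are assumptions for illustration, not the paper's exact configuration.

```python
import torch
import torch.nn as nn

class BaseLCMSketch(nn.Module):
    """Sketch of a Base-LCM-style regressor: given the preceding concept
    embeddings, predict the next one; trained with MSE."""
    def __init__(self, dim: int = 1024, heads: int = 8, layers: int = 4):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=heads, batch_first=True)
        self.backbone = nn.TransformerEncoder(layer, num_layers=layers)
        self.head = nn.Linear(dim, dim)

    def forward(self, concepts: torch.Tensor) -> torch.Tensor:
        # concepts: (batch, seq, dim) sentence embeddings; causal mask keeps it autoregressive
        mask = nn.Transformer.generate_square_subsequent_mask(concepts.size(1))
        hidden = self.backbone(concepts, mask=mask)
        return self.head(hidden)   # predicted next-concept embedding at each position

# Illustrative training step: regress position t's output onto embedding t+1.
model = BaseLCMSketch()
x = torch.randn(2, 16, 1024)                    # stand-in for SONAR sentence embeddings
pred = model(x[:, :-1])                         # predict from each prefix
loss = nn.functional.mse_loss(pred, x[:, 1:])   # MSE against the true next concepts
loss.backward()
```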
Core Idea
- Concept-level processing
- Sentence representations as concepts
- Language- and modality-agnostic
- Uses SONAR embeddings

Architectures (a Two-Tower denoising sketch follows this outline)
- Base-LCM (MSE regression)
- Diffusion LCM (One-Tower)
- Diffusion LCM (Two-Tower)
- Quantized LCM

Benefits
- Zero-shot multilingual generalization
- Long-context scalability
- Modularity and extensibility
- Explicit hierarchy

Evaluation
- Pre-training metrics
- Instruction tuning
- Summarization tasks
- Outperforms LLMs on some tasks

Extensions
- Summary expansion
- Explicit planning (LPCM)

Challenges
- Embedding-space limitations
- Concept-granularity issues
- Continuous vs. discrete modeling
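To make the One-Tower vs. Two-Tower distinction above concrete: a Two-Tower model separates a contextualizer over the clean preceding concepts from a denoiser that refines a noised next-concept embedding conditioned on that context. The sketch below is a rough, assumption-laden rendering of that split; the noise schedule, timestep conditioning, and layer layout are placeholders, not the paper's design.

```python
import torch
import torch.nn as nn

class TwoTowerSketch(nn.Module):
    """Rough Two-Tower-style diffusion LCM sketch: a contextualizer encodes the
    clean preceding concepts; a denoiser predicts the clean next concept from a
    noised version, conditioned on that context via cross-attention."""
    def __init__(self, dim: int = 1024, heads: int = 8, layers: int = 4):
        super().__init__()
        enc = nn.TransformerEncoderLayer(d_model=dim, nhead=heads, batch_first=True)
        dec = nn.TransformerDecoderLayer(d_model=dim, nhead=heads, batch_first=True)
        self.contextualizer = nn.TransformerEncoder(enc, num_layers=layers)
        self.denoiser = nn.TransformerDecoder(dec, num_layers=layers)
        self.time_embed = nn.Linear(1, dim)     # placeholder timestep conditioning

    def forward(self, context: torch.Tensor, noisy_next: torch.Tensor,
                t: torch.Tensor) -> torch.Tensor:
        # context: (batch, seq, dim) clean preceding concepts
        # noisy_next: (batch, 1, dim) noised next-concept embedding at diffusion step t
        ctx = self.contextualizer(context)
        query = noisy_next + self.time_embed(t.view(-1, 1, 1).float())
        return self.denoiser(query, ctx)        # estimate of the clean next concept

# Illustrative training step: add Gaussian noise to the target concept and
# regress the denoiser output onto the clean embedding.
model = TwoTowerSketch()
context = torch.randn(2, 16, 1024)
clean_next = torch.randn(2, 1, 1024)
t = torch.randint(0, 100, (2,))
alpha = 1.0 - t.view(-1, 1, 1).float() / 100.0   # toy linear schedule (assumption)
noisy = alpha.sqrt() * clean_next + (1 - alpha).sqrt() * torch.randn_like(clean_next)
pred = model(context, noisy, t)
loss = nn.functional.mse_loss(pred, clean_next)
loss.backward()
```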