LEANN: A Low-Storage Vector Index

LEANN is a storage-efficient approximate nearest neighbor (ANN) search index designed for resource-constrained personal devices. It addresses the high storage overhead of traditional embedding-based search by combining a compact graph structure with on-the-fly recomputation of embeddings. The reported result is an index under 5% of the raw data size that still reaches 90% top-3 recall in under 2 seconds on real-world benchmarks.

Article Points:

LEANN: Low-storage ANN index for resource-constrained devices.

Reduces index size to under 5% of the raw data, 50x smaller than standard indexes.

Combines compact graph with on-the-fly embedding recomputation.

Two-level search & dynamic batching minimize recomputation latency.

High-degree preserving graph pruning reduces metadata storage.

Achieves 90% top-3 recall in <2 seconds on real-world benchmarks.
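
To make the "compact graph plus on-the-fly recomputation" point concrete, here is a minimal sketch of a best-first graph search that stores only adjacency lists and embeds a node the first time the search touches it. It is an illustration under stated assumptions, not the paper's implementation: `search_with_recompute`, `toy_embed`, the chain-shaped `toy_graph`, and the `ef` queue size are all hypothetical names and choices, and the one-dimensional "embedding" stands in for a real embedding model.

```python
import heapq
from typing import Callable, Dict, List, Sequence, Tuple

Vector = Sequence[float]


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def search_with_recompute(
    graph: Dict[int, List[int]],
    embed: Callable[[int], Vector],
    query_vec: Vector,
    entry: int,
    k: int,
    ef: int,
) -> Tuple[List[int], int]:
    """Best-first search over a graph that stores no embeddings.

    Returns the k closest node ids found and the number of nodes embedded.
    """
    cache: Dict[int, float] = {}  # per-query only; nothing is persisted

    def dist(node: int) -> float:
        if node not in cache:
            cache[node] = sq_dist(embed(node), query_vec)  # recompute on demand
        return cache[node]

    visited = {entry}
    candidates = [(dist(entry), entry)]  # min-heap of nodes still to expand
    results = [(-dist(entry), entry)]    # max-heap (negated) of the best ef nodes
    while candidates:
        d, node = heapq.heappop(candidates)
        if len(results) >= ef and d > -results[0][0]:
            break  # closest unexpanded node is worse than every kept result
        for nb in graph[node]:
            if nb in visited:
                continue
            visited.add(nb)
            d_nb = dist(nb)
            if len(results) < ef or d_nb < -results[0][0]:
                heapq.heappush(candidates, (d_nb, nb))
                heapq.heappush(results, (-d_nb, nb))
                if len(results) > ef:
                    heapq.heappop(results)  # drop the current worst result
    ranked = sorted((-neg_d, n) for neg_d, n in results)
    return [n for _, n in ranked[:k]], len(cache)


# Toy data (hypothetical): node i stands for a chunk whose embedding is (i,).
toy_graph = {i: [j for j in (i - 1, i + 1) if 0 <= j < 10] for i in range(10)}
embed_calls: List[int] = []


def toy_embed(node: int) -> Vector:
    embed_calls.append(node)  # record each simulated model call
    return (float(node),)


top_ids, n_embedded = search_with_recompute(
    toy_graph, toy_embed, (7.2,), entry=0, k=2, ef=3
)
print(top_ids, n_embedded)
```

The trade-off the sketch exposes is that every visited node costs one embedding call, so query latency is driven by how many nodes the search touches. That is the cost the two-level search and dynamic batching points above are meant to reduce.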

Source:

LEANN: A Low-Storage Vector Index

Tags: RAG, embedding, vector database

Problem Addressed

High ANN storage overhead

Impractical for personal devices

Need low storage, high accuracy, low latency

Core Solution

Storage-efficient ANN index

Compact graph structure

On-the-fly recomputation

Key Techniques

Efficient Recomputation

- Two-Level Search

- Dynamic Batching
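
One way to read the two ideas together is sketched below. A cheap approximate distance decides which neighbors deserve an exact distance, and the exact embedding calls are deferred until enough nodes have accumulated to fill a batch. This is a simplified illustration, not the paper's algorithm: the function name, `keep_ratio`, `batch_size`, and the toy `approx_dist` are assumptions, and the list of hops is given up front here, whereas a real search discovers each hop from the previous one.

```python
import math
from typing import Callable, Dict, List, Sequence

Vector = Sequence[float]


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def two_level_batched_distances(
    hops: List[List[int]],
    approx_dist: Callable[[int], float],
    embed_batch: Callable[[List[int]], List[Vector]],
    query_vec: Vector,
    keep_ratio: float = 0.5,
    batch_size: int = 4,
) -> Dict[int, float]:
    """Exact distances for the most promising neighbors of each hop."""
    exact: Dict[int, float] = {}
    pending: List[int] = []  # nodes waiting for an exact distance

    def flush() -> None:
        if not pending:
            return
        vectors = embed_batch(list(pending))  # one model call per batch
        for node, vec in zip(pending, vectors):
            exact[node] = sq_dist(vec, query_vec)
        pending.clear()

    for neighbors in hops:
        # Level 1: rank by the cheap approximate distance, keep the top fraction.
        ranked = sorted(neighbors, key=approx_dist)
        n_keep = max(1, math.ceil(len(ranked) * keep_ratio))
        fresh = [n for n in ranked[:n_keep] if n not in exact and n not in pending]
        pending.extend(fresh)
        # Level 2: defer exact recomputation until a full batch has accumulated.
        if len(pending) >= batch_size:
            flush()
    flush()  # embed whatever is left at the end
    return exact


# Toy usage (hypothetical): node i has embedding (i,); the approximate
# distance is a coarse integer proxy for the distance to the query.
batch_sizes: List[int] = []


def toy_embed_batch(nodes: List[int]) -> List[Vector]:
    batch_sizes.append(len(nodes))  # record the size of each simulated call
    return [(float(n),) for n in nodes]


exact = two_level_batched_distances(
    hops=[[1, 2, 3, 4], [5, 6, 8, 9], [7, 10]],
    approx_dist=lambda n: abs(n - 7),
    embed_batch=toy_embed_batch,
    query_vec=(7.2,),
    keep_ratio=0.5,
    batch_size=4,
)
print(sorted(exact), batch_sizes)
```

In this toy run only half of each hop's neighbors are embedded, and the calls are grouped so the model runs once per filled batch rather than once per node.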

Storage-Optimized Graph

- High-Degree Preserving Pruning
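
The intuition behind high-degree preserving pruning can be shown with a short sketch: the few nodes with the most connections keep a generous edge budget, while every other node is cut down to a small one. This is a post-hoc simplification written for illustration, not the paper's procedure. The names `prune_preserving_hubs`, `hub_fraction`, `hub_cap`, and `low_cap` are hypothetical, and the sketch assumes each adjacency list is already sorted nearest-first so that truncation keeps the closest neighbors.

```python
from typing import Dict, List, Set, Tuple


def prune_preserving_hubs(
    graph: Dict[int, List[int]],
    hub_fraction: float,
    hub_cap: int,
    low_cap: int,
) -> Tuple[Dict[int, List[int]], Set[int]]:
    """Truncate adjacency lists, giving high-degree nodes a larger budget."""
    n_hubs = max(1, int(len(graph) * hub_fraction))
    by_degree = sorted(graph, key=lambda n: len(graph[n]), reverse=True)
    hubs = set(by_degree[:n_hubs])
    pruned = {}
    for node, neighbors in graph.items():
        cap = hub_cap if node in hubs else low_cap
        pruned[node] = list(neighbors[:cap])  # assumes nearest-first ordering
    return pruned, hubs


def edge_count(graph: Dict[int, List[int]]) -> int:
    """Total number of stored neighbor ids, a proxy for graph metadata size."""
    return sum(len(neighbors) for neighbors in graph.values())


# Toy graph (hypothetical): node 0 is the hub, the others have degree 3.
toy = {
    0: [1, 2, 3, 4, 5],
    1: [0, 2, 3],
    2: [0, 1, 3],
    3: [0, 2, 4],
    4: [0, 3, 5],
    5: [0, 4, 1],
}
pruned, hubs = prune_preserving_hubs(toy, hub_fraction=0.2, hub_cap=5, low_cap=2)
print(hubs, edge_count(toy), edge_count(pruned))
```

Keeping the hub's edges intact preserves the well-connected routes a graph search tends to pass through, while the stored neighbor lists shrink for everything else.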

Performance

<5% raw data size

50x smaller storage

90% recall in <2s
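
To see why dropping stored embeddings matters for storage, the following back-of-the-envelope calculation compares the bytes needed for dense embeddings with the bytes needed for graph adjacency lists alone. All inputs are hypothetical (one million chunks, 768-dimensional float32 embeddings, an average of 32 neighbors stored as 4-byte ids); they are not figures from the paper and are not meant to reproduce its reported ratios.

```python
from typing import Tuple


def storage_bytes(
    num_chunks: int,
    dim: int,
    avg_degree: int,
    bytes_per_float: int = 4,
    bytes_per_id: int = 4,
) -> Tuple[int, int]:
    """Return (embedding bytes, graph adjacency bytes) for a corpus."""
    embeddings = num_chunks * dim * bytes_per_float
    graph = num_chunks * avg_degree * bytes_per_id
    return embeddings, graph


# Hypothetical inputs chosen only for illustration.
emb_bytes, graph_bytes = storage_bytes(num_chunks=1_000_000, dim=768, avg_degree=32)
print(f"stored embeddings: {emb_bytes / 1e9:.2f} GB")
print(f"graph only:        {graph_bytes / 1e9:.2f} GB")
print(f"ratio:             {emb_bytes / graph_bytes:.0f}x")
```

Under these assumptions the embeddings dominate the footprint, which is why an index that keeps only the graph and recomputes embeddings at query time can be much smaller.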

Evaluation

RPJ-Wiki dataset

NQ, HotpotQA, TriviaQA, GPQA

NVIDIA A10, M1 Mac

Future Work

Leverage GPU advancements

Smaller embedding models

Storage-efficient index building
