Memory Decoder: A Pretrained, Plug-and-Play Memory for Large Language Models
Memory Decoder is a plug-and-play pretrained memory designed for efficient domain adaptation of Large Language Models (LLMs). It employs a small transformer decoder that learns to imitate the output distributions of external non-parametric retrievers. Once trained, Memory Decoder integrates seamlessly with any LLM sharing the same tokenizer, enhancing domain-specific performance without modifying the original model's parameters or incurring significant inference latency.
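The summary ships no code, but the "plug-and-play" integration can be pictured as mixing the frozen base LLM's output distribution with the Memory Decoder's at inference time, in the spirit of kNN-LM interpolation. The sketch below is a minimal illustration under that assumption; `base_model`, `memory_decoder`, and the weight `lam` are hypothetical stand-ins rather than names from the paper.

```python
import torch
import torch.nn.functional as F


@torch.no_grad()
def interpolated_next_token_probs(base_model, memory_decoder, input_ids, lam=0.3):
    """Mix a frozen base LLM with a plug-and-play memory decoder.

    Both models are assumed to share a tokenizer, so their output
    distributions live over the same vocabulary and can be combined
    position by position. `lam` (a hypothetical name) controls how much
    weight the domain memory receives.
    """
    base_logits = base_model(input_ids)      # (batch, seq, vocab)
    mem_logits = memory_decoder(input_ids)   # (batch, seq, vocab)

    base_probs = F.softmax(base_logits[:, -1, :], dim=-1)
    mem_probs = F.softmax(mem_logits[:, -1, :], dim=-1)

    # No parameters of either model are updated; adaptation happens
    # purely at the output-distribution level, leaving the base LLM intact.
    return (1.0 - lam) * base_probs + lam * mem_probs
```

Because only output distributions are combined, the same trained memory can, in principle, be paired with any base model that produces logits over the same vocabulary.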
Article Points:
1. Introduces Memory Decoder, a plug-and-play pretrained memory for LLMs.
2. Enables efficient domain adaptation without modifying original LLM parameters.
3. Replaces traditional non-parametric retrievers with a compact parametric model.
4. A single pretrained Memory Decoder integrates seamlessly with any LLM that shares its tokenizer.
5. Achieves superior performance with lower inference latency than RAG.
6. Preserves general capabilities, avoiding the catastrophic forgetting seen in DAPT.
Purpose
- Efficient domain adaptation
- Enhance LLMs' domain-specific performance

Mechanism
- Small transformer decoder
- Imitates non-parametric retrievers
- Trained with a distribution alignment loss (sketched below)
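The outline names a distribution alignment loss but does not spell it out. A common way to align a parametric model's predictions with a retriever's distribution is a KL-divergence objective; the sketch below assumes that formulation, and the tensor names are illustrative rather than taken from the paper.

```python
import torch.nn.functional as F


def distribution_alignment_loss(memory_logits, retrieval_probs):
    """KL divergence between a retriever-derived target distribution and
    the Memory Decoder's predicted distribution.

    memory_logits  : (batch, vocab) raw scores from the memory decoder.
    retrieval_probs: (batch, vocab) next-token distribution produced by a
                     non-parametric retriever (e.g. a kNN datastore),
                     assumed to already be a valid probability distribution.
    """
    log_memory = F.log_softmax(memory_logits, dim=-1)
    # F.kl_div expects log-probabilities as input and probabilities as the
    # target; "batchmean" averages the divergence over the batch dimension.
    return F.kl_div(log_memory, retrieval_probs, reduction="batchmean")
```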

Advantages
- Plug-and-play integration
- No modification of LLM parameters
- Minimal inference latency
- Cross-model adaptability (any LLM sharing the same tokenizer)
- Preserves general capabilities

Performance
- Reduces perplexity significantly
- Superior to DAPT and RAG
- Excels in knowledge-intensive QA

Limitations
- Computational cost of pre-training the memory
- Requires some additional training for cross-tokenizer transfer