Retrieval-Augmented Reasoning with Lean Language Models
This technical report introduces a novel approach that combines reasoning and retrieval-augmented generation (RAG) within a single, lean language-model architecture. The system pairs fine-tuned Qwen2.5-Instruct models with a dense retriever, leveraging synthetic data and reasoning traces distilled from frontier models. It aims to provide performant, privacy-preserving solutions deployable in resource-constrained or secure environments, and it demonstrates substantial gains in answer accuracy and consistency.
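To make the retrieve-then-reason pattern concrete, here is a minimal sketch of the pipeline the report describes: retrieve the passages most similar to the query, then place them in the model's context so the lean LLM can reason over them. The bag-of-words encoder below is a self-contained stand-in for a real dense retriever, and the prompt format is illustrative, not the report's actual template.

```python
import math
from collections import Counter

def embed(text):
    # Toy bag-of-words vector; a real system would use a dense
    # encoder (e.g. a sentence-embedding model) as the retriever.
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query, corpus, k=2):
    # Rank passages by similarity to the query and keep the top k.
    q = embed(query)
    return sorted(corpus, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]

def build_prompt(query, passages):
    # Retrieved passages go into the context window so the fine-tuned
    # lean model can reason over them before answering.
    context = "\n".join(f"- {p}" for p in passages)
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"

corpus = [
    "Paracetamol is a common painkiller used to treat aches and pain.",
    "Ibuprofen is an anti-inflammatory drug.",
    "Hay fever is an allergic reaction to pollen.",
]
query = "What is paracetamol used for?"
print(build_prompt(query, retrieve(query, corpus)))
```

In a deployment like the one described, `build_prompt`'s output would be sent to the fine-tuned Qwen2.5-Instruct model running locally.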
Article Points:
1. Novel approach combines reasoning and RAG in a single lean LLM.
2. Addresses demand for performant, privacy-preserving local solutions.
3. Uses fine-tuned Qwen2.5-Instruct models with dense retrieval.
4. Leverages synthetic data and reasoning traces from frontier models.
5. Achieves substantial accuracy gains, approaching frontier performance.
6. Demonstrates feasibility for local deployment in resource-constrained settings.
Problem Statement
- Large model limitations
- Privacy & resource constraints
- Integration challenge

Proposed Approach
- Lean LLM architecture
- Reasoning & RAG integration
- Domain-specific fine-tuning
System Architecture
- Lean Language Models
- Retrieval System
- Synthetic Data Generation
- Reasoning Traces
- Fine-tuning Process
- Conversational Interface
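The synthetic-data and reasoning-trace components above can be sketched as follows: each training record pairs a retrieved context and question with a frontier model's reasoning trace and final answer, serialized as one JSONL line for supervised fine-tuning. The chat-message schema and the `<think>` delimiter are illustrative assumptions; the report's exact format is not reproduced here.

```python
import json

def make_example(question, passages, trace, answer):
    # One supervised fine-tuning record in a chat format (field names
    # and the <think> delimiter are illustrative, not the report's schema).
    context = "\n".join(passages)
    return {
        "messages": [
            {"role": "user",
             "content": f"Context:\n{context}\n\nQuestion: {question}"},
            # The reasoning trace distilled from a frontier model,
            # followed by the final answer, forms the assistant target.
            {"role": "assistant",
             "content": f"<think>{trace}</think>\n{answer}"},
        ]
    }

record = make_example(
    "Can I take paracetamol with ibuprofen?",
    ["Paracetamol and ibuprofen can usually be taken together by adults."],
    "The context states adults can usually combine the two drugs.",
    "Yes, adults can usually take them together.",
)
print(json.dumps(record))  # one JSONL line of synthetic training data
```

Distilling traces this way lets the lean model imitate the frontier model's step-by-step reasoning rather than only its final answers.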
Evaluation
- NHS A-to-Z Case Study
- Retrieval Performance
- Accuracy Metrics
- Comparison to Baselines
- Distillation Impact
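Retrieval performance in setups like this is commonly reported as recall@k: the fraction of queries for which the gold passage appears among the top-k retrieved results. The report's exact metric definitions are not given here, so the following is a generic sketch.

```python
def recall_at_k(ranked_ids, gold_id, k):
    # 1.0 if the gold passage appears in the top-k results, else 0.0.
    return 1.0 if gold_id in ranked_ids[:k] else 0.0

def mean_recall_at_k(results, k):
    # results: list of (ranked passage ids, gold passage id) pairs.
    return sum(recall_at_k(r, g, k) for r, g in results) / len(results)

results = [
    (["d3", "d1", "d7"], "d1"),  # gold found at rank 2
    (["d2", "d5", "d9"], "d4"),  # gold missed entirely
]
print(mean_recall_at_k(results, 2))  # 0.5
```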
Key Findings
- Substantial accuracy gains
- Feasible local deployment
- Outperforms general lean models
- Comparable to frontier models

Future Directions