Deep Think with Confidence
Deep Think with Confidence (DeepConf) is a method designed to enhance Large Language Model (LLM) reasoning efficiency and performance at test time. It uses model-internal confidence signals to dynamically filter out low-quality reasoning traces during or after generation. DeepConf requires no additional model training and significantly improves accuracy while substantially reducing computational overhead.
Article Points:
1. DeepConf boosts LLM reasoning efficiency and performance at test time.
2. Leverages model-internal confidence signals to filter low-quality traces.
3. Operates in offline and online modes, requiring no additional training.
4. Introduces local confidence measures (group, bottom 10%, tail) for fine-grained assessment.
5. Online mode dynamically terminates unpromising traces, saving tokens.
6. Achieves up to 99.9% accuracy and 84.7% token reduction on benchmarks.
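The "model-internal confidence signals" in point 2 bottom out in a per-token score. A common formulation, consistent with how DeepConf is usually described, is the negative mean log-probability of the top-k candidate tokens at each decoding step. The sketch below uses k = 2 and made-up log-probabilities purely for illustration:

```python
def token_confidence(top_logprobs):
    """Per-token confidence: negative mean log-probability of the top-k
    candidate tokens at one decoding step. A peaked distribution (one
    near-certain token, unlikely alternatives) scores high; a flat,
    hesitant distribution scores low."""
    return -sum(top_logprobs) / len(top_logprobs)

# Confident step: top token has p ~ 0.95, the runner-up is very unlikely.
confident = token_confidence([-0.05, -4.0])
# Hesitant step: two candidates with comparable, middling probability.
hesitant = token_confidence([-0.8, -0.9])
print(confident, hesitant)
```

No model call is needed to compute this: inference engines such as vLLM can return top-k log-probabilities alongside each sampled token, so the score falls out of bookkeeping the decoder already does.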
Deep Think with Confidence

Purpose (Enhances LLM Reasoning)
- Efficiency
- Performance

Mechanism (Confidence-Aware Filtering)
- Filters low-quality traces
- Uses internal confidence

Modes (Offline & Online Operation)
- Offline: Post-generation filtering
- Online: Real-time early stopping
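The offline mode can be illustrated as confidence filtering followed by a weighted vote: keep only the most confident fraction of completed traces, then take a confidence-weighted majority vote over their final answers. The data layout (answer, score) pairs, the `keep_fraction` values, and the trace scores below are assumptions made for this sketch:

```python
from collections import defaultdict

def offline_filter_and_vote(traces, keep_fraction=0.1):
    """Offline-mode sketch: rank finished traces by confidence, keep the
    top `keep_fraction`, and confidence-weight a majority vote over the
    surviving final answers."""
    ranked = sorted(traces, key=lambda t: t[1], reverse=True)
    kept = ranked[: max(1, int(len(ranked) * keep_fraction))]
    votes = defaultdict(float)
    for answer, conf in kept:
        votes[answer] += conf  # each vote weighted by trace confidence
    return max(votes, key=votes.get)

# 10 hypothetical traces: (final answer, trace confidence).
traces = [("42", 1.9), ("42", 1.8), ("17", 2.1), ("42", 1.7), ("17", 0.9),
          ("42", 1.6), ("42", 1.5), ("17", 0.8), ("42", 1.4), ("42", 1.3)]
print(offline_filter_and_vote(traces, keep_fraction=0.3))
```

Note how the aggressiveness of the filter changes the outcome: keeping only the single most confident trace here would return "17", while keeping the top 30% lets the weighted consensus ("42") win.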

Confidence Metrics (Local & Global Signals)
- Group Confidence
- Bottom 10% Group Confidence
- Tail Confidence
- Lowest Group Confidence
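The local measures above can be sketched over a trace's per-token confidence scores. The window size, the 10% fraction, and the toy trace below are illustrative defaults, assuming token confidences have already been computed:

```python
def group_confs(confs, window):
    """Group confidence: mean token confidence over each overlapping
    sliding window of `window` tokens."""
    return [sum(confs[i:i + window]) / window
            for i in range(len(confs) - window + 1)]

def bottom_percent_conf(confs, window, frac=0.10):
    """Bottom-10% group confidence: mean of the weakest `frac` of groups."""
    g = sorted(group_confs(confs, window))
    k = max(1, int(len(g) * frac))
    return sum(g[:k]) / k

def tail_conf(confs, tail):
    """Tail confidence: mean token confidence over the last `tail` tokens."""
    return sum(confs[-tail:]) / min(tail, len(confs))

def lowest_group_conf(confs, window):
    """Lowest group confidence: the single weakest sliding window."""
    return min(group_confs(confs, window))

# Hypothetical trace with a mid-way confidence dip:
# 100 confident tokens, a 10-token dip, then 50 steadier tokens.
trace = [1.8] * 100 + [0.5] * 10 + [1.6] * 50
print(lowest_group_conf(trace, window=16))  # the dip dominates this metric
print(tail_conf(trace, tail=16))            # the tail never sees the dip
```

The point of having several local measures is visible in the example: the tail looks healthy, but the lowest group confidence exposes the mid-trace dip that a global average would wash out.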

Key Results (Accuracy & Efficiency Gains)
- Up to 99.9% accuracy
- Up to 84.7% token reduction
- No extra training needed

Implementation (vLLM Integration)
- Minimal vLLM edits
- Adaptive sampling
- Offline warmup phase
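The online mode ties these pieces together: decode token by token, track a sliding-window group confidence, and abort the trace the moment that confidence drops below a stopping threshold. In DeepConf that threshold is calibrated during the offline warmup phase (from the confidence statistics of a few full traces); in the sketch below it is simply passed in, and `sample_token`, the window size, and the stub token stream are all stand-ins for a real decoder:

```python
def online_generate(sample_token, threshold, window=8, max_tokens=256):
    """Online early-stopping sketch: generate until EOS, but terminate the
    trace early if the most recent window's mean confidence falls below
    `threshold`. `sample_token` yields (token, confidence) pairs."""
    tokens, confs = [], []
    for _ in range(max_tokens):
        tok, conf = sample_token()
        tokens.append(tok)
        confs.append(conf)
        # Check the most recent group of `window` tokens.
        if len(confs) >= window and sum(confs[-window:]) / window < threshold:
            return tokens, False  # low-confidence trace: stop early
        if tok == "<eos>":
            return tokens, True   # trace completed normally
    return tokens, True

# Stub "model": 20 confident steps, then a run of hesitant ones.
stream = iter([("ok", 2.0)] * 20 + [("??", 0.2)] * 20 + [("<eos>", 2.0)])
tokens, completed = online_generate(lambda: next(stream), threshold=1.0)
print(len(tokens), completed)
```

The trace is cut off partway into the hesitant run, which is where the token savings come from: unpromising traces stop consuming the generation budget as soon as their confidence collapses, rather than running to EOS.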