Dummy's Guide to Modern LLM Sampling Intro Knowledge
This guide introduces the fundamental concepts of Large Language Model (LLM) tokenization and various sampling methods used for text generation. It explains why sub-word tokens are preferred over words or letters and details numerous techniques like Temperature, Top-K, Top-P, and repetition penalties. The document also covers advanced adaptive sampling methods and emphasizes the critical impact of sampler order on output quality. ✨
Article Points:
1. LLMs use sub-word tokens for efficient text representation, balancing vocabulary size against the handling of rare words (see the first sketch after this list).
2. Sampling introduces controlled randomness into LLM text generation, preventing repetitive, deterministic output.
3. Temperature, Top-K, and Top-P are core sampling methods that reshape the token probability distribution (second sketch below).
4. Repetition penalties (Presence, Frequency, Repetition, DRY) keep the model from repeating tokens or phrases (third sketch below).
5. Advanced adaptive samplers such as Mirostat and Dynamic Temperature adjust their parameters based on properties of the current distribution (fourth sketch below).
6. The order in which sampling techniques are applied significantly affects the quality and coherence of the generated text (illustrated at the end of the second sketch).
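To make point 1 concrete, here is a small sketch (not from the guide itself) showing how a sub-word tokenizer splits text. It assumes the `tiktoken` package and the `cl100k_base` encoding purely as an illustration; any BPE or SentencePiece tokenizer behaves analogously, though the exact splits differ.

```python
# Requires: pip install tiktoken. Encoding choice is an illustrative assumption.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")
for word in ["the", "tokenization", "unbelievably"]:
    ids = enc.encode(word)
    pieces = [enc.decode([i]) for i in ids]
    print(word, "->", pieces)

# Common words usually map to a single token, while rarer words are split into
# several sub-word pieces, so the vocabulary stays bounded without any word
# ever being out-of-vocabulary.
```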
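For points 3 and 6, the sketch below (plain NumPy, not taken from the article; names like `apply_temperature`, `top_k_filter`, and `top_p_filter` are illustrative) shows how temperature, Top-K, and Top-P each reshape the token probability distribution, and why the order in which they are applied matters.

```python
import numpy as np

def softmax(logits):
    # Shift by the max for numerical stability, then normalize to probabilities.
    z = logits - np.max(logits)
    e = np.exp(z)
    return e / e.sum()

def apply_temperature(logits, temperature=1.0):
    # Temperature rescales logits: <1.0 sharpens the distribution, >1.0 flattens it.
    return logits / temperature

def top_k_filter(logits, k=50):
    # Keep only the k highest-scoring tokens; mask everything else with -inf.
    out = np.full_like(logits, -np.inf)
    keep = np.argsort(logits)[-k:]
    out[keep] = logits[keep]
    return out

def top_p_filter(logits, p=0.9):
    # Keep the smallest set of tokens whose cumulative probability reaches p (the nucleus).
    probs = softmax(logits)
    order = np.argsort(probs)[::-1]
    cutoff = np.searchsorted(np.cumsum(probs[order]), p) + 1  # number of tokens to keep
    out = np.full_like(logits, -np.inf)
    keep = order[:cutoff]
    out[keep] = logits[keep]
    return out

# Toy usage: temperature, then Top-K, then Top-P, then sample one token id.
rng = np.random.default_rng(0)
logits = rng.normal(size=1000)
filtered = top_p_filter(top_k_filter(apply_temperature(logits, 0.8), k=50), p=0.9)
token_id = rng.choice(len(filtered), p=softmax(filtered))

# Point 6 in miniature: temperature applied *before* Top-P changes how many tokens
# survive the nucleus cutoff, whereas temperature *after* Top-P does not.
before = top_p_filter(apply_temperature(logits, 1.5), p=0.9)
after = apply_temperature(top_p_filter(logits, p=0.9), 1.5)
print(np.isfinite(before).sum(), np.isfinite(after).sum())
```

With a flattening temperature of 1.5, the first ordering keeps a larger nucleus than the second on the same logits, which is exactly why sampler order changes the generated text.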
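Point 4's penalties can be sketched the same way. The formulas below follow widely used definitions (a flat presence penalty, a count-scaled frequency penalty, and a multiplicative repetition penalty); they may differ in detail from the guide's exact formulation, and DRY's n-gram matching is omitted.

```python
import numpy as np
from collections import Counter

def apply_penalties(logits, generated_ids,
                    presence_penalty=0.0,
                    frequency_penalty=0.0,
                    repetition_penalty=1.0):
    # Push down the scores of tokens that already appear in the generated context:
    #  - presence_penalty:  flat subtraction for any token seen at least once
    #  - frequency_penalty: subtraction scaled by how many times the token was seen
    #  - repetition_penalty: multiplicative penalty (divide positive logits,
    #    multiply negative ones) for previously seen tokens
    logits = logits.copy()
    for token_id, count in Counter(generated_ids).items():
        logits[token_id] -= presence_penalty + frequency_penalty * count
        if logits[token_id] > 0:
            logits[token_id] /= repetition_penalty
        else:
            logits[token_id] *= repetition_penalty
    return logits

# Toy usage: tokens 3 and 7 were already generated, so their logits drop.
penalized = apply_penalties(np.zeros(10), [3, 7, 7],
                            presence_penalty=0.5,
                            frequency_penalty=0.2,
                            repetition_penalty=1.1)
```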
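For point 5, here is a rough sketch of a Dynamic-Temperature-style sampler that ties the temperature to the entropy of the current distribution. The entropy-to-temperature mapping is an illustrative assumption rather than the guide's exact scheme, and Mirostat's feedback loop on target surprise is not shown.

```python
import numpy as np

def softmax(logits):
    z = logits - np.max(logits)
    e = np.exp(z)
    return e / e.sum()

def dynamic_temperature(logits, min_temp=0.5, max_temp=1.5):
    # Map the normalized entropy of the current distribution onto a temperature range:
    # confident (low-entropy) steps get a low temperature, uncertain steps a high one.
    probs = softmax(logits)
    entropy = -np.sum(probs * np.log(probs + 1e-12))
    normalized = entropy / np.log(len(probs))   # 0 = fully confident, 1 = uniform
    temperature = min_temp + (max_temp - min_temp) * normalized
    return logits / temperature
```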
Outline:
- Tokenization
  - Why Sub-words?
  - BPE
  - SentencePiece
- Sampling Basics
  - Logits & Softmax
  - Greedy vs. Sampling
  - Temperature
- Repetition Control
  - Presence Penalty
  - Frequency Penalty
  - Repetition Penalty
  - DRY (n-grams)
- Filtering Methods
  - Top-K
  - Top-P
  - Min-P
  - Top-A
  - Epsilon Cutoff
  - XTC
- Adaptive Sampling
  - Top-N-Sigma
  - Tail-Free Sampling (TFS)
  - Eta Cutoff
  - Locally Typical
  - Dynamic Temperature
- Advanced Strategies