Dummy's Guide to Modern LLM Sampling Intro Knowledge
This guide introduces the fundamental concepts of Large Language Model (LLM) tokenization and the sampling methods used for text generation. It explains why sub-word tokens are preferred over words or letters and details techniques such as Temperature, Top-K, Top-P, and repetition penalties. The document also covers advanced adaptive sampling methods and emphasizes the critical impact of sampler order on output quality.
Article Points:
1. LLMs use sub-word tokens for efficient text representation, balancing vocabulary size and handling rare words.
2. Sampling introduces controlled randomness to LLM text generation, preventing repetitive and deterministic outputs.
3. Temperature, Top-K, and Top-P are core sampling methods that modify token probability distributions (see the sketch after this list).
4. Repetition penalties (Presence, Frequency, Repetition, DRY) prevent the model from repeating phrases or tokens.
5. Advanced adaptive samplers like Mirostat and Dynamic Temperature adjust parameters based on distribution properties.
6. The order in which sampling techniques are applied significantly impacts the final generated text's quality and coherence.
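As a quick orientation before the detailed sections, here is a minimal sketch, not taken from the article, of how temperature, Top-K, and Top-P reshape a next-token distribution before one token is drawn. The function name, defaults, and structure are illustrative assumptions; it uses only numpy.

```python
import numpy as np

def sample_next_token(logits, temperature=0.8, top_k=50, top_p=0.9, rng=None):
    """Illustrative sketch: temperature -> softmax -> Top-K -> Top-P -> draw one token."""
    rng = rng or np.random.default_rng()
    logits = np.asarray(logits, dtype=np.float64)

    # Temperature scales the logits before softmax: <1 sharpens, >1 flattens the distribution.
    logits = logits / max(temperature, 1e-8)

    # Softmax turns logits into probabilities (subtracting the max for numerical stability).
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()

    # Top-K: keep only the K most probable tokens, then renormalize.
    if top_k and top_k < probs.size:
        cutoff = np.sort(probs)[-top_k]
        probs = np.where(probs >= cutoff, probs, 0.0)
        probs /= probs.sum()

    # Top-P (nucleus): keep the smallest set of tokens whose cumulative probability reaches top_p.
    order = np.argsort(probs)[::-1]
    cumulative = np.cumsum(probs[order])
    keep = order[: np.searchsorted(cumulative, top_p) + 1]
    mask = np.zeros_like(probs)
    mask[keep] = probs[keep]
    probs = mask / mask.sum()

    # Draw one token index from the filtered, renormalized distribution.
    return rng.choice(probs.size, p=probs)
```

Calling sample_next_token on a vector of raw logits returns one token index; the sections below explain each of these steps, the repetition penalties and other filters, and why the order in which they are applied matters.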
Tokenization
- Why Sub-words?
- BPE
- SentencePiece

Sampling Basics
- Logits & Softmax
- Greedy vs. Sampling
- Temperature

Repetition Control
- Presence Penalty
- Frequency Penalty
- Repetition Penalty
- DRY (n-grams)

Filtering Methods
- Top-K
- Top-P
- Min-P
- Top-A
- Epsilon Cutoff
- XTC

Adaptive Sampling
- Top-N-Sigma
- Tail-Free Sampling (TFS)
- Eta Cutoff
- Locally Typical
- Dynamic Temperature

Advanced Strategies
- Quadratic Sampling
- Mirostat Sampling
- Beam Search
- Contrastive Search

Sampler Order