Dummy's Guide to Modern LLM Sampling Intro Knowledge
This guide introduces the fundamental concepts of Large Language Model (LLM) tokenization and the sampling methods used for text generation. It explains why sub-word tokens are preferred over words or letters and details techniques such as Temperature, Top-K, Top-P, and repetition penalties. The document also covers advanced adaptive sampling methods and emphasizes the critical impact of sampler order on output quality.
Article Points:
1. LLMs use sub-word tokens for efficient text representation, balancing vocabulary size and handling rare words.
2. Sampling introduces controlled randomness to LLM text generation, preventing repetitive and deterministic outputs.
3. Temperature, Top-K, and Top-P are core sampling methods that modify token probability distributions (see the sketch after this list).
4. Repetition penalties (Presence, Frequency, Repetition, DRY) prevent the model from repeating phrases or tokens.
5. Advanced adaptive samplers like Mirostat and Dynamic Temperature adjust parameters based on distribution properties.
6. The order in which sampling techniques are applied significantly impacts the final generated text's quality and coherence.
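As a quick orientation before the detailed sections, here is a minimal sketch, not taken from the article, of how temperature, Top-K, and Top-P reshape a next-token distribution before one token is drawn. The function name, defaults, and structure are illustrative assumptions; it uses only numpy.

```python
import numpy as np

def sample_next_token(logits, temperature=0.8, top_k=50, top_p=0.9, rng=None):
    """Illustrative sketch: temperature -> softmax -> Top-K -> Top-P -> draw one token."""
    rng = rng or np.random.default_rng()
    logits = np.asarray(logits, dtype=np.float64)

    # Temperature scales the logits before softmax: <1 sharpens, >1 flattens the distribution.
    logits = logits / max(temperature, 1e-8)

    # Softmax turns logits into probabilities (subtracting the max for numerical stability).
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()

    # Top-K: keep only the K most probable tokens, then renormalize.
    if top_k and top_k < probs.size:
        cutoff = np.sort(probs)[-top_k]
        probs = np.where(probs >= cutoff, probs, 0.0)
        probs /= probs.sum()

    # Top-P (nucleus): keep the smallest set of tokens whose cumulative probability reaches top_p.
    order = np.argsort(probs)[::-1]
    cumulative = np.cumsum(probs[order])
    keep = order[: np.searchsorted(cumulative, top_p) + 1]
    mask = np.zeros_like(probs)
    mask[keep] = probs[keep]
    probs = mask / mask.sum()

    # Draw one token index from the filtered, renormalized distribution.
    return rng.choice(probs.size, p=probs)
```

Calling sample_next_token on a vector of raw logits returns one token index; the sections below explain each of these steps, the repetition penalties and other filters, and why the order in which they are applied matters.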
Tokenization
- Why Sub-words?
- BPE
- SentencePiece

Sampling Basics
- Logits & Softmax
- Greedy vs. Sampling
- Temperature

Repetition Control
- Presence Penalty
- Frequency Penalty
- Repetition Penalty
- DRY (n-grams)

Filtering Methods
- Top-K
- Top-P
- Min-P
- Top-A
- Epsilon Cutoff
- XTC

Adaptive Sampling
- Top-N-Sigma
- Tail-Free Sampling (TFS)
- Eta Cutoff
- Locally Typical
- Dynamic Temperature

Advanced Strategies
- Quadratic Sampling
- Mirostat Sampling
- Beam Search
- Contrastive Search

Sampler Order