Top-P (Nucleus Sampling)
A sampling strategy that restricts token selection to the smallest set of highest-probability tokens whose cumulative probability reaches a threshold p.
Full Definition
Top-p sampling, also called nucleus sampling, dynamically selects the vocabulary to sample from at each step: only the highest-probability tokens whose cumulative probability mass reaches the threshold p are considered. At top-p=0.9, the model samples exclusively from whichever tokens collectively account for 90% of the probability mass — a small set for high-confidence predictions, a larger set when the model is uncertain. This adapts the effective vocabulary size to context, unlike top-k, which always keeps a fixed number of tokens. Top-p and temperature address the same trade-off (diversity vs. focus) from different angles; it is generally recommended to tune one and leave the other at its default.
Examples
With top-p=0.95 on a factual prompt, the sampler might consider only 3–5 tokens; on an open-ended creative prompt it might consider 500+ tokens.
With top-p=0.1 when writing boilerplate code, the model is forced to pick only its highest-confidence continuations.
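The selection procedure described above can be sketched in a few lines. This is a minimal illustration using NumPy, not any particular library's implementation: sort tokens by probability, keep the smallest prefix whose cumulative mass reaches p, renormalize, and sample from that nucleus.

```python
import numpy as np

def top_p_sample(logits, p=0.9, rng=None):
    """Sample a token index using nucleus (top-p) sampling.

    Keeps the smallest set of highest-probability tokens whose
    cumulative mass reaches p, renormalizes, and samples from it.
    """
    rng = rng or np.random.default_rng()
    # Softmax (shifted by the max for numerical stability).
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    # Sort token indices by descending probability.
    order = np.argsort(probs)[::-1]
    # Smallest prefix whose cumulative mass reaches p.
    cutoff = np.searchsorted(np.cumsum(probs[order]), p) + 1
    nucleus = order[:cutoff]
    # Renormalize within the nucleus and sample.
    return rng.choice(nucleus, p=probs[nucleus] / probs[nucleus].sum())
```

Note how the nucleus size adapts: a sharply peaked distribution yields a one-token nucleus, while a flat distribution keeps most of the vocabulary, matching the behavior described in the examples above.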
Apply this in your prompts
PromptITIN automatically uses techniques like Top-P (Nucleus Sampling) to build better prompts for you.