
Softmax

A function that converts a vector of real numbers into a probability distribution summing to 1.

Full Definition

Softmax is the mathematical function applied to logits to produce token probabilities: it exponentiates each value, then divides by the sum of all exponentiated values, ensuring the outputs are positive and sum to 1. It is applied at two key points in transformers: (1) in the attention mechanism to normalise attention scores into weights, and (2) at the output layer to produce the next-token probability distribution over the vocabulary. The exponential operation amplifies differences, making the highest-scoring token even more dominant — a property that temperature modulates. Numerically stable implementations subtract the maximum logit before exponentiating to prevent overflow.
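The recipe above (exponentiate, then divide by the sum) can be sketched in a few lines of plain Python. The max-subtraction is the numerical-stability measure just mentioned: it leaves the result unchanged, since exp(x − m) / Σ exp(x − m) = exp(x) / Σ exp(x), but keeps exp() from overflowing on large logits. The function name and signature here are illustrative, not from any particular library.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of real-valued logits."""
    m = max(logits)                              # subtract the max logit ...
    exps = [math.exp(x - m) for x in logits]     # ... so exp() cannot overflow
    total = sum(exps)
    return [e / total for e in exps]             # positive values summing to 1

probs = softmax([2.0, 1.0, 0.1])
print([round(p, 3) for p in probs])  # [0.659, 0.242, 0.099]
```

Note that the outputs are always strictly positive: even a very negative logit gets a tiny nonzero probability, which is why sampling can occasionally pick unlikely tokens.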

Examples

1. Logits [2.0, 1.0, 0.1] → after softmax: [0.659, 0.242, 0.099], assigning 66% probability to the top token.

2. Attention scores [8.3, 2.1, -1.5] normalised via softmax to [0.9979, 0.0020, 0.0001], focusing 99.8% of attention on the first token.
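The definition notes that temperature modulates how dominant the top score becomes. A minimal sketch of that interaction, assuming the usual convention of dividing logits by the temperature before applying softmax (the `softmax` helper here is illustrative, not a library function):

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of real-valued logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.1]
for t in (0.5, 1.0, 2.0):
    # Lower temperature sharpens the distribution; higher flattens it.
    probs = softmax([x / t for x in logits])
    print(t, [round(p, 3) for p in probs])
```

At T = 0.5 the top token's probability rises from 0.659 to about 0.864; at T = 2.0 it falls to about 0.502, spreading probability mass across the alternatives.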


Related Terms

Logit

The raw, unnormalised score a model assigns to each vocabulary token before conversion into probabilities.


Temperature

A sampling parameter that controls the randomness and creativity of model outputs.


Attention Mechanism

The core transformer operation that weighs the relevance of each token to every other token.
