Self-Attention
An attention operation where a sequence attends to itself, allowing each token to gather context from all others.
Full Definition
Self-attention is a specific application of the attention mechanism in which all three inputs (queries, keys, and values) are derived from the same sequence. Each token generates a query vector (what it is looking for), a key vector (what it offers), and a value vector (what it contributes). The attention score between two tokens is the dot product of their query and key vectors, scaled by the square root of the key dimension, then passed through a softmax to produce weights. The output for each token is the weighted sum of all value vectors. Self-attention lets every token directly influence every other token in a single operation, which is why transformers are exceptionally good at capturing long-range dependencies in text.
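The definition above can be sketched in a few lines of NumPy. This is a minimal single-head illustration, not a production implementation; the function and matrix names are made up for this example, and the scaling by the square root of the key dimension matches the standard scaled dot-product formulation.

```python
import numpy as np

def self_attention(x, w_q, w_k, w_v):
    """Single-head self-attention over a sequence x of shape (seq_len, d_model).

    w_q, w_k, w_v are learned projection matrices of shape (d_model, d_k);
    here they are random placeholders, since this sketch has no training step.
    """
    q = x @ w_q                      # queries: what each token is looking for
    k = x @ w_k                      # keys: what each token offers
    v = x @ w_v                      # values: what each token contributes
    d_k = q.shape[-1]
    scores = q @ k.T / np.sqrt(d_k)  # scaled dot-product scores, (seq_len, seq_len)
    # Row-wise softmax turns scores into attention weights that sum to 1
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v               # each output row is a weighted sum of all values

# Tiny demo: a 3-token sequence with model dimension 4
rng = np.random.default_rng(0)
x = rng.normal(size=(3, 4))
w_q, w_k, w_v = (rng.normal(size=(4, 4)) for _ in range(3))
out = self_attention(x, w_q, w_k, w_v)
print(out.shape)  # (3, 4): one context-mixed vector per token
```

Note that every token attends to every other token in one matrix multiplication; there is no recurrence or sliding window, which is the property the definition highlights.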
Examples
In the sentence 'She took the trophy because she had earned it', self-attention links the second 'she' to 'She' at the sentence start across a long span.
A coding model using self-attention to link a function call to its definition several hundred tokens earlier in a long source file.
Related Terms
Attention Mechanism
The core transformer operation that weighs the relevance of each token to every …
Transformer
The neural network architecture that underpins all modern large language models,…
Positional Encoding
A mechanism that injects token position information into transformer inputs, sin…