Positional Encoding
A mechanism that injects token position information into transformer inputs, since attention is order-agnostic.
Full Definition
Transformers process all tokens in parallel and have no built-in notion of sequence order — without positional encoding, 'dog bites man' and 'man bites dog' would produce identical attention patterns. Positional encodings add position-specific signals to token embeddings. The original transformer used fixed sinusoidal functions; modern models use learned absolute positions or relative positional encodings. RoPE (Rotary Position Embedding), used in Llama and GPT-NeoX, applies a rotation to query and key vectors that encodes relative distance, scales elegantly to long contexts, and is critical for extending context windows via techniques like YaRN and LongRoPE.
Examples
The original 'Attention Is All You Need' sinusoidal positional encoding: PE(pos, 2i) = sin(pos / 10000^(2i/d_model)) and PE(pos, 2i+1) = cos(pos / 10000^(2i/d_model)).
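A minimal NumPy sketch of the sinusoidal formula above (the function name and shapes are illustrative, not from any particular library):

```python
import numpy as np

def sinusoidal_positional_encoding(seq_len: int, d_model: int) -> np.ndarray:
    """Fixed sinusoidal encodings from 'Attention Is All You Need'."""
    positions = np.arange(seq_len)[:, None]           # (seq_len, 1)
    dims = np.arange(0, d_model, 2)[None, :]          # (1, d_model/2)
    angles = positions / (10000 ** (dims / d_model))  # (seq_len, d_model/2)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)  # even dimensions: PE(pos, 2i)
    pe[:, 1::2] = np.cos(angles)  # odd dimensions:  PE(pos, 2i+1)
    return pe

pe = sinusoidal_positional_encoding(seq_len=128, d_model=64)
```

The resulting matrix is simply added to the token embeddings; because each dimension oscillates at a different frequency, every position receives a unique signature.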
RoPE encoding allowing Llama 3 to be extended from an 8k training context to 128k at inference by adjusting the rotation frequency scaling factor.
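The rotation at the heart of RoPE can be sketched in a few lines. This is a simplified illustration, not the exact Llama implementation: the dimension pairing and the single `scale` divisor (a crude stand-in for the more careful per-frequency adjustments of YaRN and LongRoPE) are assumptions for clarity.

```python
import numpy as np

def rope_rotate(x: np.ndarray, positions: np.ndarray, scale: float = 1.0) -> np.ndarray:
    """Apply a rotary position embedding to query/key vectors.

    x: (seq_len, d) vectors; positions: (seq_len,) integer positions.
    scale > 1 slows every rotation frequency, the basic idea behind
    context-window extension (a simplification of YaRN/LongRoPE).
    """
    seq_len, d = x.shape
    half = d // 2
    # One rotation frequency per dimension pair, geometrically spaced.
    inv_freq = 1.0 / (10000 ** (np.arange(half) / half))
    angles = (positions[:, None] / scale) * inv_freq[None, :]  # (seq_len, half)
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[:, :half], x[:, half:]
    # Rotate each (x1_i, x2_i) pair by its position-dependent angle.
    return np.concatenate([x1 * cos - x2 * sin, x1 * sin + x2 * cos], axis=-1)
```

The key property: the dot product between a rotated query at position m and a rotated key at position n depends only on the offset m - n, which is why RoPE encodes relative distance and why rescaling the rotation frequencies stretches the usable context.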