Prompting

Prompt Compression

Reducing prompt length while preserving the information needed for accurate responses.

Full Definition

Prompt compression techniques shrink long prompts — particularly retrieved documents or conversation histories — to fit within a model's context window or to reduce token costs. Methods include extractive compression (removing irrelevant sentences), abstractive compression (summarising), selective retrieval (only including the most relevant chunks), and learned compression (using a smaller model to encode long context into a compact representation). As context windows grow, compression matters less for fitting text in, but remains critical for cost control and for mitigating the 'lost in the middle' phenomenon where models underweight centrally positioned content.
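Of the methods above, extractive compression is the simplest to illustrate. The following is a minimal sketch, not a production implementation: it scores each sentence by lexical overlap with the query and keeps the highest-scoring ones in their original order. The function name and the `keep_ratio` knob are illustrative assumptions; real systems typically use embedding similarity or a learned scorer instead of raw word overlap.

```python
import re

def compress_extractive(document: str, query: str, keep_ratio: float = 0.3) -> str:
    """Naive extractive prompt compression: keep the sentences that
    share the most words with the query, preserving document order.

    `keep_ratio` (a hypothetical knob) controls how much text survives.
    """
    sentences = re.split(r"(?<=[.!?])\s+", document.strip())
    query_words = set(query.lower().split())

    # Score each sentence by lexical overlap with the query.
    scored = [
        (len(query_words & set(s.lower().split())), i, s)
        for i, s in enumerate(sentences)
    ]
    n_keep = max(1, int(len(sentences) * keep_ratio))
    top = sorted(scored, reverse=True)[:n_keep]

    # Re-emit the kept sentences in their original order.
    return " ".join(s for _, _, s in sorted(top, key=lambda t: t[1]))
```

A real pipeline would apply this to each retrieved chunk before assembling the final prompt, trading a small risk of dropping relevant context for a guaranteed reduction in token count.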

Examples

1. Using a summarisation model to condense a 50-page PDF to 2,000 tokens before passing it to a question-answering model.

2. Removing boilerplate and repeated content from a conversation history before appending it to a new prompt.
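The second example can be sketched as a simple deduplication pass over a chat history. This assumes messages are dicts with `role` and `content` keys (a common but not universal shape); the function name is illustrative.

```python
def dedupe_history(messages: list[dict]) -> list[dict]:
    """Drop verbatim-duplicate messages from a chat history before
    it is re-sent as part of a new prompt, keeping first occurrences."""
    seen: set[tuple[str, str]] = set()
    kept = []
    for msg in messages:
        key = (msg["role"], msg["content"].strip())
        if key in seen:
            continue  # skip exact repeats, e.g. re-pasted boilerplate
        seen.add(key)
        kept.append(msg)
    return kept
```

Exact matching only catches verbatim repeats; near-duplicates (reworded boilerplate) would need fuzzy matching or embedding similarity.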

Apply this in your prompts

PromptITIN automatically uses techniques like Prompt Compression to build better prompts for you.


Related Terms

Context Window

The maximum number of tokens a model can process in a single input-output interaction.


RAG (Retrieval-Augmented Generation)

Augmenting model responses by retrieving relevant documents from an external knowledge base.


Token

The basic unit of text a language model processes, roughly corresponding to a word or subword.
