What Belongs in the Context Window
The context window should contain only information directly relevant to the task at hand. This seems obvious, but it is commonly violated: people include large document dumps, entire conversation histories, or tangentially related background material on the theory that 'more context is better.' It isn't. Irrelevant content dilutes the model's attention and can actively degrade performance: models weight the entire context when generating, and noise in the context produces noise in the output. Apply a strict filter: for each document or section you're considering including, ask 'does this directly help the model complete the task?' If the answer is 'maybe' or 'it provides background,' cut it.
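This strict filter can be sketched in code. The keyword-overlap check below is a crude stand-in for a real relevance judgment (embedding similarity or a human review would be stronger); the point it illustrates is that the filter is exclusionary by default, so 'maybe relevant' documents are dropped rather than included just in case.

```python
def filter_context(documents, task_keywords):
    """Keep only documents that directly mention the task's key terms.

    documents: list of {"title": ..., "text": ...} dicts.
    task_keywords: terms that define what the task is actually about.
    """
    kept = []
    for doc in documents:
        text = doc["text"].lower()
        # Require at least one task keyword to appear. Background docs
        # that match nothing are cut, not included "just in case".
        if any(kw.lower() in text for kw in task_keywords):
            kept.append(doc)
    return kept

docs = [
    {"title": "API auth guide", "text": "How to rotate API keys safely."},
    {"title": "Company history", "text": "Founded in 2009 in a garage."},
]
relevant = filter_context(docs, task_keywords=["API", "keys"])
```

The company-history document is cut even though it is harmless background; under the strict filter, harmless is not the same as helpful.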
The Lost-in-the-Middle Problem
Research on language model attention patterns has consistently shown a 'lost in the middle' effect: models pay more attention to content at the beginning and end of the context window, and less to content buried in the middle. This has a direct, practical implication: critical instructions, key documents, and the most important pieces of context should go at the beginning or end of the context window, not in the middle. If you have 10 documents to include and one is clearly the most important, don't put it in position 5 — put it first or last. This positioning effect is more pronounced in very long contexts and less important in short ones.
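One way to act on this is to reorder documents by importance before assembling the context, so the highest-ranked items sit at the edges and the lowest-ranked in the middle. This is a minimal sketch under the assumption that you already have an importance score per document; the alternating front/back placement is just one reasonable scheme.

```python
def order_for_attention(documents, importance):
    """Place the highest-importance documents at the edges of the context.

    documents: list of document strings.
    importance: parallel list of numeric scores (higher = more important).
    Returns the documents ordered so the most important items sit at the
    start and end, and the least important in the middle -- the positions
    the 'lost in the middle' effect penalizes.
    """
    ranked = sorted(zip(importance, documents), reverse=True)
    front, back = [], []
    for i, (_, doc) in enumerate(ranked):
        # Alternate: best -> front, 2nd best -> back, 3rd -> front, ...
        (front if i % 2 == 0 else back).append(doc)
    return front + back[::-1]

ordered = order_for_attention(
    ["A", "B", "C", "D", "E"],
    importance=[5, 3, 1, 2, 4],
)
```

Here the two most important documents (A and E) land first and last, and the least important (C) lands in the middle.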
Structuring Long Contexts for Clarity
Unstructured walls of text are harder for models to navigate than clearly sectioned, labeled content. Use explicit structure to demarcate different types of context. For instructions: place them in a clearly marked section at the top. For reference documents: use headers with the document title and source. For conversation history: use clear speaker labels. For code or data: use code blocks or structured formatting. Many models respond well to XML-style tags for context demarcation: <instructions>...</instructions>, <documents>...</documents>, <user_query>...</user_query>. This explicit structure reduces the cognitive cost of navigating the context and improves the model's ability to distinguish between different types of input.
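A prompt-assembly helper along these lines might look as follows. The tag names match the XML-style convention above; any consistent, clearly labeled scheme serves the same purpose.

```python
def build_prompt(instructions, documents, user_query):
    """Assemble a context window with XML-style section tags.

    documents: list of {"title": ..., "source": ..., "text": ...} dicts.
    Instructions go first, reference documents in the middle with title
    and source attributes, and the user's query last.
    """
    doc_blocks = "\n".join(
        f'<document title="{d["title"]}" source="{d["source"]}">\n'
        f'{d["text"]}\n</document>'
        for d in documents
    )
    return (
        f"<instructions>\n{instructions}\n</instructions>\n\n"
        f"<documents>\n{doc_blocks}\n</documents>\n\n"
        f"<user_query>\n{user_query}\n</user_query>"
    )

prompt = build_prompt(
    instructions="Answer using only the documents provided.",
    documents=[{"title": "FAQ", "source": "docs/faq.md",
                "text": "Refunds are processed within 5 days."}],
    user_query="How long do refunds take?",
)
```

Keeping assembly in one function also makes it easy to enforce ordering decisions (instructions first, query last) in one place.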
Few-Shot Examples in Long Contexts
Few-shot examples (examples of the task done well) are among the most valuable things to include in a long context — but their placement matters. Research shows that examples placed immediately before the final task instruction are most effective. For long contexts with many examples, distribute them in a way that creates a clear pattern the model can follow, and always include at least one example that closely resembles the specific task you're asking the model to do. Diverse examples covering the range of task variations are more valuable than multiple examples of the same simple case.
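The placement advice above can be sketched as a small assembly function. Ordering examples by naive word overlap with the task is an assumption of this sketch, a stand-in for a real similarity measure; the structural point is that the examples end immediately before the final instruction, with the most task-similar example adjacent to it.

```python
def build_few_shot_prompt(examples, task):
    """Place few-shot examples immediately before the final task.

    examples: list of {"input": ..., "output": ...} dicts.
    Examples are sorted so the one most similar to the task (by crude
    word overlap) comes last, i.e. right next to the instruction it
    should inform.
    """
    task_words = set(task.lower().split())

    def overlap(ex):
        return len(task_words & set(ex["input"].lower().split()))

    ordered = sorted(examples, key=overlap)  # most similar last
    shots = "\n\n".join(
        f"Input: {ex['input']}\nOutput: {ex['output']}" for ex in ordered
    )
    return f"{shots}\n\nInput: {task}\nOutput:"

prompt = build_few_shot_prompt(
    examples=[
        {"input": "translate hello to french", "output": "bonjour"},
        {"input": "sum 2 and 3", "output": "5"},
    ],
    task="sum 4 and 5",
)
```

With this input, the arithmetic example (the closest match to the task) is placed last among the examples, directly before the final "Input:" line.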
Context Management for Repeated Sessions
For applications where you're running many completions against a large fixed context (e.g., Q&A over a document, analysis of a codebase), front-load the context with high-quality instructions and the most important reference material. As conversations grow, be selective about conversation history inclusion — not every prior exchange is relevant to the current task. Truncate or summarize older history when the context window starts filling up, keeping only the exchanges directly relevant to the current step. Summarization ('the user was asking about X and we established Y') preserves the important state while reducing token consumption.
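A minimal history-compaction sketch: keep the most recent exchanges verbatim and collapse everything older into a single summary turn. The `summarize` helper here is a placeholder that just truncates and joins the old turns; in practice you might call the model itself to produce the 'we established Y' summary.

```python
def compact_history(history, max_recent=4):
    """Keep recent exchanges verbatim; collapse older ones into a summary.

    history: list of (speaker, text) pairs, oldest first.
    max_recent: number of most recent turns to preserve verbatim.
    """
    if len(history) <= max_recent:
        return history
    older, recent = history[:-max_recent], history[-max_recent:]

    def summarize(turns):
        # Placeholder summary: truncated snippets of the older turns.
        topics = "; ".join(text[:40] for _, text in turns)
        return ("system", f"Summary of earlier conversation: {topics}")

    return [summarize(older)] + recent

compacted = compact_history(
    [("user", "a"), ("assistant", "b"), ("user", "c"),
     ("assistant", "d"), ("user", "e"), ("assistant", "f")],
    max_recent=4,
)
```

Six turns compact to five: one summary turn followed by the four most recent exchanges, preserving the conversation's state at a fraction of the token cost.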