
Tokens and Context Windows Explained

Understand what tokens are in AI models, how context windows work, and why they matter for long-document tasks.

7 min read

Two concepts explain most of the practical limitations you'll hit when working with AI: tokens and context windows. Tokens determine cost and speed. Context windows determine what the model can 'see' at once. Understand these two things and you'll know why providers charge more for some tasks than others, why very long conversations sometimes produce worse answers, and why Claude handles a 40-page document while ChatGPT says it's too long.

What Tokens Are and Why They Matter

Tokens are the fundamental units that language models process — not exactly words, but word fragments. A simple word like 'cat' is typically one token. A longer word like 'understanding' might be two tokens (under + standing). A complex technical term or a word from an uncommon language might be multiple tokens. Punctuation, spaces, and special characters are also tokens. As a rough rule of thumb: 100 tokens ≈ 75 words in English. Tokens matter because AI models are priced per token (input + output), and they have a maximum number of tokens they can process in a single session. Understanding tokens helps you estimate costs and plan efficiently.
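The rule of thumb above can be turned into a quick planning estimate. This is a minimal sketch (the function name is illustrative, not a real API); exact counts require the model's actual tokenizer, such as OpenAI's tiktoken library:

```python
def estimate_tokens(text: str) -> int:
    """Rough token estimate using the ~75 words per 100 tokens rule
    of thumb for English. Real tokenizers give exact counts; this is
    only for cost and capacity planning."""
    words = len(text.split())
    return round(words * 100 / 75)  # roughly 1.33 tokens per word

print(estimate_tokens("The cat sat on the mat"))  # 6 words -> ~8 tokens
```

The estimate runs high for simple prose and low for code or technical jargon, but it is close enough to decide whether a document will fit in a context window.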

What a Context Window Is

The context window is the total number of tokens an AI model can process in a single session — the sum of your input (prompt + any pasted text) and its output (the response). Think of it as the model's working memory: everything within the context window is what the model can 'see' and reason about. Anything outside the window is inaccessible. If you have a 128,000-token context window and paste in a 100-page document (roughly 75,000 words), you've used about 100,000 tokens on the input alone, leaving 28,000 tokens for the model's response. Context window size is one of the most practically important differences between models.
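The budget arithmetic from the example above can be sketched directly (the function name is illustrative):

```python
def response_budget(context_window: int, input_tokens: int) -> int:
    """Tokens left for the model's response after the input is counted.
    Input and output share one window, so every input token reduces
    the space available for the answer."""
    remaining = context_window - input_tokens
    if remaining <= 0:
        raise ValueError("Input alone exceeds the context window")
    return remaining

# The article's example: a 128,000-token window with a ~100,000-token document.
print(response_budget(128_000, 100_000))  # 28000 tokens left for the response
```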

Context Window Sizes Across Major Models

Context window sizes have grown dramatically. GPT-3.5 had a 4,096-token window — roughly 3,000 words, a few pages of text. GPT-4 Turbo has 128,000 tokens — enough for a full novel. Claude models support up to 200,000 tokens — enough for very large codebases or long research papers. Gemini 1.5 Pro supports up to 1 million tokens. These numbers change rapidly as models improve. The practical implication: tasks that required complex chunking and multiple queries a year ago can now be done in a single prompt. For most everyday tasks, any modern model's context window is large enough — the limits matter for long-document processing.

What Happens When You Exceed the Context Window

When your total tokens (input + output) approach or exceed the context window, earlier content is either truncated (cut off) or the model starts producing lower-quality responses as it struggles to maintain coherence across too much context. In a multi-turn conversation, this means the model 'forgets' earlier messages as the context fills up. In a document analysis task, the beginning of a long document may receive less attention than the end. The practical fix: for very long documents, break them into focused sections and query each separately. For long conversations, start fresh when output quality degrades.
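The "break it into focused sections" fix can be sketched as a simple word-count splitter (a minimal illustration; the function name and the 5,000-word default are assumptions, not a real API):

```python
def split_into_sections(text: str, max_words: int = 5_000) -> list[str]:
    """Split a long document into sections of at most max_words words,
    so each query stays well inside the context window. Word count is a
    simple proxy for token count (~750 words per 1,000 tokens)."""
    words = text.split()
    return [
        " ".join(words[i : i + max_words])
        for i in range(0, len(words), max_words)
    ]

doc = "word " * 12_000                      # a 12,000-word stand-in document
sections = split_into_sections(doc.strip())
print(len(sections))                        # 3 sections: 5,000 + 5,000 + 2,000 words
```

A production version would split on paragraph or section boundaries rather than mid-sentence, but the principle is the same: keep each chunk comfortably below the window.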

Tokens, Pricing, and Cost Estimation

Most commercial AI APIs charge per token — separately for input tokens and output tokens, with output usually priced higher. Typical rates range from fractions of a cent per 1,000 tokens for efficient models to several cents per 1,000 tokens for the most capable frontier models. For casual use, this rarely matters. For applications processing thousands of requests, token efficiency becomes economically significant. A prompt with a 500-token system prompt, 200-token user input, and 400-token response uses 1,100 tokens total. At $0.01 per 1,000 tokens, that's $0.011 per request — manageable individually, but $110 per 10,000 requests. Optimizing prompt length matters at scale.
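The worked example above can be reproduced in a few lines. Note the simplification: real APIs usually price input and output tokens separately, while a single flat rate is used here to mirror the article's numbers (the function name is illustrative):

```python
def request_cost(input_tokens: int, output_tokens: int,
                 price_per_1k: float) -> float:
    """Cost of one request at a flat per-1,000-token rate.
    Most real APIs charge input and output at different rates,
    with output typically priced higher."""
    return (input_tokens + output_tokens) * price_per_1k / 1000

# 500-token system prompt + 200-token user input, 400-token response, $0.01/1k:
cost = request_cost(500 + 200, 400, 0.01)
print(f"${cost:.3f} per request")            # $0.011 per request
print(f"${cost * 10_000:.0f} per 10,000")    # $110 per 10,000 requests
```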

Prompt examples

✗ Weak prompt
Summarize this entire book for me. [pastes 150,000-word book]

Most models can't fit 150,000 words in their context window alongside a meaningful response. The model will either refuse or produce a summary based on an incomplete view of the content.

✓ Strong prompt
I'm going to give you a long document in sections. For each section, extract: 1) the main argument, 2) key supporting evidence, 3) any claims that require verification. I'll start with the first 5,000 words. Compile findings across all sections at the end when I say 'COMPILE.' Section 1: [first section]

This approach respects context window limits while extracting value from long content. Breaking the task into sections and requesting a final synthesis handles long documents reliably across any model.

Practical tips

  • Rough conversion: 1,000 tokens ≈ 750 words. Use this to estimate if your content fits in a model's context window.
  • When a long conversation produces degrading output, start a fresh one with a concise context summary — don't try to fix a full context window.
  • For document analysis tasks, choose a model with a context window large enough to hold the full document plus your instructions.
  • At scale, shorter prompts that produce equally good output are meaningfully cheaper — optimize token efficiency for high-volume tasks.

Continue learning

LLMs Explained · How ChatGPT Works · RAG Explained

PromptIt optimizes prompt structure for token efficiency — getting more from every interaction at lower cost.


✦ Try it free

More AI Models guides

  • How ChatGPT Works (8 min)
  • Claude vs ChatGPT: Key Differences (8 min)
  • What is Google Gemini? (7 min)
  • GPT-4 Guide: Features and Capabilities (7 min)

← Browse all guides