In early 2023, the practical limit for most deployed LLM applications was somewhere between 4,000 and 8,000 tokens — roughly 3,000 to 6,000 words. This was not a small constraint. It meant that most documents could not fit in a single call, that conversations had to be carefully summarized and compressed to avoid losing history, and that the retrieval infrastructure required to make up for the context limit was itself a major engineering investment.

By the end of 2024, Claude 3.5 Sonnet offered a 200,000-token context window, and Gemini 1.5 Pro offered at least one million tokens. These are not merely quantitative improvements on the 2023 situation. They are qualitative shifts that change which problems AI can address, and how.

What Changes with Long Context

The most immediate change is that entire documents, codebases, and conversation histories can often fit in a single call. A lawyer can drop an entire contract into context. An engineer can provide a whole repository, as long as it fits within the window. A support agent can work from a customer's complete account history. This eliminates a class of retrieval problems that were previously unavoidable and significantly simplifies the application architectures built around them.

The second change is less obvious: long context makes certain agent architectures possible that were previously impractical. A planning agent that needs to reason over a large amount of background information — a repository of past decisions, a body of domain knowledge, the history of a project — can now do so without complex retrieval machinery. The planning loop becomes simpler; the relevant information is just present.

What Does Not Change

Longer context does not eliminate the retrieval problem. It shifts it. For very large corpora, such as a company's entire document base or a large codebase running to millions of tokens, even 1M-token windows are insufficient. Retrieval infrastructure remains important for these use cases. What changes is the threshold at which retrieval becomes necessary: many use cases that required retrieval in 2023 no longer do, but the genuinely large-scale ones still do.
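The threshold idea above can be sketched as a simple budget check. This is a minimal illustration, not a production method: it assumes the rough ratio of 0.75 words per token implied by the figures earlier in this piece, and the names `estimate_tokens`, `fits_in_context`, and `reserved_tokens` are hypothetical. A real application would count tokens with the tokenizer of the model it actually calls.

```python
import math

# Assumption: about 0.75 words per token, the ratio implied by
# "4,000 tokens is roughly 3,000 words". Real tokenizers vary by
# model and language, so treat this as an estimate only.
WORDS_PER_TOKEN = 0.75


def estimate_tokens(text: str) -> int:
    """Crude token estimate from a whitespace word count."""
    return math.ceil(len(text.split()) / WORDS_PER_TOKEN)


def fits_in_context(documents: list[str], window_tokens: int,
                    reserved_tokens: int = 4_000) -> bool:
    """Return True if all documents fit in the window.

    reserved_tokens is a hypothetical allowance for instructions and
    the model's reply, chosen here purely for illustration.
    """
    total = sum(estimate_tokens(doc) for doc in documents)
    return total + reserved_tokens <= window_tokens


if __name__ == "__main__":
    contract = "word " * 60_000  # a synthetic 60,000-word document
    print(fits_in_context([contract], 8_000))           # 2023-era window
    print(fits_in_context([contract], 200_000))         # 200K-token window
    print(fits_in_context([contract] * 20, 1_000_000))  # 20 such documents
```

Under these assumptions, the single synthetic contract overflows an 8,000-token window but fits comfortably in 200,000 tokens, while twenty such documents exceed even a one-million-token window. That last case is where retrieval is still needed.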

The more subtle point is that context window size and the quality of context use are different dimensions. A larger window does not guarantee that the model will use everything in it well. In practice, models often attend more reliably to content near the beginning and end of the context window, with degraded performance on content in the middle. For applications where precision matters, such as legal review, financial analysis, or code audit, long-context capability is useful but not sufficient. The retrieval problem has not gone away; it has changed shape.
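One mitigation sometimes used for this positional effect is to order the supplied passages so that the most relevant ones sit at the edges of the prompt and the least relevant ones fall in the middle. The sketch below is an assumption-laden illustration, not a technique this piece prescribes: `order_for_edges` is a hypothetical helper, and it presumes the passages have already been ranked by relevance, most relevant first.

```python
def order_for_edges(chunks_ranked: list[str]) -> list[str]:
    """Reorder ranked chunks so the best ones land at the start and end.

    chunks_ranked must be sorted most relevant first. Chunks are dealt
    alternately to the front and the back, so the least relevant chunks
    end up in the middle of the returned list.
    """
    front: list[str] = []
    back: list[str] = []
    for i, chunk in enumerate(chunks_ranked):
        if i % 2 == 0:
            front.append(chunk)
        else:
            back.append(chunk)
    # Reverse the back half so the second-best chunk is the very last item.
    return front + back[::-1]


if __name__ == "__main__":
    ranked = ["a", "b", "c", "d", "e"]  # "a" is the most relevant
    print(order_for_edges(ranked))
```

With five ranked chunks the result is `['a', 'c', 'e', 'd', 'b']`: the top two chunks occupy the first and last positions, and the weakest sits in the middle. Whether this helps depends on the model and the task, so it is worth measuring rather than assuming.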