Retrieval-augmented generation has become one of the most discussed patterns in applied AI. It has also become one of the most overloaded terms. Teams that say they are "using RAG" are often doing very different things, facing very different trade-offs, and needing very different infrastructure. Clarifying the taxonomy makes it easier to understand what actually works and why.
At the most basic level, RAG means providing a model with relevant context at inference time by retrieving it from an external store. But the way you chunk documents, embed them, retrieve against them, and integrate the retrieved content into your prompt varies enormously, and those variations matter more than the shared label.
Naive RAG
The simplest form: embed a query, find the nearest chunks in a vector database, and prepend them to a prompt. This works well for simple question-answering against a homogeneous document corpus where each question maps cleanly to a self-contained chunk. It fails when questions require synthesizing information across multiple chunks, when document structure matters, or when the query does not naturally embed near the answer text.
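The embed, retrieve, prepend flow can be sketched in a few lines. This is a minimal illustration, not a production design: the `embed` function is a toy bag-of-words stand-in for a real embedding model, the in-memory list stands in for a vector database, and every name and sample chunk here is hypothetical.

```python
import math
import re
from collections import Counter

# Toy corpus; these chunks are invented purely for illustration.
CHUNKS = [
    "Refunds are issued within 14 days of purchase.",
    "Our office is closed on public holidays.",
    "Shipping takes three to five business days.",
]

def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k chunks most similar to the query."""
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]

def build_prompt(query: str, chunks: list[str], k: int = 2) -> str:
    """Prepend the retrieved chunks to the question."""
    context = "\n".join(retrieve(query, chunks, k))
    return f"Context:\n{context}\n\nQuestion: {query}"

print(build_prompt("How many days do refunds take?", CHUNKS, k=1))
```

Everything downstream depends on the answer sitting inside one of the top-ranked chunks. A question whose answer is spread across chunks, or phrased in vocabulary the answer text does not share, gets no help from this pipeline.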
Structured RAG
The more sophisticated pattern emerging in production systems: rather than treating retrieval as semantic nearest-neighbor search, structure the knowledge base so that it supports structured queries, such as metadata filtering, relationship traversal, and hierarchical context. Contextual AI, which we backed in 2023, is building retrieval infrastructure specifically designed for enterprise knowledge bases where naive chunking and embedding consistently fail to meet quality thresholds.
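As a rough sketch of what metadata filtering and hierarchical context can look like, the example below filters candidates on exact-match metadata before ranking, then returns each chunk prefixed with the section it came from. It is an illustrative assumption, not a description of any particular vendor's system: the `Chunk` type, the `structured_retrieve` function, the term-overlap scoring, and the sample data are all hypothetical.

```python
import re
from dataclasses import dataclass, field

@dataclass
class Chunk:
    text: str
    section: str                      # hierarchical context: the heading above the chunk
    metadata: dict = field(default_factory=dict)

def tokens(text: str) -> set[str]:
    """Lowercased word set, used as a toy relevance signal."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def structured_retrieve(query: str, chunks: list[Chunk],
                        filters: dict, k: int = 2) -> list[str]:
    """Filter on metadata first, then rank the survivors by term overlap."""
    candidates = [
        c for c in chunks
        if all(c.metadata.get(key) == value for key, value in filters.items())
    ]
    q = tokens(query)
    ranked = sorted(candidates, key=lambda c: len(q & tokens(c.text)), reverse=True)
    # Attach the section heading so the model sees where each chunk sits.
    return [f"[{c.section}] {c.text}" for c in ranked[:k]]

# Invented sample data: two near-identical chunks that differ only by region.
KB = [
    Chunk("Refunds are issued within 30 days.", "Returns policy", {"region": "US"}),
    Chunk("Refunds are issued within 14 days.", "Returns policy", {"region": "EU"}),
    Chunk("Shipping takes five business days.", "Delivery", {"region": "EU"}),
]

print(structured_retrieve("How many days for refunds?", KB, {"region": "EU"}, k=1))
```

The two refund chunks score identically on text similarity alone, so a purely semantic search has no principled way to choose between them. The metadata filter removes the ambiguity before ranking begins.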
Agentic RAG
The frontier: rather than retrieving once at inference time, allow agents to retrieve iteratively as they work through a problem. The agent formulates a question, retrieves relevant context, incorporates it into its reasoning, identifies what else it needs, and retrieves again. This pattern is significantly more powerful for complex multi-hop reasoning tasks, and significantly more expensive to operate. The engineering challenge is managing the latency and cost of multiple retrieval steps within an agent loop. This is the direction in which the best production RAG systems are moving.
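The retrieve, reason, retrieve-again loop can be sketched as follows, with a step budget as one simple way to cap latency and cost. This is a toy under stated assumptions: `toy_next_query` is a hypothetical regex-based stand-in for the model call that would decide what to look up next, retrieval is plain term overlap, and the corpus is invented.

```python
import re
from typing import Callable, Optional

# Invented two-hop corpus: the answer requires linking two chunks.
CORPUS = [
    "The Widget is manufactured by Acme.",
    "Acme is led by chief executive Jane Doe.",
    "The Gadget is manufactured by Globex.",
]

def tokens(text: str) -> set[str]:
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def retrieve(query: str, chunks: list[str], seen: list[str]) -> Optional[str]:
    """Return the best unseen chunk by term overlap, or None if nothing matches."""
    q = tokens(query)
    candidates = [c for c in chunks if c not in seen]
    best = max(candidates, key=lambda c: len(q & tokens(c)), default=None)
    if best is None or not q & tokens(best):
        return None
    return best

def toy_next_query(question: str, notes: list[str]) -> Optional[str]:
    """Hypothetical stand-in for a model deciding what it still needs."""
    match = re.search(r"manufactured by (\w+)", notes[-1])
    return f"{match.group(1)} led by" if match else None

def agentic_retrieve(question: str, chunks: list[str],
                     next_query: Callable[[str, list[str]], Optional[str]],
                     max_steps: int = 3) -> list[str]:
    """Retrieve iteratively; max_steps bounds the number of retrieval calls."""
    notes: list[str] = []
    query: Optional[str] = question
    for _ in range(max_steps):
        hit = retrieve(query, chunks, notes)
        if hit is None:
            break
        notes.append(hit)
        query = next_query(question, notes)   # decide the follow-up lookup
        if query is None:                     # nothing further is needed
            break
    return notes

print(agentic_retrieve("Who leads the maker of the Widget?", CORPUS, toy_next_query))
```

The original question shares no terms with the chunk that names the chief executive, so a single retrieval pass cannot reach it. The second hop only becomes possible once the first result supplies the company name. Each extra hop is another retrieval call and, in a real system, another model call, which is where the cost accumulates.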