Memory as Infrastructure — Hearthstone Ventures

The most common mistake I see when people talk about agent memory is treating it as a feature. Memory as a checkmark: the agent remembers what you told it last time. Memory as a selling point in a product overview slide. Memory as something you implement by stuffing previous conversation turns into the system prompt and hoping for the best.

Memory is not a feature. It is infrastructure. And it deserves to be understood and built with the same rigor that teams apply to any other infrastructure component — which is to say, starting with the failure modes and working backwards to the design.

Types of Agent Memory

There are at least four meaningfully different types of memory an agent might need: working memory (what is relevant to the current task), episodic memory (what happened in past interactions with this user or about this topic), semantic memory (general knowledge about the world and the user), and procedural memory (how to do specific classes of tasks). Each type has different retrieval semantics, different consistency requirements, and different tradeoffs between storage cost and retrieval speed.
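One way to make the distinction concrete is to give each memory type its own namespace and its own policy, rather than pouring everything into one shared pool. The sketch below illustrates that idea under assumptions of my own. The `MemoryPolicy` fields, the policy values, and the `TypedMemoryStore` class are hypothetical, not a reference design. Only the TTL is actually enforced here; a real system would map the other fields onto real storage backends.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List, Optional


class MemoryType(Enum):
    WORKING = "working"        # what is relevant to the current task
    EPISODIC = "episodic"      # what happened in past interactions
    SEMANTIC = "semantic"      # general knowledge about the world and the user
    PROCEDURAL = "procedural"  # how to do specific classes of tasks


@dataclass(frozen=True)
class MemoryPolicy:
    retrieval: str                # how entries are looked up
    consistency: str              # how fresh a read must be relative to the latest write
    storage_tier: str             # where entries live (storage cost vs. retrieval speed)
    ttl_seconds: Optional[float]  # None means entries never expire automatically


# Illustrative values only: every string and TTL below is an assumption.
POLICIES: Dict[MemoryType, MemoryPolicy] = {
    MemoryType.WORKING: MemoryPolicy(
        "scan of the current task context", "strong", "in-process cache", 3600.0),
    MemoryType.EPISODIC: MemoryPolicy(
        "similarity search, time-ordered", "eventual", "vector store", None),
    MemoryType.SEMANTIC: MemoryPolicy(
        "similarity search plus freshness check", "versioned", "vector store", 90 * 24 * 3600.0),
    MemoryType.PROCEDURAL: MemoryPolicy(
        "exact lookup by task class", "strong", "versioned config store", None),
}


@dataclass
class MemoryRecord:
    kind: MemoryType
    content: str
    created_at: float  # seconds since epoch


class TypedMemoryStore:
    """Keeps each memory type in its own namespace so that policies never mix."""

    def __init__(self, policies: Dict[MemoryType, MemoryPolicy]) -> None:
        self.policies = policies
        self._records: Dict[MemoryType, List[MemoryRecord]] = {k: [] for k in MemoryType}

    def write(self, record: MemoryRecord) -> None:
        self._records[record.kind].append(record)

    def live(self, kind: MemoryType, now: float) -> List[MemoryRecord]:
        # Apply only the type-specific TTL; the other policy fields are descriptive here.
        ttl = self.policies[kind].ttl_seconds
        return [r for r in self._records[kind]
                if ttl is None or now - r.created_at <= ttl]
```

The point of the separation is that a working-memory entry can expire within the hour while an episodic entry written at the same moment survives, and neither policy has to compromise for the other.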

Conflating these types is the root of most production memory failures. The agent that loses track of the user's stated preferences because three sessions have passed since they were mentioned is experiencing an episodic memory failure. The agent that confidently gives outdated information about a topic it knew about last year is experiencing semantic memory staleness. The agent that, after a model update, forgets how to do something it had done reliably for months is experiencing procedural memory fragility. These are different problems that require different solutions.

The Retrieval Problem

Good memory is not about storage. It is about retrieval. The hard engineering problem is not persisting information — that is relatively tractable. The hard problem is knowing what to retrieve, when, and with what confidence, given a new query that may be semantically distant from the stored memory but contextually relevant.

The teams that are building production agent memory systems are discovering that naive vector similarity retrieval is necessary but not sufficient. You also need temporal weighting (recent memories are usually more relevant than old ones, but not always), context-sensitivity (what matters depends on what the agent is currently doing), and conflict resolution (what happens when two stored memories contradict each other). Getting these right is a genuine infrastructure engineering problem. We are early in understanding how to solve it well.
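The following minimal sketch shows how those three signals might combine with vector similarity. It is illustrative, not a reference design, and rests on several assumptions. Embeddings are precomputed. Recency is modeled as exponential decay with a hypothetical `half_life`. Context-sensitivity is a boost for each task tag a memory shares with the current task. Conflict resolution is last-writer-wins on a hypothetical `subject` key. The `min_score` cutoff stands in for retrieval confidence: below it, nothing is returned rather than a weak match.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Memory:
    text: str
    embedding: List[float]         # assumed precomputed; the embedding model is out of scope
    timestamp: float               # seconds since epoch
    subject: str                   # hypothetical key: memories sharing it can conflict
    tags: frozenset = frozenset()  # hypothetical labels for the task contexts it applies to
    pinned: bool = False           # exempt from recency decay (the "not always" case)


def cosine(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def recency_weight(age: float, half_life: float) -> float:
    # Exponential decay: the weight halves every `half_life` seconds.
    return 0.5 ** (age / half_life)


def score(m: Memory, query: List[float], active_tags: frozenset,
          now: float, half_life: float, context_boost: float) -> float:
    similarity = cosine(m.embedding, query)
    recency = 1.0 if m.pinned else recency_weight(now - m.timestamp, half_life)
    overlap = len(m.tags & active_tags)  # how many current-task tags this memory shares
    return similarity * recency * (1.0 + context_boost * overlap)


def resolve_conflicts(memories: List[Memory]) -> List[Memory]:
    # Last-writer-wins: keep only the newest memory for each subject.
    newest = {}
    for m in memories:
        if m.subject not in newest or m.timestamp > newest[m.subject].timestamp:
            newest[m.subject] = m
    return list(newest.values())


def retrieve(memories: List[Memory], query: List[float], active_tags: frozenset,
             now: float, k: int = 3, half_life: float = 7 * 24 * 3600,
             context_boost: float = 0.5, min_score: float = 0.2) -> List[Tuple[float, Memory]]:
    scored = [(score(m, query, active_tags, now, half_life, context_boost), m)
              for m in resolve_conflicts(memories)]
    # Drop low-confidence matches instead of padding the context with weak ones.
    confident = [(s, m) for s, m in scored if s >= min_score]
    confident.sort(key=lambda pair: pair[0], reverse=True)
    return confident[:k]
```

Last-writer-wins is the simplest conflict policy, not necessarily the right one. A system might instead surface both memories and ask the user, or weight each memory by how reliable its source was. Likewise, the `pinned` flag is a crude way to encode the cases where an old memory should outrank a recent one.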