LLM Observability Is Underrated — Hearthstone Ventures

When we survey the teams building LLM-based products in 2023, one pattern stands out: teams with robust observability infrastructure are shipping better products faster, and teams without it are flying blind. Stated plainly, this sounds obvious. In practice, LLM observability is consistently the last thing teams invest in and the first thing they wish they had invested in earlier.

The reason is that observability for LLM systems is not the same as observability for conventional software, and the tooling built for conventional systems does not transfer cleanly. A trace of a database query tells you how long it took and whether it succeeded. A trace of an LLM call tells you how long it took and whether it returned — but it tells you nothing about whether the output was any good. The quality dimension is absent from conventional observability, and adding it requires a new set of tools and patterns.

What Good LLM Observability Looks Like

The teams that have invested in LLM observability tend to capture three categories of data about every inference. First, the operational metrics that conventional observability covers: latency, token count, cost, error rate. These are table stakes. Second, the content metadata: a log of the prompt and response, structured so that specific outputs can be queried later. Third, and this is the hard part, quality signals: some mechanism that tells you whether a given output was good or bad.
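
To make the three categories concrete, here is a minimal sketch of a per-inference record in Python. The class name InferenceRecord, the helper attach_quality, and every field name are hypothetical choices for illustration, not a schema taken from any particular tool. The sketch also assumes quality is expressed as a score between 0 and 1 and that it often arrives after the call has completed.

```python
from dataclasses import dataclass, field, asdict
from typing import Optional
import json
import time
import uuid


@dataclass
class InferenceRecord:
    """One logged LLM call. All field names are illustrative assumptions."""

    # Category 1: operational metrics (what conventional observability covers).
    model: str
    latency_ms: float
    prompt_tokens: int
    completion_tokens: int
    cost_usd: float
    error: Optional[str] = None

    # Category 2: content metadata, kept structured so it can be queried.
    prompt: str = ""
    response: str = ""
    tags: dict = field(default_factory=dict)

    # Category 3: quality signal. Often unknown at call time, filled in later.
    quality_score: Optional[float] = None   # assumed scale: 0.0 (bad) to 1.0 (good)
    quality_source: Optional[str] = None    # e.g. "automated", "human", "downstream"

    record_id: str = field(default_factory=lambda: uuid.uuid4().hex)
    timestamp: float = field(default_factory=time.time)

    def to_json(self) -> str:
        """Serialize the record as one JSON line for a log or data store."""
        return json.dumps(asdict(self), sort_keys=True)


def attach_quality(record: InferenceRecord, score: float, source: str) -> InferenceRecord:
    """Attach a quality signal to an existing record after the fact."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be between 0.0 and 1.0")
    record.quality_score = score
    record.quality_source = source
    return record
```

Keeping the quality fields optional reflects the point that the first two categories can be captured at call time, while the third usually has to be joined on afterwards.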

The third category can be automated, human-annotated, or derived from downstream signals. Teams that only have the first two categories can debug operational failures but are blind to quality degradation. A model update that makes GPT-4 20% cheaper but 5% worse on your specific task will show up as a cost decrease and look like good news, unless you have a quality signal to show the regression alongside it.
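
The cheaper-but-worse scenario can be worked through in a few lines. The sketch below is an illustration under stated assumptions: each logged call is reduced to a (cost in USD, quality score) pair, the function names are hypothetical, and the 2% tolerance for an acceptable quality drop is an arbitrary placeholder rather than a recommended value.

```python
from statistics import mean
from typing import Dict, List, Tuple

# Each sample is (cost_usd, quality_score) for one logged inference.
Sample = Tuple[float, float]


def relative_change(before: float, after: float) -> float:
    """Fractional change from before to after, e.g. -0.2 means 20% lower."""
    if before == 0:
        raise ValueError("baseline value must be non-zero")
    return (after - before) / before


def summarize_rollout(
    baseline: List[Sample],
    candidate: List[Sample],
    max_quality_drop: float = 0.02,  # assumed tolerance, tune per task
) -> Dict[str, object]:
    """Compare mean cost and mean quality before and after a model update."""
    cost_change = relative_change(
        mean([cost for cost, _ in baseline]),
        mean([cost for cost, _ in candidate]),
    )
    quality_change = relative_change(
        mean([quality for _, quality in baseline]),
        mean([quality for _, quality in candidate]),
    )
    return {
        "cost_change": cost_change,
        "quality_change": quality_change,
        # Flag the rollout when quality fell by more than the tolerance,
        # regardless of how good the cost number looks.
        "quality_regression": quality_change < -max_quality_drop,
    }
```

With a baseline averaging $0.010 per call at a quality score of 0.80 and a candidate averaging $0.008 at 0.76, cost_change comes out at roughly -0.20 and quality_change at roughly -0.05, so the rollout is flagged. Without the second column of data, only the 20% saving would be visible.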

The Investment Opportunity

We have been watching the observability space carefully since we started the fund. The Portkey team, which we backed in June 2023, is building the gateway and observability layer for LLM API calls — giving teams the visibility they need into their LLM usage without having to build that instrumentation themselves. The core insight is that the gateway is the right place for observability infrastructure: every call passes through it, so you can capture everything without requiring any changes to application code. We expect this category to mature quickly as the teams building LLM applications realize how much they are missing without it.
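
The gateway argument can be illustrated with a generic sketch. This is not Portkey's API or implementation. It is a hypothetical in-process wrapper, with make_gateway, backend, and sink as made-up names, that shows why a single choke point is convenient: the calling code passes a prompt and gets a response exactly as before, while every call is recorded on the way through.

```python
import time
from typing import Any, Callable, Dict, List


def make_gateway(
    backend: Callable[[str], str],
    sink: List[Dict[str, Any]],
) -> Callable[[str], str]:
    """Wrap a prompt -> response callable so that every call is logged.

    backend stands in for whatever function actually calls the model;
    sink is a plain list here, standing in for a real log store.
    """

    def gateway(prompt: str) -> str:
        entry: Dict[str, Any] = {"prompt": prompt, "response": None, "error": None}
        start = time.perf_counter()
        try:
            entry["response"] = backend(prompt)
            return entry["response"]
        except Exception as exc:
            # Record the failure, then re-raise so callers see normal behavior.
            entry["error"] = repr(exc)
            raise
        finally:
            # Runs on both success and failure, so no call goes unlogged.
            entry["latency_ms"] = (time.perf_counter() - start) * 1000.0
            sink.append(entry)

    return gateway
```

A usage example would be log = [] followed by ask = make_gateway(call_model, log), where call_model is whatever function the application already uses. A hosted gateway applies the same idea at the network layer instead of inside the process.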