The State of AI Tooling — Hearthstone Ventures

We have been building our thesis around AI infrastructure since we started the fund, and I want to record our current view of the tooling ecosystem: what exists, what is clearly missing, and where we think the most important companies will be built. We are writing in November 2022, before whatever comes next from OpenAI, so these views may need updating in the near future.

The overall assessment: the AI developer tooling ecosystem is at a stage roughly comparable to web development tooling circa 1998. Core capabilities exist, but the developer experience for using them is poor. Almost everything that makes software development tractable at scale (testing frameworks, deployment pipelines, monitoring infrastructure, debugging tools) is either absent or in its earliest prototype form. As a result, nearly every team building AI-powered products is reinventing the same infrastructure from scratch.

What Exists

Model APIs: available from OpenAI, Anthropic, and a growing number of others. These are well-designed and reliable. The basic inference capability is not the gap.

Fine-tuning tooling: rudimentary but functional for teams with the ML expertise to use it.

Vector databases: Pinecone, Weaviate, and others provide the embedding storage layer that RAG-style applications need. These are useful, and the category is growing quickly.

What is essentially absent: serious tooling for testing and evaluating LLM applications; monitoring and observability beyond raw latency and error rates; deployment pipelines that account for the non-determinism of LLM systems; and prompt management systems that treat prompts with the same rigor as code. These are the gaps that produce the most friction for teams trying to build reliable AI products.
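To make the testing gap concrete, here is a minimal sketch of the kind of check teams currently write by hand. Because the same prompt can produce different outputs on each call, a single pass/fail assertion is not meaningful; one common workaround is to sample repeatedly and assert on a pass rate instead. Everything named here is hypothetical and for illustration only: `make_fake_model` stands in for a real model API call, and `pass_rate`, `contains_paris`, and the threshold value are our own assumptions, not any vendor's interface.

```python
import random
from typing import Callable


def pass_rate(
    generate: Callable[[str], str],
    prompt: str,
    check: Callable[[str], bool],
    n_samples: int = 20,
) -> float:
    """Call the generator n_samples times and return the fraction of outputs that pass check."""
    if n_samples <= 0:
        raise ValueError("n_samples must be positive")
    passed = sum(1 for _ in range(n_samples) if check(generate(prompt)))
    return passed / n_samples


def make_fake_model(seed: int, failure_rate: float) -> Callable[[str], str]:
    """Hypothetical stand-in for a model API: answers correctly except at the given failure rate."""
    rng = random.Random(seed)

    def generate(prompt: str) -> str:
        # Simulate non-determinism: some calls return an unhelpful answer.
        if rng.random() < failure_rate:
            return "I am not sure."
        return "The capital of France is Paris."

    return generate


def contains_paris(output: str) -> bool:
    """Illustrative check: the answer must mention Paris, ignoring case."""
    return "paris" in output.lower()


if __name__ == "__main__":
    model = make_fake_model(seed=0, failure_rate=0.1)
    rate = pass_rate(model, "What is the capital of France?", contains_paris, n_samples=50)
    threshold = 0.8  # assumed acceptance bar; a real team would tune this per use case
    print(f"pass rate: {rate:.2f} (threshold {threshold:.2f})")
    print("PASS" if rate >= threshold else "FAIL")
```

The design choice worth noting is that the unit of assertion is a rate over many samples rather than a single output. That is the part conventional test frameworks do not model, and it is why this category needs purpose-built tooling.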

Where We Are Investing

Our current portfolio reflects our bets on which gaps will be filled by standalone companies rather than consolidated into model providers. We backed Langbase to build the composability and workflow layer for LLM pipelines. We backed Mintlify to improve the documentation infrastructure that AI systems increasingly need to reason about. We backed Vellum to build the prompt engineering and evaluation infrastructure that production LLM applications require. Each of these addresses a gap that is clearly present and that we do not expect the model providers to fill themselves, because the infrastructure layer requires specialization that general-purpose model APIs cannot provide.