Building AI-powered applications in 2024 requires more bespoke engineering than building conventional software required in 2014. That sounds like an obvious claim — AI is newer, the tooling is less mature — but the magnitude of the gap is underappreciated. Most development tasks that would take hours with mature tooling still take days when LLMs are involved, because almost none of the infrastructure that makes software development tractable at scale exists yet for AI systems.
No reliable debugging tools for tracing why a model made a particular decision. No standard deployment pipelines for models that may respond differently to the same input across runs. No established patterns for testing systems whose outputs are inherently non-deterministic. No consensus on how to version the combination of model, prompt, and fine-tune that constitutes a "version" of an AI application. Greptile, which we backed in 2023, is working on making LLM-assisted code review genuinely reliable, a single problem that should not still be this hard.
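To make the versioning gap concrete, here is a minimal sketch of one way to give the model, prompt, and fine-tune combination a single identifier. It is an illustration under our own assumptions, not an established convention: the `AppVersion` class, its field names, and the model identifier are all hypothetical.

```python
import hashlib
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass(frozen=True)
class AppVersion:
    """Hypothetical record of everything that defines one 'version' of an AI app."""
    model: str                           # assumed model identifier
    prompt_template: str                 # the full prompt text, not a filename
    fine_tune_id: Optional[str] = None   # None when the base model is used

    def fingerprint(self) -> str:
        # Serialize with sorted keys so identical fields always hash identically.
        payload = json.dumps(asdict(self), sort_keys=True).encode("utf-8")
        return hashlib.sha256(payload).hexdigest()[:12]


# Same model, slightly different prompt: these count as two different versions.
v1 = AppVersion("example-model-2024-01", "Review this diff:\n{diff}")
v2 = AppVersion("example-model-2024-01", "Review this diff carefully:\n{diff}")
print(v1.fingerprint(), v2.fingerprint())
```

The point of hashing all three fields together is that a one-word prompt edit yields a new identifier, just as a one-line code change yields a new commit hash.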
The Testing Problem
Conventional software testing is grounded in determinism. Given the same input, a well-written function produces the same output. Test coverage can be measured objectively. Regressions can be detected automatically. AI systems break this model entirely. A prompt change that improves average performance may degrade performance on a specific distribution of edge cases. Model updates that improve some capabilities may degrade others. The only way to know what changed is to run extensive evaluations, which are expensive, slow, and still mostly manual.
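A rough sketch of what an automated version of that evaluation could look like: run every test case many times, compute a pass rate per slice of inputs, and flag any slice where the candidate falls below the baseline. Everything here is an assumption for illustration. `make_stub` is a toy stand-in for a real model call, the slice names are invented, and exact-match scoring is far simpler than what real evaluations need.

```python
import random
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

Case = Tuple[str, str, str]  # (slice name, input, expected output)


def pass_rates(model: Callable[[str], str], cases: List[Case], runs: int = 20) -> Dict[str, float]:
    """Run each case several times and return the pass rate per slice."""
    passed: Dict[str, int] = defaultdict(int)
    total: Dict[str, int] = defaultdict(int)
    for slice_name, prompt, expected in cases:
        for _ in range(runs):
            total[slice_name] += 1
            if model(prompt) == expected:
                passed[slice_name] += 1
    return {name: passed[name] / total[name] for name in total}


def regressions(baseline: Dict[str, float], candidate: Dict[str, float],
                tolerance: float = 0.05) -> List[str]:
    """Slices where the candidate's pass rate drops by more than the tolerance."""
    return sorted(name for name in baseline
                  if candidate.get(name, 0.0) < baseline[name] - tolerance)


def make_stub(accuracy: Dict[str, float], seed: int = 0) -> Callable[[str], str]:
    """Toy stand-in for a model: answers correctly with a per-slice probability."""
    rng = random.Random(seed)

    def model(prompt: str) -> str:
        slice_name, _, answer = prompt.partition(":")
        return answer if rng.random() < accuracy[slice_name] else "wrong"

    return model


# Nine common cases and one edge case; the prompt encodes "slice:answer" for the stub.
cases: List[Case] = [("common", f"common:{i}", str(i)) for i in range(9)]
cases.append(("edge", "edge:0", "0"))

baseline = pass_rates(make_stub({"common": 0.8, "edge": 1.0}), cases)
candidate = pass_rates(make_stub({"common": 1.0, "edge": 0.0}), cases)
flagged = regressions(baseline, candidate)
print(flagged)
```

In this toy setup the candidate does better on the common cases, so its overall average improves, yet it fails the edge slice outright. A single aggregate score would hide that; the per-slice comparison surfaces it.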
The companies building serious evaluation tooling are addressing the most urgent problem in AI developer experience today. Not the sexiest investment thesis, but an important one. Infrastructure companies rarely capture headlines; they tend to capture durable margins.
Documentation as Code
One of the most underappreciated dimensions of the AI developer tooling problem is documentation. The context that a model needs to do useful work — API documentation, internal knowledge, system context — is currently assembled by hand and injected via prompt or RAG. This is fragile and expensive to maintain. The companies building structured knowledge representation for AI systems — turning documentation into something a model can navigate reliably — are working on a foundational problem. Mintlify, which we backed in late 2022, started with developer documentation tooling and has been watching this space evolve from the inside.
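As a simplified illustration of the difference between hand-assembled context and structured documentation, the sketch below stores documentation as addressable sections and selects the relevant ones for a question. It is a toy under stated assumptions, not how Mintlify or any particular product works: the `DocSection` type, the paths, and the word-overlap ranking are all hypothetical, and a production system would more likely use embeddings or a search index.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DocSection:
    path: str    # hypothetical location in the docs tree, e.g. "api/auth"
    title: str
    body: str


def build_context(sections: List[DocSection], query: str, max_chars: int = 500) -> str:
    """Pick the sections sharing the most words with the query, within a size budget."""
    query_words = set(query.lower().split())

    def overlap(section: DocSection) -> int:
        words = set((section.title + " " + section.body).lower().split())
        return len(query_words & words)

    chosen: List[str] = []
    used = 0
    for section in sorted(sections, key=overlap, reverse=True):
        block = f"[{section.path}] {section.title}\n{section.body}"
        # Skip sections with no shared words or that would exceed the budget.
        if overlap(section) == 0 or used + len(block) > max_chars:
            continue
        chosen.append(block)
        used += len(block)
    return "\n\n".join(chosen)


docs = [
    DocSection("api/auth", "Authentication", "Send the token in the Authorization header."),
    DocSection("api/billing", "Billing", "Invoices are issued monthly."),
]
context = build_context(docs, "how do I send the token")
print(context)
```

Because each section carries a stable path, the selected context can be traced back to its source and regenerated when the documentation changes, instead of being maintained by hand inside a prompt.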