John Moore
February 20, 2026

Executive Summary: In January 2024, our startup closed a $12M Series A to build an AI agent for contract review. We rode the hype cycle, grew fast, and by Q4 2024 faced a brutal market correction. Revenue was flat, enterprise clients churned, and our next round fell through. This post-mortem analyzes the root causes—from product-market mismatch to the commoditization of LLMs—and documents our pivot to a sustainable model. Key lessons: trust is the only real moat, agents need humans in the loop, and unit economics matter more than vision.

1. The rise: how we raised on "agentic workflows"

In January 2024, we closed the round. Twelve million dollars. Series A. Valuation based on the explosion of generative AI and our early traction with a legal-doc summarizer. The lead investor used the phrase "agentic workflows" nine times in the final pitch. We had no revenue—just a waitlist of 200 law firms and a demo that worked 80% of the time. But we had "AI" in our name, and in early 2024 that was enough.

Looking back, we were a classic "wrapper" startup. We used GPT-4 and some fine-tuning to parse legal text. Our differentiation was a clean UI and some prompt templates. At the time, investors weren't asking about moats. They were asking about TAM and growth velocity. And we delivered: waitlist grew to 2,000 by March. We hired fast, built a sales team, and launched in June.

 Market context (real 2024 data):

  • Gartner's July 2024 hype cycle placed generative AI at the "peak of inflated expectations" (Gartner, 2024).
  • McKinsey's global survey showed 65% of organizations were regularly using gen AI, double from 2023 (McKinsey, 2024).
  • But 40% of AI projects were predicted to fail by 2027 due to cost and value alignment (Gartner, July 2024).

We ignored the warning signs. Because everyone was raising. Because Anthropic released Claude 3.5 with better legal reasoning. Because it felt like the future.

2. The peak: early traction and hidden cracks

By August 2024, we had 15 paying customers—mostly mid-sized law firms and legal departments. They paid us $2k–$5k/month. Revenue hit $60k MRR. Investors started calling about the Series B. We felt invincible.

But the cracks were there. Customer support tickets piled up. The agent missed key clauses. It misinterpreted "indemnification" in three different ways. One client sent us a spreadsheet of 27 errors in a single 50-page contract. We blamed the model. We promised fine-tuning would fix it.

The Mercor Agentic Benchmark (Q3 2024) tested AI agents on real-world tasks, including contract review. The top agents scored below 70% accuracy on nuanced legal language (Mercor, 2024). We weren't alone. But our clients didn't care about benchmarks—they cared about errors.

 For a deeper look at agent failures, see this Interconnectd thread on agentic AI failures—real stories from other founders.

3. The correction: when the market stopped believing

October 2024. The mood shifted. Publicly, it was subtle. Privately, VCs started asking different questions. "What's your gross margin?" "How much do you spend on inference?" "What happens when OpenAI drops prices again?"

Then DeepSeek V3 launched in December 2024. It was competitive with GPT-4 at a fraction of the cost. Open-weight models became good enough that any startup could replicate basic functionality. The "wrapper" thesis imploded. TechCrunch called it "the commoditization of AI." Our lead investor started avoiding our calls.

By January 2025, we had 30 days of runway left. We laid off 40% of the team. It was brutal. But it forced us to actually think about what we were building.

"The market didn't collapse. It just stopped subsidizing companies that hadn't figured out unit economics." — Gary Fowler, 2025 prediction

4. Root cause analysis: why the product failed

We used the "5 Whys" method to understand why clients churned.

  • Why did clients cancel? The agent missed critical clauses and made errors.
  • Why did it miss clauses? The model wasn't fine-tuned on enough legal documents.
  • Why wasn't it fine-tuned better? We relied on generic LLMs to save costs.
  • Why did we rely on generic models? Because we prioritized speed over accuracy.
  • Why did we prioritize speed? Because we were obsessed with growth metrics, not client outcomes.

The deeper issue: we built for investors, not for users. We measured "contracts reviewed," not "errors avoided."

5. Market context: the commoditization of LLMs

The "DeepSeek moment" wasn't a single crash—it was the culmination of a trend. OpenAI, Anthropic, Google, and open-source models all drove prices down. By early 2025, inference costs had dropped 80% from 2023 levels (Statista, 2025).

For wrappers like us, that meant two things:

  1. Our gross margins compressed because we couldn't charge a premium for the same API calls (a rough sketch follows this list).
  2. Competitors appeared overnight using the same base models.
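
On the first point, here's a back-of-the-envelope sketch in Python. Every number is invented for illustration (these are not our actual figures), but the shape is what we lived through: our API bill fell, yet the market price for "the same API calls" fell faster, while per-contract QA and support costs stayed put.

  # Purely illustrative numbers: why cheaper inference didn't rescue our margins
  # once competitors on the same base models forced prices down.

  def gross_margin(price: float, api_cost: float, serving_cost: float) -> float:
      """Gross margin per reviewed contract, as a fraction of revenue."""
      return (price - api_cost - serving_cost) / price

  # Mid-2024: premium pricing, pricey API calls, fixed QA/support cost per contract.
  before = gross_margin(price=400.0, api_cost=60.0, serving_cost=100.0)   # 0.60

  # Early 2025: API costs drop ~80%, but clones push the market price down faster;
  # the human QA/support cost per contract doesn't move.
  after = gross_margin(price=150.0, api_cost=12.0, serving_cost=100.0)    # ~0.25

  print(f"gross margin: {before:.0%} -> {after:.0%}")  # gross margin: 60% -> 25%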

The McKinsey 2024 survey noted that most companies were still experimenting—few had deployed at scale. We were part of the experiment wave, not the value wave.

 This Interconnectd discussion on human-driven AI captures the shift back to human-in-the-loop models.

6. The pivot: from agent to assistant

In January 2025, with 30 days left, we pivoted. We stopped selling an "autonomous agent." Instead, we built a human-in-the-loop platform:

  • AI does first-pass review, highlighting potential issues.
  • A human lawyer (our new "review team") validates and edits.
  • Client gets a reviewed document with human sign-off.

It's less scalable. But it works. Clients trust it. They're willing to pay $8k/month because they're buying assurance, not just speed.
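
Here's a minimal sketch of that flow in code. It's illustrative only: the Finding and ReviewJob names, their fields, and the llm_flag_clauses hook are placeholders, not our production system.

  # Minimal sketch of the human-in-the-loop flow above. Names, fields, and the
  # llm_flag_clauses hook are placeholders, not our production code.

  from dataclasses import dataclass, field

  @dataclass
  class Finding:
      clause: str                  # e.g. "indemnification"
      excerpt: str                 # the text the model flagged
      model_confidence: float
      reviewer_note: str = ""
      approved: bool = False

  @dataclass
  class ReviewJob:
      contract_id: str
      findings: list[Finding] = field(default_factory=list)
      signed_off_by: str | None = None   # client only sees the doc once this is set

  def first_pass(contract_id: str, text: str, llm_flag_clauses) -> ReviewJob:
      """Step 1: the AI does a first pass and flags potential issues."""
      raw = llm_flag_clauses(text)  # inject whatever model call you actually use
      findings = [Finding(r["clause"], r["excerpt"], r["confidence"]) for r in raw]
      return ReviewJob(contract_id, findings)

  def human_validate(job: ReviewJob, reviewer: str) -> ReviewJob:
      """Steps 2-3: a lawyer edits/approves every finding, then signs off."""
      for f in job.findings:
          f.reviewer_note = "checked"    # in reality: edits made in a review UI
          f.approved = True
      job.signed_off_by = reviewer       # the human sign-off the client is buying
      return job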

We also changed our pricing: from per-seat to per-outcome. Clients pay per reviewed contract, capped monthly. That aligned our incentives with theirs.
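
In code, the new pricing is almost embarrassingly simple. The per-contract price below is made up; the cap mirrors the $8k/month figure above.

  # Per-outcome pricing, capped monthly (per-contract price is illustrative).
  def monthly_invoice(contracts_reviewed: int,
                      price_per_contract: float = 250.0,
                      monthly_cap: float = 8_000.0) -> float:
      """Bill per reviewed contract, never above the agreed monthly cap."""
      return min(contracts_reviewed * price_per_contract, monthly_cap)

  print(monthly_invoice(10))    # 2500.0 -- light month, the client pays less
  print(monthly_invoice(200))   # 8000.0 -- heavy month, the cap kicks in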

7. Lessons learned: actionable takeaways

7.1 Trust is the only real moat

When models are commodities, trust becomes the differentiator. Can your client sleep at night? For us, that meant adding human review. For others, it might mean better security, transparency, or guarantees.

7.2 Unit economics matter more than vision

We ignored CAC, LTV, and gross margins for 18 months. Don't. Run the numbers monthly. HBR noted in 2024 that 70% of AI startups fail due to poor unit economics, not technology.
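
If it helps, this is roughly the monthly check we run now. All inputs below are illustrative examples, not our real numbers.

  # The monthly sanity check we wish we'd run from day one (inputs are made up).
  def unit_economics(monthly_revenue_per_client: float,
                     gross_margin: float,
                     avg_lifetime_months: float,
                     sales_marketing_spend: float,
                     new_clients_won: int) -> dict:
      cac = sales_marketing_spend / new_clients_won
      ltv = monthly_revenue_per_client * gross_margin * avg_lifetime_months
      return {
          "CAC": cac,
          "LTV": ltv,
          "LTV/CAC": ltv / cac,                   # aim for this to be well above 3
          "payback_months": cac / (monthly_revenue_per_client * gross_margin),
      }

  print(unit_economics(8_000, 0.55, 24, 60_000, 3))
  # {'CAC': 20000.0, 'LTV': 105600.0, 'LTV/CAC': 5.28, 'payback_months': 4.54...}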

7.3 The "agent" label creates unrealistic expectations

Calling something an "agent" implies autonomy and reliability. In 2024–2025, that's a lie. Be honest about limitations. Underpromise, overdeliver.

7.4 Build for operators, not investors

Our best conversations were with operations leads who had actual problems. They didn't care about "agentic workflows." They cared about reducing contract review time without increasing risk. That's what we now sell.

For solo operators, this AI for solopreneurs thread has great examples of lean, human-in-the-loop setups.

8. Where we stand now (February 2025)

We're still here. Revenue is $110k MRR, growing 15% month-over-month. We're cash-flow positive. The valuation is down 70% from the peak, but we don't care. We're not raising—we're building.

The Gartner 2024 hype cycle predicted this: after the peak, a "trough of disillusionment." We're in it. But the trough is where real businesses get built.

9. Open questions

  • Will human-in-the-loop scale? Or will we hit a margin ceiling?
  • When will models improve enough to replace the human layer—and how do we adapt?
  • What does "management" of AI look like when the AI is partly autonomous?

We don't have answers. But we're asking them now, instead of ignoring them.


Parting thought

The bubble didn't pop. It deflated. And that's healthy. Now we get to find out which startups were actually solving problems.

If your product still works without the "AI" label, you'll survive. If it doesn't, maybe it's time to rethink.

#AI, #StartupLife, #VentureCapital, #GenAI, #TechTrends, #AgenticAI, #BusinessStrategy, #DeepSeek, #TechBubble
