June 13, 2025

Testing Software With Agentic AI

Agentic AI is revolutionizing how software is built—but without the right experience and context, it often produces brittle, misguided code.

In this post, I’ll walk through a real-world example using Cursor IDE, Claude 4 Sonnet, and MCP servers to automate test coverage for a Firebase Functions project. More importantly, I’ll explain why this success wasn’t just about AI—it was about experience.

🚀 The Experiment: Agent Mode Writes 17 Tests from One Prompt

Here’s the only instruction I gave to the AI in Cursor’s agent mode:

add test coverage for @userMigrations.ts @testing-rules.mdc @project-structure.mdc

Within 9 minutes, the AI:

  • Wrote 17 Jest tests covering header validation, API key checking, user migrations, and error handling
  • Mocked Firestore query chains properly, including .where().where().get() (sketched below)
  • Followed project-specific testing patterns
  • Produced 100% passing tests with no regressions
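To make the Firestore point concrete, here's a minimal sketch of the chainable-mock pattern, with hypothetical collection and field names rather than the actual userMigrations.ts internals:

```typescript
// Chainable Firestore query mock (hypothetical names throughout).
// Each .where() returns the same query object, so calls like
// collection().where().where().get() chain exactly as in app code.
const mockGet = jest.fn().mockResolvedValue({
  empty: false,
  docs: [{ id: 'user-1', data: () => ({ migrated: false }) }],
});

const mockQuery: any = {
  where: jest.fn(() => mockQuery),
  get: mockGet,
};

const mockFirestore = {
  collection: jest.fn(() => mockQuery),
};

it('queries unmigrated active users', async () => {
  // Exercise the chain the way the function under test would.
  const snapshot = await mockFirestore
    .collection('users')
    .where('migrated', '==', false)
    .where('active', '==', true)
    .get();

  expect(mockQuery.where).toHaveBeenCalledTimes(2);
  expect(snapshot.docs).toHaveLength(1);
});
```

In a real suite a mock like this would be wired in with jest.mock('firebase-admin') so the function under test picks it up transparently; the key detail is that .where() must return the query object itself, or the second chained call throws a TypeError.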

On the surface, that sounds like magic.

But here’s the truth: the AI succeeded because of the environment it was placed in—and because it was guided by someone who knew what good software looks like.

🎯 Why Experience Matters in AI-Driven Development

AI agents don’t know how to write maintainable code by default. They need direction—and only senior engineers know how to provide it.

1. Environment and Context Make or Break AI Success

Agent mode worked because the project had:

  • Well-documented code structure
  • Model Context Protocol (MCP) servers providing memory and step-by-step reasoning
  • .mdc files describing testing rules and project organization
  • Clear architectural boundaries and existing mocking conventions

Without that foundation, the AI would have failed. Poor context = poor results.

2. Experience Is Required to Course-Correct

Even in this success story, the AI hit several roadblocks:

  • Type errors and Firestore mock issues
  • Misunderstandings around try/catch blocks and async error flow (see the example below)
  • Unexpected behavior with malformed inputs

As a senior developer, I knew how to debug these problems, adjust expectations, and refine the test logic accordingly. A junior dev—or worse, a non-technical founder—wouldn’t have caught these nuances.
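The try/catch point deserves a concrete illustration, because it's an easy place for a test to pass vacuously. A minimal sketch, using a hypothetical header guard rather than the project's actual validation code:

```typescript
// Hypothetical async guard standing in for the real API-key check.
async function requireApiKey(
  headers: Record<string, string | undefined>,
): Promise<void> {
  if (!headers['x-api-key']) {
    throw new Error('Missing API key');
  }
}

it('rejects requests without an API key', async () => {
  // Trap: a try/catch that only asserts inside the catch block
  // silently passes when nothing throws. The rejects matcher
  // fails loudly if the promise resolves instead:
  await expect(requireApiKey({})).rejects.toThrow('Missing API key');
});

it('accepts requests with a valid key header', async () => {
  await expect(
    requireApiKey({ 'x-api-key': 'test-key' }),
  ).resolves.toBeUndefined();
});
```

Knowing to reach for rejects/resolves instead of a bare try/catch is the kind of judgment call the agent needed nudging on here.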

⏱️ Real Productivity Gains: 4 Hours of Work in 9 Minutes

This wasn’t just a coding experiment. It delivered real productivity:

  • Compressed 2–4 hours of manual test writing into 9 minutes
  • Freed me to continue other tasks while the agent operated asynchronously
  • Delivered robust, reliable test coverage with zero regressions

But the speed gain was only possible because the project was properly structured and supervised.

⚠️ Why AI Fails Without Experience

Founders often ask: Can I just hire a junior dev and have AI fill in the gaps?

Unfortunately, the answer is no. Agentic AI is not a replacement for experience—it’s an amplifier of it.

Without:

  • Context-aware environments
  • Strong architectural patterns
  • The ability to review and correct AI-generated code

…you’re more likely to ship technical debt than production-ready software.

🔐 AI Is a Tool. Experience Is the Unlock.

Agentic AI is incredibly powerful—but only when used correctly.

If you’re building an AI-assisted development team, don’t just invest in tools—invest in senior-level talent that can guide and control the AI effectively.

I dive deeper into this concept in a related post:
👉 Why Experience Unlocks 10x ROI in AI-Driven Software Development

📌 Key Takeaways

  • Agentic AI (like Cursor + Claude 4) can autonomously write production-quality tests
  • True productivity comes from combining senior experience with the right context
  • Non-technical founders or junior devs are unlikely to get value without close supervision
  • With the right setup, you can compress hours of work into minutes—without sacrificing quality
  • AI won’t replace developers. But it supercharges the best ones

Below is a video of the entire 9-minute session.

Watch Cursor Agent Mode analyze the codebase and write passing unit tests using a single prompt.