
Testing Software With Agentic AI
Agentic AI is revolutionizing how software is built—but without the right experience and context, it often produces brittle, misguided code.
In this post, I’ll walk through a real-world example using Cursor IDE, Claude 4 Sonnet, and MCP servers to automate test coverage for a Firebase Functions project. More importantly, I’ll explain why this success wasn’t just about AI—it was about experience.
🚀 The Experiment: Agent Mode Writes 17 Tests from One Prompt
Here’s the only instruction I gave to the AI in Cursor’s agent mode:
`add test coverage for @userMigrations.ts @testing-rules.mdc @project-structure.mdc`
Within 9 minutes, the AI:
- Wrote 17 Jest tests covering header validation, API key checking, user migrations, and error handling
- Mocked Firestore query chains properly (including `.where().where().get()`; a sketch follows this list)
- Followed project-specific testing patterns
- Produced 100% passing tests with no regressions
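To make that concrete, here is a condensed sketch of the kind of test the agent produced. The module path, the `migrateUser` handler, the `x-api-key` header, and the response shapes are illustrative assumptions, not the project's actual code:

```typescript
// __tests__/userMigrations.test.ts — illustrative sketch, not the project's real tests
import { migrateUser } from "../userMigrations"; // hypothetical handler under test

// Minimal Firestore mock that supports a chained .where().where().get() query.
const mockGet = jest.fn();
const mockQuery = { where: jest.fn(), get: mockGet };
mockQuery.where.mockReturnValue(mockQuery); // every .where() returns the same query

jest.mock("firebase-admin/firestore", () => ({
  getFirestore: () => ({ collection: () => mockQuery }),
}));

describe("migrateUser", () => {
  beforeEach(() => jest.clearAllMocks());

  it("rejects requests that are missing the API key header", async () => {
    const req = { headers: {}, body: { userId: "abc" } } as any;
    const res = { status: jest.fn().mockReturnThis(), send: jest.fn() } as any;

    await migrateUser(req, res);

    expect(res.status).toHaveBeenCalledWith(401);
  });

  it("migrates a user found via a chained Firestore query", async () => {
    mockGet.mockResolvedValue({
      empty: false,
      docs: [{ id: "abc", data: () => ({ email: "user@example.com" }) }],
    });
    const req = { headers: { "x-api-key": "test-key" }, body: { userId: "abc" } } as any;
    const res = { status: jest.fn().mockReturnThis(), send: jest.fn() } as any;

    await migrateUser(req, res);

    expect(mockQuery.where).toHaveBeenCalledTimes(2); // the .where().where().get() chain
    expect(res.status).toHaveBeenCalledWith(200);
  });
});
```

The key detail is that `.where()` returns the same mock query object, so chains of any length all resolve to the same stubbed `.get()`.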
On the surface, that sounds like magic.
But here’s the truth: the AI succeeded because of the environment it was placed in—and because it was guided by someone who knew what good software looks like.
🎯 Why Experience Matters in AI-Driven Development
AI agents don’t know how to write maintainable code by default. They need direction—and only senior engineers know how to provide it.
1. Environment and Context Make or Break AI Success
Agent mode worked because the project had:
- Well-documented code structure
- Model Context Protocol (MCP) servers providing memory and step-by-step reasoning
- `.mdc` files describing testing rules and project organization
- Clear architectural boundaries and existing mocking conventions
Without that foundation, the AI would fail. Poor context = poor results.
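For readers unfamiliar with Cursor rules files, here is a hypothetical sketch of what a `testing-rules.mdc` could contain. The actual file's contents aren't reproduced in this post, and the globs and conventions shown are assumptions:

```
---
description: Testing rules for Firebase Functions
globs: ["functions/src/**/*.test.ts"]
alwaysApply: false
---
- Use Jest with ts-jest; keep tests in __tests__/ next to the module under test
- Mock Firestore with a chainable query stub: .where() returns the query, .get() is a jest.fn()
- Every HTTPS function needs tests for header/API-key validation, the happy path, and error handling
- Never call real Firebase services from tests
```

Rules like these are what allow a one-line prompt to expand into seventeen tests that match the project's conventions.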
2. Experience Is Required to Course-Correct
Even in this success story, the AI hit several roadblocks:
- Type errors and Firestore mock issues
- Misunderstandings around try/catch blocks and async error flow
- Unexpected behavior with malformed inputs
As a senior developer, I knew how to debug these problems, adjust expectations, and refine the test logic accordingly. A junior dev—or worse, a non-technical founder—wouldn’t have caught these nuances.
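To give one concrete example of those nuances: a handler that wraps its work in try/catch and reports failures through the HTTP response never rejects its returned promise, so a naive `rejects.toThrow` assertion will never fire. The sketch below uses a hypothetical handler, not the project's code, to illustrate the kind of adjustment this requires:

```typescript
// Sketch of the async error-flow nuance (hypothetical handler and test)
type Res = { status: (code: number) => Res; send: (body: unknown) => Res };

// The handler catches its own errors and reports them via the response,
// so the promise it returns resolves even when the Firestore call fails.
async function handler(fetchUser: () => Promise<unknown>, res: Res) {
  try {
    const user = await fetchUser();
    res.status(200).send(user);
  } catch (err) {
    res.status(500).send({ error: (err as Error).message });
  }
}

it("reports Firestore failures via the response, not a rejected promise", async () => {
  const res = { status: jest.fn().mockReturnThis(), send: jest.fn() } as unknown as Res;
  const failingFetch = () => Promise.reject(new Error("firestore unavailable"));

  // Wrong: this assertion never fires, because the try/catch swallows the rejection.
  // await expect(handler(failingFetch, res)).rejects.toThrow();

  // Right: await the handler and assert on the error response it produced.
  await handler(failingFetch, res);
  expect(res.status).toHaveBeenCalledWith(500);
});
```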
⏱️ Real Productivity Gains: 4 Hours of Work in 9 Minutes
This wasn’t just a coding experiment. It delivered real productivity:
- Compressed 2–4 hours of manual test writing into 9 minutes
- Continued working on other tasks while the agent operated asynchronously
- Delivered robust, reliable test coverage with zero regressions
But the speed gain was only possible because the project was properly structured and supervised.
⚠️ Why AI Fails Without Experience
Founders often ask: Can I just hire a junior dev and have AI fill in the gaps?
Unfortunately, the answer is no. Agentic AI is not a replacement for experience—it’s an amplifier of it.
Without:
- Context-aware environments
- Strong architectural patterns
- The ability to review and correct AI-generated code
…you’re more likely to ship technical debt than production-ready software.
🔐 AI Is a Tool. Experience Is the Unlock.
Agentic AI is incredibly powerful—but only when used correctly.
If you’re building an AI-assisted development team, don’t just invest in tools—invest in senior-level talent that can guide and control the AI effectively.
I dive deeper into this concept in a related post:
👉 Why Experience Unlocks 10x ROI in AI-Driven Software Development
📌 Key Takeaways
- Agentic AI (like Cursor + Claude 4) can autonomously write production-quality tests
- True productivity comes from combining senior experience with the right context
- Non-technical founders or junior devs are unlikely to get value without close supervision
- With the right setup, you can compress hours of work into minutes—without sacrificing quality
- AI won’t replace developers. But it supercharges the best ones
Below is a video of the entire 9-minute session.