AI Agents Write Sloppy Code: A Veteran’s Field Report on Quality

Martin Fowler, one of the most respected voices in software engineering, just published a detailed account of using AI coding agents. The verdict? They work, but they introduce technical debt at an alarming rate.

The Core Insight

Fowler spent months using AI agents (Windsurf with Sonnet 3.5, then Claude Code with Sonnet 4.5) to add GitLab support to CCMenu, a Mac application for CI/CD monitoring. His conclusion is nuanced but important: AI agents genuinely speed up code production, but they leave behind quality problems that humans must catch.

The most revealing moment came when the AI “fixed” a compiler error by changing the wrong thing. Instead of making a function parameter optional (the correct one-character fix), the agent added default empty strings throughout the codebase—a change that compiles but breaks the semantic model of the code.

As Fowler writes: “This is a clear example where an AI agent left to their own would have changed the codebase for the worse, and it took a developer with experience to notice the issue and to direct the agent to the correct implementation.”

Why This Matters

There’s no shortage of AI coding benchmarks showing impressive results. But benchmarks measure whether code works, not whether it’s good. Fowler argues this matters enormously for long-term project health:

  • Internal quality is crucial for sustainable development
  • Without careful oversight, agents have a “strong tendency to introduce technical debt”
  • The debt makes future development harder—for both humans and agents

Fowler observed multiple quality failures:
1. Breaking idiomatic patterns: Using empty strings instead of optionals
2. Unnecessary complexity: Attempting to add caches with no justification
3. Hallucinating requirements: Implementing logic for problems that don’t exist
4. Ignoring existing code: Replicating logic instead of calling shared functions (see the sketch after this list)
5. Missing subtle functionality: Forgetting configuration options buried in the codebase
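
As a rough illustration of the fourth failure mode, here is a hypothetical sketch (the helpers are made up, not CCMenu's actual code) of the kind of duplication a reviewer has to catch:

```swift
import Foundation

// A shared helper that already exists in the codebase:
func apiBaseURL(for server: URL) -> URL {
    server.appendingPathComponent("api/v4")
}

// What an agent tends to produce: the same path-building logic re-derived inline.
func pipelinesURLRederived(server: URL, projectID: Int) -> URL {
    server.appendingPathComponent("api/v4/projects/\(projectID)/pipelines")
}

// What a reviewer would ask for: build on the shared helper instead of duplicating it.
func pipelinesURL(server: URL, projectID: Int) -> URL {
    apiBaseURL(for: server).appendingPathComponent("projects/\(projectID)/pipelines")
}
```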

The Optionality Case Study

The specific example deserves attention because it illustrates how subtle the problems can be.

Swift uses optional types (String?) to signal that a value might be absent. The AI wrote functions expecting non-optional tokens, even though authentication is optional for these APIs. When code later called these functions with optional tokens, the compiler complained.
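
To make the mismatch concrete, here is a minimal sketch of the setup. The names (GitLabFeed, storedToken) are illustrative, not CCMenu's actual code:

```swift
import Foundation

// The agent declared the type as if a token were always present:
struct GitLabFeed {
    let url: URL
    let token: String
}

// Authentication is optional for these APIs, so callers hold an optional token:
let storedToken: String? = nil   // nil when the user has configured no token

// error: value of optional type 'String?' must be unwrapped to a value of type 'String'
// let feed = GitLabFeed(url: URL(string: "https://gitlab.example.com")!, token: storedToken)
```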

The AI’s fix: Add ?? "" at every call site to substitute empty strings for nil tokens.

The correct fix: Add ? to the function parameter to make it optional.
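
Sketched against the same hypothetical type, the two fixes look roughly like this:

```swift
import Foundation

// The AI's fix: patch every call site so an empty string silently stands in for "no token".
// let feed = GitLabFeed(url: serverURL, token: storedToken ?? "")

// The correct fix: one character in the declaration lets the type itself express absence.
struct GitLabFeed {
    let url: URL
    let token: String?   // was: String
}

// Downstream code can then handle the missing token honestly:
func authorizationHeader(for feed: GitLabFeed) -> String? {
    feed.token.map { "Bearer \($0)" }   // nil token → no Authorization header
}
```

With the optional in the type, the compiler itself tracks whether a token is present; the empty-string convention leaves that to the discipline of every future reader.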

Both compile. Both work. But the AI’s fix:
  • Isn't idiomatic Swift
  • Changes semantics (empty string ≠ absent token)
  • Requires changes at multiple call sites
  • Hides absence from the type system instead of expressing it
  • Creates a maintenance burden

The correct fix is one character. The AI's fix spread dozens of characters across multiple files, and because both versions compile and work, the regression would have gone unnoticed without human review.

Key Takeaways

  • Functionality tests aren’t enough. AI-generated code often works while being structurally wrong. Human review must assess quality, not just behavior.

  • Agents don’t understand “why.” They can implement features but often miss non-obvious requirements (like base URL overrides for testing) because those aren’t stated explicitly.

  • Language sophistication matters. Swift's strict type system exposed issues that might hide in Python or JavaScript. Strongly typed languages make AI errors more visible.

  • The quality gap is closing, slowly. Claude Code with Sonnet 4.5 produced noticeably better code than earlier combinations—but still not “high quality” by Fowler’s standards.

  • Human oversight is non-negotiable. The AI consistently proposed solutions that required experienced judgment to evaluate and often reject.

Looking Ahead

Fowler’s conclusion is worth quoting: “If working on large software systems has taught me one thing it’s that investing in the internal quality of the software, the quality of the codebase, is a worthwhile investment. Don’t get overwhelmed by technical debt.”

The current crop of AI agents generates code faster than humans. That’s valuable. But speed without quality is a trap. Organizations deploying agents need to account for the oversight costs, not just celebrate the velocity improvements.

For now, AI coding agents remain power tools that require skilled operators. Use them carelessly and you’ll build a codebase that’s increasingly difficult to maintain—for humans and AI alike.


Based on: “Assessing internal quality while coding with an agent” (Martin Fowler)
