The Dark Factory Revolution: When AI Writes Code Nobody Reviews

What if I told you there’s a team building security software where no human ever looks at the code? No code reviews. No pull request approvals. Just specs in, working software out. Welcome to the Dark Factory.
The Core Insight

StrongDM’s AI team has taken what most of us consider engineering malpractice and turned it into a systematic methodology. Their rules are deliberately provocative:
- Code must not be written by humans
- Code must not be reviewed by humans
- If you haven’t spent at least $1,000 on tokens per engineer per day, you have room for improvement
This isn’t recklessness—it’s a calculated bet on the November 2025 inflection point when Claude Opus 4.5 and GPT 5.2 made agentic coding reliably compound correctness rather than error.
Why This Matters

The real innovation isn’t the coding agents themselves—it’s how StrongDM solved the verification problem. When both your code AND your tests are AI-generated, how do you know anything actually works?
Their answer: Digital Twin Universe (DTU).
They built behavioral clones of Okta, Jira, Slack, Google Docs, and more. These aren’t mocks—they’re full API implementations with edge cases and observable behaviors, all generated by the same coding agents. This lets them:
- Test at volumes far exceeding production limits
- Simulate failure modes that would be dangerous against live services
- Run thousands of scenarios per hour without rate limits or API costs
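To make "behavioral clone, not mock" concrete, here is a toy sketch of what one might look like: a hypothetical user-directory service loosely inspired by Okta's API. All names, status codes, and the outage knob are my illustration; the article doesn't publish DTU internals.

```python
# Toy behavioral clone (illustrative, not StrongDM's DTU code).
# Unlike a mock returning canned data, the clone keeps state and
# implements observable behaviors -- duplicates, missing users,
# mid-test outages -- that would be dangerous to trigger on a live service.

class FakeDirectory:
    """Stateful stand-in for a user-directory API (Okta-like, hypothetical)."""

    def __init__(self, fail_after=None):
        self.users = {}
        self.calls = 0
        self.fail_after = fail_after  # simulate an upstream outage after N calls

    def _tick(self):
        self.calls += 1
        if self.fail_after is not None and self.calls > self.fail_after:
            raise ConnectionError("simulated upstream outage")

    def create_user(self, login):
        self._tick()
        if login in self.users:
            return {"status": 409, "error": "duplicate login"}  # edge case, not happy path
        self.users[login] = {"login": login, "active": True}
        return {"status": 201, "user": self.users[login]}

    def deactivate(self, login):
        self._tick()
        if login not in self.users:
            return {"status": 404}
        self.users[login]["active"] = False
        return {"status": 200}
```

Because the clone is just local code, a test suite can hammer it with thousands of calls per second and flip `fail_after` to rehearse outages on demand.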
The genius move? They treat scenarios like an ML holdout set: tests the coding agents can't see, which prevents them from “cheating” by writing assertions that trivially pass.
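A minimal sketch of that holdout mechanism, assuming scenarios are simple recorded request/response traces (my framing; the article doesn't describe StrongDM's scenario format):

```python
# Sketch of the holdout idea, not StrongDM's code: scenarios are recorded
# request/response traces stored where the coding agents have no read
# access, then replayed against the candidate system after it is built.

def run_scenario(system, scenario):
    """Replay one trace; pass only if every response matches the recording."""
    return all(system.handle(step["request"]) == step["expected"]
               for step in scenario["steps"])

def holdout_score(system, scenarios):
    """Fraction of unseen scenarios satisfied. Because the agents never see
    these traces, they cannot hard-code assertions that always pass."""
    return sum(run_scenario(system, s) for s in scenarios) / len(scenarios)
```

The key property is separation: the agents optimize against a score they can observe, computed from data they cannot.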
The Uncomfortable Economics

Here’s where it gets real: $1,000/day per engineer in token costs. That comes to roughly $20,000 per engineer per month in overhead before you’ve written a single spec.
This creates a fascinating business model question: Can you build products profitable enough to justify this approach? And if your competitor can clone your features in hours with their own agent swarm, what’s your moat?
The answer might be: speed of iteration beats everything else when AI can implement faster than humans can review.
Key Takeaways

- The verification problem is solvable: Use holdout scenarios that agents can’t access, probabilistic satisfaction metrics instead of binary test results
- Clone your dependencies: Building DTUs of external services unlocks testing at impossible scales
- Gene Transfusion is powerful: Having agents extract patterns from existing systems and reuse them elsewhere creates compound knowledge
- The economics are brutal but maybe temporary: $200/month Claude Max vs $20k/month production—there’s a huge gap to explore
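The "probabilistic satisfaction" takeaway deserves a sketch of its own. My reading (the article doesn't spell out a formula): because agent-built systems can be nondeterministic, each scenario is scored as a pass rate over repeated runs rather than a single binary bit, and releases are gated on that probability.

```python
# Illustrative only: score a nondeterministic system as P(pass),
# estimated over repeated trials, instead of a one-shot test result.

def satisfaction(run_once, trials=100):
    """Estimate the probability that a scenario passes."""
    return sum(bool(run_once()) for _ in range(trials)) / trials

def release_gate(score, threshold=0.95):
    """Ship only when estimated satisfaction clears the threshold."""
    return score >= threshold
```

A system that fails one run in five would score 0.8 here and be held back by a 0.95 gate, where a single lucky green run would have let it through.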
Looking Ahead

We’re watching software development bifurcate. One path keeps humans in the loop—writing, reviewing, approving. The other removes them entirely, replacing judgment with systems.
The Dark Factory isn’t just about cost efficiency. It’s about asking: what if the bottleneck was never the code itself, but the humans touching it?
StrongDM’s three-person team built working demos of their entire factory in three months. If that’s what three engineers can do with agents, what happens when every company has access to the same tools?
The answer might reshape how we think about software, teams, and what “engineering” even means.
Based on analysis of “How StrongDM’s AI team build serious software without even looking at the code” by Simon Willison