The Dark Factory Revolution: When AI Writes Code Nobody Reviews

What if I told you there’s a team building security software where no human ever looks at the code? No code reviews. No pull request approvals. Just specs in, working software out. Welcome to the Dark Factory.
The Core Insight

StrongDM’s AI team has taken what most of us consider engineering malpractice and turned it into a systematic methodology. Their rules are deliberately provocative:
- Code must not be written by humans
- Code must not be reviewed by humans
- If you haven’t spent at least $1,000 on tokens per engineer per day, you have room for improvement
This isn’t recklessness—it’s a calculated bet on the November 2025 inflection point when Claude Opus 4.5 and GPT 5.2 made agentic coding reliably compound correctness rather than error.
Why This Matters

The real innovation isn’t the coding agents themselves—it’s how StrongDM solved the verification problem. When both your code AND your tests are AI-generated, how do you know anything actually works?
Their answer: Digital Twin Universe (DTU).
They built behavioral clones of Okta, Jira, Slack, Google Docs, and more. These aren’t mocks—they’re full API implementations with edge cases and observable behaviors, all generated by the same coding agents. This lets them:
- Test at volumes far exceeding production limits
- Simulate failure modes that would be dangerous against live services
- Run thousands of scenarios per hour without rate limits or API costs
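To make "behavioral clone, not mock" concrete, here is a toy sketch of what one might look like: a hypothetical user-directory service loosely inspired by Okta's API. All names, status codes, and the outage knob are my illustration; the article doesn't publish DTU internals.

```python
# Toy behavioral clone (illustrative, not StrongDM's DTU code).
# Unlike a mock returning canned data, the clone keeps state and
# implements observable behaviors -- duplicates, missing users,
# mid-test outages -- that would be dangerous to trigger on a live service.

class FakeDirectory:
    """Stateful stand-in for a user-directory API (Okta-like, hypothetical)."""

    def __init__(self, fail_after=None):
        self.users = {}
        self.calls = 0
        self.fail_after = fail_after  # simulate an upstream outage after N calls

    def _tick(self):
        self.calls += 1
        if self.fail_after is not None and self.calls > self.fail_after:
            raise ConnectionError("simulated upstream outage")

    def create_user(self, login):
        self._tick()
        if login in self.users:
            return {"status": 409, "error": "duplicate login"}  # edge case, not happy path
        self.users[login] = {"login": login, "active": True}
        return {"status": 201, "user": self.users[login]}

    def deactivate(self, login):
        self._tick()
        if login not in self.users:
            return {"status": 404}
        self.users[login]["active"] = False
        return {"status": 200}
```

Because the clone is just local code, a test suite can hammer it with thousands of calls per second and flip `fail_after` to rehearse outages on demand.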
The genius move? They treat scenarios like an ML holdout set: tests the coding agents can't see, which prevents them from “cheating” by writing assertions that trivially pass.
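A minimal sketch of that holdout mechanism, assuming scenarios are simple recorded request/response traces (my framing; the article doesn't describe StrongDM's scenario format):

```python
# Sketch of the holdout idea, not StrongDM's code: scenarios are recorded
# request/response traces stored where the coding agents have no read
# access, then replayed against the candidate system after it is built.

def run_scenario(system, scenario):
    """Replay one trace; pass only if every response matches the recording."""
    return all(system.handle(step["request"]) == step["expected"]
               for step in scenario["steps"])

def holdout_score(system, scenarios):
    """Fraction of unseen scenarios satisfied. Because the agents never see
    these traces, they cannot hard-code assertions that always pass."""
    return sum(run_scenario(system, s) for s in scenarios) / len(scenarios)
```

The key property is separation: the agents optimize against a score they can observe, computed from data they cannot.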
The Uncomfortable Economics

Here’s where it gets real: $1,000/day per engineer in token costs. That comes to roughly $20,000 per engineer per month in overhead before you’ve written a single spec.
This creates a fascinating business model question: Can you build products profitable enough to justify this approach? And if your competitor can clone your features in hours with their own agent swarm, what’s your moat?
The answer might be: speed of iteration beats everything else when AI can implement faster than humans can review.
Key Takeaways

- The verification problem is solvable: Use holdout scenarios that agents can’t access, probabilistic satisfaction metrics instead of binary test results
- Clone your dependencies: Building DTUs of external services unlocks testing at impossible scales
- Gene Transfusion is powerful: Having agents extract patterns from existing systems and reuse them elsewhere creates compound knowledge
- The economics are brutal but maybe temporary: $200/month Claude Max vs $20k/month production—there’s a huge gap to explore
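The "probabilistic satisfaction" takeaway deserves a sketch of its own. My reading (the article doesn't spell out a formula): because agent-built systems can be nondeterministic, each scenario is scored as a pass rate over repeated runs rather than a single binary bit, and releases are gated on that probability.

```python
# Illustrative only: score a nondeterministic system as P(pass),
# estimated over repeated trials, instead of a one-shot test result.

def satisfaction(run_once, trials=100):
    """Estimate the probability that a scenario passes."""
    return sum(bool(run_once()) for _ in range(trials)) / trials

def release_gate(score, threshold=0.95):
    """Ship only when estimated satisfaction clears the threshold."""
    return score >= threshold
```

A system that fails one run in five would score 0.8 here and be held back by a 0.95 gate, where a single lucky green run would have let it through.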
Looking Ahead

We’re watching software development bifurcate. One path keeps humans in the loop—writing, reviewing, approving. The other removes them entirely, replacing judgment with systems.
The Dark Factory isn’t just about cost efficiency. It’s about asking: what if the bottleneck was never the code itself, but the humans touching it?
StrongDM’s three-person team built working demos of their entire factory in three months. If that’s what three engineers can do with agents, what happens when every company has access to the same tools?
The answer might reshape how we think about software, teams, and what “engineering” even means.
Based on analysis of “How StrongDM’s AI team build serious software without even looking at the code” by Simon Willison