The Dark Factory: When Humans Stop Reading the Code
StrongDM’s AI team has achieved something that sounds like engineering heresy: they’re building security software where no human ever looks at the code. And they’re spending $1,000 per engineer per day on tokens to do it.
The Core Insight
The “Software Factory” concept represents what Dan Shapiro called the fifth level of AI adoption—the “Dark Factory” where agents write, test, and ship code without human review. StrongDM’s AI team, founded in July 2025, codified this approach with a mantra that would make most engineering managers uncomfortable:
“Code must not be written by humans. Code must not be reviewed by humans.”
How can this possibly work when LLMs are notorious for making mistakes? The answer lies in a radical reimagination of quality assurance.
Why This Matters
The November 2025 Inflection Point Was Real
Many developers point to Claude Opus 4.5 and GPT 5.2 as the moment reliability turned a corner. But StrongDM's team traced their catalyst to an earlier release: the second revision of Claude 3.5 in October 2024, when "long-horizon agentic coding workflows began to compound correctness rather than error."
Scenarios as Holdout Sets
Traditional testing fails in an agent-written world—agents can just cheat and assert true. StrongDM’s solution borrows from ML: they treat scenario tests as holdout sets, kept outside the codebase where coding agents can’t see them. Instead of boolean pass/fail, they measure “satisfaction”—the fraction of observed trajectories through scenarios that likely satisfy the user.
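As a rough illustration (not StrongDM's actual harness), here is what that metric might look like in Go: trajectories are labeled elsewhere by some judge, and the score per scenario is simply the fraction judged satisfying. The `Trajectory` type, the scenario names, and the judging step are all assumptions made for the sketch.

```go
package main

import "fmt"

// Trajectory is one observed run of a simulated user through a scenario.
// Satisfied would typically be set by a judge reviewing the transcript.
type Trajectory struct {
	ScenarioID string
	Satisfied  bool
}

// satisfactionByScenario returns, for each scenario, the fraction of
// observed trajectories judged to satisfy the user, instead of a
// boolean pass/fail.
func satisfactionByScenario(runs []Trajectory) map[string]float64 {
	total := map[string]int{}
	happy := map[string]int{}
	for _, r := range runs {
		total[r.ScenarioID]++
		if r.Satisfied {
			happy[r.ScenarioID]++
		}
	}
	out := map[string]float64{}
	for id, n := range total {
		out[id] = float64(happy[id]) / float64(n)
	}
	return out
}

func main() {
	runs := []Trajectory{
		{"grant-access", true}, {"grant-access", true}, {"grant-access", false},
		{"revoke-access", true},
	}
	for id, score := range satisfactionByScenario(runs) {
		fmt.Printf("%s: %.2f satisfaction\n", id, score)
	}
}
```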
The Digital Twin Universe
This is where things get wild. To test their permission management software at scale, they built behavioral clones of Okta, Jira, Slack, Google Docs, and more—complete API replications including edge cases. The trick? Feed complete API documentation into agents and have them build self-contained Go binary imitations.
“Creating a high fidelity clone of a significant SaaS application was always possible, but never economically feasible.”
With local clones free from rate limits and usage quotas, their simulated testers can run thousands of scenarios per hour.
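A hedged sketch of the idea, not one of StrongDM's generated clones: a self-contained Go binary that imitates a small slice of a directory-style API from in-memory data, so tests can point at localhost instead of the real service. The endpoint path and the `User` shape are invented for illustration; a real twin would be generated by agents from the full API documentation.

```go
package main

import (
	"encoding/json"
	"log"
	"net/http"
)

// User mirrors the shape a SaaS directory API might return.
type User struct {
	ID     string `json:"id"`
	Email  string `json:"email"`
	Status string `json:"status"`
}

// An in-memory store stands in for the real service's backend, so
// simulated testers can hammer it without rate limits or quotas.
var users = []User{
	{ID: "00u1", Email: "alice@example.com", Status: "ACTIVE"},
	{ID: "00u2", Email: "bob@example.com", Status: "SUSPENDED"},
}

func listUsers(w http.ResponseWriter, r *http.Request) {
	w.Header().Set("Content-Type", "application/json")
	json.NewEncoder(w).Encode(users)
}

func main() {
	http.HandleFunc("/api/v1/users", listUsers)
	// A self-contained binary: point the system under test at
	// http://localhost:8080 instead of the real SaaS endpoint.
	log.Fatal(http.ListenAndServe(":8080", nil))
}
```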
Key Takeaways
- Digital Twin Universe: Clone third-party APIs with agents to enable unlimited testing
- Scenarios as holdouts: Keep test specs where agents can’t see them—like ML holdout sets
- Probabilistic satisfaction over boolean tests: Measure what fraction of trajectories satisfy users
- Gene Transfusion: Extract patterns from existing systems and reuse elsewhere via agents
- Semports: Direct code porting between languages using agents
- Pyramid Summaries: Multi-level documentation letting agents enumerate quickly and zoom in (see the sketch after this list)
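To make that last takeaway concrete, here is a speculative sketch of a pyramid summary structure in Go: terse one-liners at every level, with detail and children an agent can expand on demand. The `SummaryNode` type and the example paths are illustrative guesses, not StrongDM's format.

```go
package main

import "fmt"

// SummaryNode is one level of a pyramid summary: a terse one-liner,
// optional longer detail, and children for the next level down
// (e.g., repo -> package -> file -> section).
type SummaryNode struct {
	Path     string
	OneLiner string
	Detail   string
	Children []SummaryNode
}

// enumerate prints only the one-liners down to maxDepth, which is what
// lets an agent scan a large codebase cheaply before zooming in.
func enumerate(n SummaryNode, depth, maxDepth int) {
	if depth > maxDepth {
		return
	}
	fmt.Printf("%*s%s: %s\n", depth*2, "", n.Path, n.OneLiner)
	for _, c := range n.Children {
		enumerate(c, depth+1, maxDepth)
	}
}

func main() {
	root := SummaryNode{
		Path:     "twins/",
		OneLiner: "behavioral clones of third-party SaaS APIs",
		Children: []SummaryNode{
			{Path: "okta.go", OneLiner: "directory endpoints, in-memory users"},
			{Path: "jira.go", OneLiner: "issue CRUD, status transitions"},
		},
	}
	enumerate(root, 0, 1) // top two levels only
}
```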
The Cost Question
Let’s not gloss over this: $1,000 per day per engineer. That’s $20,000/month added to your engineering budget. At that price point, this becomes as much a business model exercise as a technical one. Can you build products profitable enough to justify the overhead?
Simon Willison, who visited the team and wrote about them, noted he personally finds $200/month on Claude Max gives plenty of space to experiment—but he’s not running 24/7 QA swarms either.
The real value may be in selective adoption: the Digital Twin pattern for integration testing, scenarios as holdouts for quality assurance, without necessarily going full “dark factory.”
Looking Ahead
StrongDM released their work unconventionally. Their coding agent Attractor’s repo contains no code—just three markdown files specifying the software in meticulous detail, with a note to feed those specs into your own agent. Their AI Context Store (16K lines of Rust, 9.5K of Go, 6.7K of TypeScript) stores conversation histories in an immutable DAG.
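The post doesn't spell out the schema, but an immutable, content-addressed DAG of conversation turns might look roughly like this sketch: each turn hashes its own content plus its parents' hashes, so history is append-only and branches can coexist. The `Turn` type and field names are assumptions, not the Context Store's actual types.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"encoding/json"
	"fmt"
)

// Turn is one conversation entry. Nodes are never mutated; a new turn
// points at its parents, so alternative branches of a conversation
// coexist in the same DAG.
type Turn struct {
	Parents []string `json:"parents"` // hashes of prior turns
	Role    string   `json:"role"`    // "user", "assistant", "tool"
	Content string   `json:"content"`
}

// id derives a content address for the turn, so identical content with
// identical ancestry always maps to the same node.
func (t Turn) id() string {
	b, _ := json.Marshal(t)
	sum := sha256.Sum256(b)
	return hex.EncodeToString(sum[:])
}

func main() {
	dag := map[string]Turn{}

	root := Turn{Role: "user", Content: "add SSO support"}
	rootID := root.id()
	dag[rootID] = root

	reply := Turn{Parents: []string{rootID}, Role: "assistant", Content: "plan: ..."}
	dag[reply.id()] = reply

	fmt.Println("nodes:", len(dag), "root:", rootID[:8])
}
```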
The three-person team built all of this (agent harness, Digital Twin clones of six services, and QA swarms) in three months, before the November model improvements that made agentic coding more reliable.
This is a glimpse of one possible future: engineers move from writing code to building, and loosely monitoring, the systems that write it. Whether it becomes the future depends on economics as much as capability.
Based on analysis of “Software Factories and the Agentic Moment” by StrongDM AI, via Simon Willison
Tags: #SoftwareFactory #AIAgents #QualityAssurance #DigitalTwins #AgenticCoding