Showboat and Rodney: How to Watch Your AI Agent Actually Work

Simon Willison just released two new tools that tackle one of the most pressing problems in AI-assisted coding: how do you know what your agent actually built?

The Core Insight

When you’re working with coding agents, there’s a fundamental trust problem. The agent tells you it built something great. The tests pass. But does the result actually work the way you wanted? Traditional automated tests aren’t enough on their own: they check correctness, not usefulness.

Enter Showboat and Rodney: a pair of CLI tools designed specifically for agents to demonstrate their work to humans in a way that’s hard to fake.

Why This Matters

Showboat is elegantly simple: it helps agents construct Markdown documents that demonstrate their code working. The agent runs each command in the demo through Showboat, in sequence, and the actual output is captured directly into the document. If the agent says “here’s how you use curl and jq,” the document shows those commands alongside the real output they produced.

This is the anti-cheat mechanism that AI-assisted development has been missing: the document carries its own evidence.
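
To make the capture idea concrete, here is a minimal Python sketch. It is not Showboat's implementation or its command-line interface; the function name `append_command` and the `bash`/`output` block layout are assumptions made for illustration only. It shows the core move: run a command, then write the command and whatever it really printed into a Markdown file.

```python
import subprocess
from pathlib import Path

FENCE = "`" * 3  # Markdown code fence


def append_command(doc: Path, command: str) -> str:
    """Run `command` in a shell and append it and its real output to `doc`.

    Hypothetical helper for illustration; not part of Showboat.
    """
    result = subprocess.run(command, shell=True, capture_output=True, text=True)
    output = result.stdout + result.stderr
    if output and not output.endswith("\n"):
        output += "\n"  # keep the closing fence on its own line
    with doc.open("a", encoding="utf-8") as f:
        f.write(f"{FENCE}bash\n{command}\n{FENCE}\n\n")
        f.write(f"{FENCE}output\n{output}{FENCE}\n\n")
    return output
```

Calling `append_command(Path("demo.md"), "echo hello")` would append the command and its captured `hello` to `demo.md`. Because the output block is written by the tool rather than typed by the agent, the reader sees what the command printed, not what the agent claims it printed.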

Rodney complements this by handling browser automation: agents can take screenshots, click through web interfaces, and capture the actual visual result of their work. It’s built on Rod, a Go library for the Chrome DevTools Protocol, and is designed to work alongside Showboat.

Key Takeaways

The --help is the API – Both tools are designed so that agents can read the help text and immediately understand how to use every feature
Test-driven development has new value – Willison, formerly a TDD skeptic, now uses it to give agents clear success criteria
iPhone development is real – Both tools were primarily built via the Claude iPhone app, demonstrating how far mobile AI-assisted coding has come
Watching agents work is the new code review – The “screenshare” model of watching an agent build a Showboat document is replacing traditional review
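
The first takeaway, help text as the API, can be sketched with Python's standard `argparse` module. The tool name `demo-tool` and its `init` and `run` subcommands are hypothetical and do not describe either tool's real interface. The point is that the help output alone, including a worked example, should be enough for an agent to use the tool.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a CLI whose help text doubles as its documentation (hypothetical tool)."""
    parser = argparse.ArgumentParser(
        prog="demo-tool",  # hypothetical name
        description="Build a Markdown demo document from commands that are actually run.",
        epilog=(
            "Example:\n"
            "  demo-tool init demo.md 'My demo'\n"
            "  demo-tool run demo.md 'echo hello'"
        ),
        # Preserve the line breaks in the epilog so the example stays readable.
        formatter_class=argparse.RawDescriptionHelpFormatter,
    )
    sub = parser.add_subparsers(dest="command", required=True)

    init = sub.add_parser("init", help="create a new document with a title")
    init.add_argument("file", help="path of the Markdown file to create")
    init.add_argument("title", help="title of the document")

    run = sub.add_parser("run", help="run a shell command and append its output")
    run.add_argument("file", help="path of the Markdown file to append to")
    run.add_argument("shell_command", help="command to execute")
    return parser
```

`build_parser().print_help()` prints the description, both subcommands with their one-line summaries, and the example session, which is everything an agent needs to get started without separate documentation.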

Looking Ahead

The tools aren’t perfect: agents can still edit the Markdown directly rather than running commands. But they’re a significant step toward verifiable AI-generated code. The future of AI development might not be about reviewing commits at all, but about watching your agent demonstrate what it built in real time.
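
One way a reviewer might catch hand-edited output is to re-run the recorded commands and compare the results. The sketch below is a hypothetical check, not a feature this article attributes to either tool. It assumes a document layout in which each `bash` block is followed by an `output` block, and the name `find_mismatches` is invented for illustration.

```python
import re
import subprocess

FENCE = "`" * 3  # Markdown code fence

# Assumed layout: a bash block, a blank line, then an output block.
PAIR = re.compile(
    re.escape(FENCE) + r"bash\n(.*?)\n" + re.escape(FENCE) + r"\n\n"
    + re.escape(FENCE) + r"output\n(.*?)" + re.escape(FENCE),
    re.DOTALL,
)


def find_mismatches(markdown: str) -> list[str]:
    """Re-run each recorded command and return those whose output differs."""
    mismatches = []
    for command, recorded in PAIR.findall(markdown):
        result = subprocess.run(command, shell=True, capture_output=True, text=True)
        actual = result.stdout + result.stderr
        if actual != recorded:
            mismatches.append(command)
    return mismatches
```

An empty list means every recorded output was reproduced. This only suits commands that are deterministic and safe to run again: anything that prints a timestamp will show up as a mismatch, and anything with side effects will repeat them.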

Based on analysis of “Introducing Showboat and Rodney” by Simon Willison
