Why Domain Experts Laugh When You Say AI Will Replace Them

5 min read

Ask a trial lawyer if AI could replace her and she won’t even look up from her brief. Ask a startup founder who’s never practiced law and he’ll tell you it’s already happening. They’re both looking at the same output—and they’re both right.

The Core Insight

The startup founder has a point. The brief reads like a brief. The contract looks like a contract. If you put it next to the expert’s work, most people would struggle to tell the difference. So what is the expert seeing that everyone else isn’t?

Vulnerabilities. The expert knows exactly how an adversary will exploit that document the moment it lands on the other side’s desk.

A fascinating piece from Latent Space crystallizes the distinction: “Experts have world models. LLMs have word models.” This isn’t about intelligence—it’s about simulation depth.

Consider this scenario: You’re three weeks into a new job and need to get a busy lead designer to review your mockups. ChatGPT writes: “Hi Priya, when you have a moment, could you please take a look at my files? No rush at all—whenever fits your schedule.”

Your friend in finance reads that and says “Perfect. Polite, not pushy. Send it.”

Your coworker who’s been there three years? “Don’t send that. Priya sees ‘no rush, whenever works’ and mentally files it as not urgent. It sinks below fifteen other messages with actual deadlines. She’s not ignoring you—she’s triaging, and you just told her to deprioritize you.”

The finance friend evaluated the text in isolation. The experienced coworker ran a simulation. That’s the difference.

Why This Matters

In business, geopolitics, finance, and negotiation, the environment fights back. Static analysis fails because the other side has its own interests and information you don’t have. Pattern matching breaks when patterns shift in response to your actions.

The chess/poker distinction is everything. Chess has perfect information—every piece visible, every legal move known. Your best move doesn’t change based on who your opponent is. That’s why AI crushed chess: it didn’t need a theory of mind, just better calculation.

Poker is different. Information is asymmetric. You don’t know your opponent’s cards. Bluffing exists because information is private. The game becomes recursive: I think they think I’m weak, so they’ll bet, so I should trap.

When Meta’s Pluribus system beat professional poker players, it did something profound: it calculated how it would act with every possible hand, then balanced its strategy so opponents couldn’t extract information from its behavior. Human opponents couldn’t “read” it because there was nothing consistent to read.
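
To make “nothing consistent to read” concrete, here is a toy range-balancing calculation in Python. The pot and bet sizes are assumptions chosen for illustration, and this is a sketch of the underlying idea, not Pluribus’s actual method (which relied on large-scale self-play and real-time search).

```python
# Toy illustration of a "balanced" betting range (not Pluribus's algorithm).
# Assumed setup: pot = 1, bet = 1. The bettor always bets winning ("value")
# hands; the question is how often to also bet losing hands ("bluffs") so the
# bet itself carries no exploitable information.

POT, BET = 1.0, 1.0

def caller_ev(bluff_ratio: float) -> float:
    """Opponent's expected value of calling, given the fraction of bets that are bluffs."""
    value_ratio = 1.0 - bluff_ratio
    return value_ratio * (-BET) + bluff_ratio * (POT + BET)

# Below the balance point, calling loses money (so the opponent should fold);
# above it, calling profits. At the balance point the opponent is indifferent:
# the bet reveals nothing they can use.
for bluff_ratio in (0.0, 0.2, 1 / 3, 0.5):
    print(f"bluff ratio {bluff_ratio:.2f}: EV of calling = {caller_ev(bluff_ratio):+.2f}")

# With pot = bet = 1, the balance point is BET / (POT + 2 * BET) = 1/3.
```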

LLMs are the opposite. They’re optimized to be readable: helpful, polite, balanced, cooperative. That’s great for many tasks and a terrible default in adversarial settings, where a human counterparty can (see the toy simulation after this list):
– Open aggressively, knowing the model anchors toward accommodation
– Introduce ambiguity, knowing the model resolves it charitably
– Bluff, knowing the model takes statements at face value
– Probe for patterns, knowing the model won’t adapt to being read
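
Here is a toy simulation of the first exploit. The “split the difference” seller policy, the reservation price, and the numbers are all invented for illustration; no real agent is this simple, but the mechanism is the point: a readable concession policy lets the counterparty’s anchor do all the work.

```python
# Toy negotiation: an accommodating, predictable agent vs. an anchoring buyer.
# All numbers and the policy itself are assumptions for this example.

FLOOR = 60          # seller agent's reservation price
OPENING_ASK = 100   # seller agent's opening ask

def accommodating_counter(my_last_ask: float, buyer_offer: float) -> float:
    """A readable, cooperative policy: always meet in the middle, never below the floor."""
    return max(FLOOR, (my_last_ask + buyer_offer) / 2)

def settle_against_anchor(buyer_anchor: float, rounds: int = 8) -> float:
    """The buyer just repeats an aggressive anchor and lets the agent concede toward it."""
    ask = OPENING_ASK
    for _ in range(rounds):
        ask = accommodating_counter(ask, buyer_anchor)
    return ask

for anchor in (80, 50, 20):
    print(f"buyer anchors at {anchor}: seller agent ends near {settle_against_anchor(anchor):.1f}")

# The harder the buyer anchors, the closer the agent lands to its own floor,
# without the buyer conceding anything, because the concession pattern is predictable.
```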

Key Takeaways

  • “Producing coherent output” is table stakes. The real job is producing output that achieves an objective in an environment where multiple agents are actively modeling and countering you.

  • Many domains are chess-like in their technical core but poker-like in their operational context. Professional software engineering is a case in point: writing the code is the chess-like core, but understanding ambiguous requirements means modeling what stakeholders actually want versus what they said.

  • The parts that look like the job are chess. The parts that ARE the job are poker. Difficulty is orthogonal to how “open” the environment is: proving theorems is hard and negotiating salary is technically simple, but theorem-proving is chess-shaped and negotiation is poker-shaped.

  • Text is the residue of action. The real competence is the counterfactual recursive loop: What would I do if they do this? What does my move cause them to do next? That loop is the engine of adversarial expertise, and it’s only weakly revealed by text corpora (a minimal sketch of the loop follows these takeaways).

  • The fix is a different training loop. We need models trained on outcomes, not on whether messages “sounded reasonable.” That requires multi-agent environments where other self-interested agents react, probe, and adapt.
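
A minimal sketch of that counterfactual loop, in Python: a move is scored by simulating the counterparty’s self-interested reply and grading the outcome it causes, not by how good the move looks in isolation. The moves and payoffs are invented for illustration.

```python
# Score moves by simulated outcome, not by face value.
# payoffs[my_move][their_reply] = (my_payoff, their_payoff); numbers are invented.
payoffs = {
    "aggressive_open": {"concede": (10, 2), "retaliate": (1, 5)},
    "credible_offer":  {"concede": (7, 6),  "retaliate": (3, 3)},
}

def their_best_reply(my_move: str) -> str:
    """Assume the counterparty picks whatever maximizes *their* payoff."""
    return max(payoffs[my_move], key=lambda reply: payoffs[my_move][reply][1])

def outcome_score(my_move: str) -> int:
    """What I actually get once the counterparty responds in their own interest."""
    return payoffs[my_move][their_best_reply(my_move)][0]

# Grading in isolation picks the move with the best headline payoff;
# grading by simulated outcome picks the move that survives the reply.
naive_pick = max(payoffs, key=lambda m: max(p[0] for p in payoffs[m].values()))
simulated_pick = max(payoffs, key=outcome_score)

print("graded in isolation:", naive_pick)        # aggressive_open (looks best)
print("graded by outcome:  ", simulated_pick)    # credible_offer (plays best)
```

An outcome-driven training loop is this same idea run at scale: instead of rewarding messages that sound reasonable, reward the results they produce against other adapting agents.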

Looking Ahead

Google DeepMind just announced they’re expanding AI benchmarks beyond chess to poker and Werewolf—games that test “social deduction and calculated risk.” Their framing: “Chess is a game of perfect information. The real world is not.”

As LLMs get deployed as agents in procurement, sales, negotiation, policy, and competitive strategy, that exploitability stops being theoretical. Every poker pro, every experienced negotiator, every litigator already does this instinctively. They read their counterparty. They probe for patterns. They exploit consistency.

The only question is how long before they realize LLM agents are the most consistent, most readable counterparties they’ve ever faced.

LLMs can produce outputs that look expert to outsiders because outsiders grade coherence, tone, and plausibility. Experts grade robustness in adversarial multi-agent environments with hidden state.

LLMs produce artifacts that look expert. They don’t yet produce moves that survive experts.


Based on analysis of “Experts Have World Models. LLMs Have Word Models.” from Latent Space
