Why Trial Lawyers Laugh at AI: The Adversarial Reasoning Gap LLMs Can’t Close

Ask a trial lawyer if AI could replace her and she won’t even look up from her brief. Ask a startup founder who’s never practiced law and he’ll tell you it’s already happening. They’re both looking at the same output. So what is the expert seeing that everyone else isn’t?
Vulnerabilities. Experts know exactly how an adversary will exploit the document the moment it lands on their desk.
The Core Insight

The distinction between LLMs and human experts isn’t intelligence—it’s simulation depth. LLMs generate outputs that look correct in isolation. Experts evaluate outputs as moves that will land in environments full of agents with their own models and incentives.
Consider this workplace scenario: you need a busy designer, Priya, to review your mockups. An LLM drafts a polite message saying "No rush, whenever works." Your friend in finance thinks it's perfect: polite, respectful. Your experienced coworker immediately spots the problem: "Priya sees 'no rush' and mentally files it as not urgent. It sinks below fifteen other messages with actual deadlines."
The finance friend and the LLM made the same mistake: they evaluated the text without modeling the world it would land in. The expert ran a simulation—Priya’s workload, her triage heuristics, what ambiguity costs, how “no rush” gets interpreted under pressure.
Experts have world models. LLMs have word models.
This gap is particularly stark in adversarial settings. In chess, your best move doesn’t change based on who your opponent is—board state is board state. But add hidden information (poker) and suddenly the game becomes recursive: “I think they think I’m weak, so they’ll bet, so I should trap.” Bluffing exists because information is private. Reading bluffs requires modeling their model of you.
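To make the recursion concrete, here is a loose sketch of level-k opponent modeling in a toy bluff-catching spot. Every frequency and bet size below is an invented assumption; the point is only that each level reasons about the opponent's model of the level beneath it.

```python
# Toy sketch of recursive opponent modeling ("I think they think I'm weak...").
# All frequencies and sizes are illustrative assumptions, not real poker numbers.

def opponent_bet_frequency(depth: int) -> float:
    """How often the opponent bets, as estimated by a depth-k reasoner.

    depth 0: take the bet at face value -> they only bet their strong hands.
    depth 1: they expect me to fold to bets, so they also bluff their weak hands.
    depth 2+: they expect me to have spotted the bluffs, so they rein them in.
    """
    if depth == 0:
        return 0.30          # strong hands only
    if depth == 1:
        return 0.30 + 0.40   # strong hands plus frequent bluffs
    return 0.30 + 0.15       # fewer bluffs once they expect to get called

def should_call(depth: int, pot: float = 10.0, bet: float = 5.0) -> bool:
    """Call with a bluff-catcher iff enough of the betting range is bluffs."""
    bet_freq = opponent_bet_frequency(depth)
    bluff_share = max(bet_freq - 0.30, 0.0) / bet_freq  # fraction of bets that are bluffs
    pot_odds = bet / (pot + 2 * bet)                     # equity needed to make the call
    return bluff_share > pot_odds

for k in range(3):
    print(f"level-{k} reasoner calls the bet: {should_call(k)}")
```

The answer flips between levels not because the cards changed, but because the model of the opponent's model changed.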
Why This Matters

The training mismatch is fundamental. LLMs are optimized for completions that human raters approve of in isolation. RLHF pushes toward helpful, polite, balanced outputs—qualities that score well in one-shot evaluations. But these same qualities systematically under-weight second-order effects: how counterparties interpret signals, what your message reveals about leverage, and how they’ll adapt after reading it.
Domain experts get trained by the environment itself: if your argument is predictable, it gets countered. If your concession leaks weakness, it gets exploited. If your email invites delay, it gets delayed. LLMs learn from descriptions of these dynamics (text), not from repeatedly taking actions where other agents adapt and punish predictability.
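To caricature that mismatch, here is a deliberately crude sketch; both scoring functions are hypothetical stand-ins keyed to the "no rush" example above. The one-shot rater rewards exactly the phrasing the environment punishes.

```python
# Crude contrast between the two training signals.
# Both scorers are hypothetical stand-ins, keyed to the "no rush" draft above.

def rater_score(text: str) -> float:
    """One-shot RLHF-style signal: does the text read as polite and considerate in isolation?"""
    return 0.9 if "no rush" in text.lower() else 0.6   # softened phrasing pleases a rater

def environment_outcome(text: str) -> float:
    """Environment-style signal: what happens after the recipient triages the message?"""
    deprioritized = "no rush" in text.lower()          # "not urgent" sinks below real deadlines
    return 0.2 if deprioritized else 0.8               # the rater-pleasing draft loses in the world

draft = "Could you review the mockups? No rush, whenever works."
print("rater approval:    ", rater_score(draft))
print("real-world outcome:", environment_outcome(draft))
```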
The Pluribus poker bot, built by Facebook AI Research (now Meta) and Carnegie Mellon, illustrates what adversarial robustness actually requires: it calculated how it would act with every hand it could plausibly hold, then balanced its strategy so opponents couldn't extract information from its behavior. Pluribus was specifically designed to be unreadable. Opponents couldn't model it because no single action revealed which cards it held.
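That balancing act can be sketched in a few lines. This is not Pluribus's actual machinery (it combined a self-play blueprint with real-time search); it is a toy illustration, with invented hands and probabilities, of why deciding over your whole range rather than your actual hand keeps an observer's posterior close to their prior.

```python
# Sketch of "range balancing": choose actions over every hand you *could* hold,
# so the action itself leaks little information. Hands and probabilities are toy assumptions.

HANDS = ["strong", "medium", "weak"]   # uniform prior over a three-hand toy range

# Naive policy: the action is a deterministic function of the actual hand.
naive_bet_prob = {"strong": 1.0, "medium": 0.0, "weak": 0.0}

# Balanced policy: every hand bets with some probability, so a bet doesn't pin the hand down.
balanced_bet_prob = {"strong": 0.80, "medium": 0.50, "weak": 0.35}

def posterior_after_bet(bet_prob: dict) -> dict:
    """P(hand | opponent observed a bet), assuming a uniform prior over HANDS."""
    joint = {h: (1 / len(HANDS)) * bet_prob[h] for h in HANDS}
    total = sum(joint.values())
    return {h: round(joint[h] / total, 2) for h in HANDS}

print("posterior after a bet, naive policy:   ", posterior_after_bet(naive_bet_prob))
print("posterior after a bet, balanced policy:", posterior_after_bet(balanced_bet_prob))
```

Against the naive policy a single bet reveals the hand outright; against the balanced one, the observer's belief shifts only modestly from the uniform prior.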
LLMs are the opposite. They're predictable. Their cooperative bias is detectable, and their responses to standard prompts are consistent. A human negotiator can probe an LLM, identify its patterns, and exploit them. The LLM can't recalibrate because it doesn't know there's anything to recalibrate to.
Humans can model the LLM. The LLM can’t model being modeled.
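Here is a toy version of that probing loop, with made-up policies: an opponent that merely tracks your action frequencies and counters them crushes a biased (predictable) player and gets nowhere against a uniformly mixed one.

```python
# Toy illustration of "predictable gets exploited": an adaptive opponent tracks
# action frequencies and counters the most common move. Policies are invented for illustration.

import random
from collections import Counter

BEATS = {"rock": "paper", "paper": "scissors", "scissors": "rock"}   # value beats key

def biased_policy() -> str:    # predictable: over-plays rock
    return random.choices(["rock", "paper", "scissors"], weights=[0.6, 0.2, 0.2])[0]

def mixed_policy() -> str:     # unreadable: uniform random
    return random.choice(["rock", "paper", "scissors"])

def exploiter_win_rate(policy, rounds: int = 20_000) -> float:
    """The exploiter plays whatever beats the policy's most frequent move so far."""
    seen, wins = Counter(), 0
    for _ in range(rounds):
        move = policy()
        counter = BEATS[seen.most_common(1)[0][0]] if seen else random.choice(list(BEATS))
        wins += BEATS[move] == counter   # exploiter wins when its counter beats the move
        seen[move] += 1
    return round(wins / rounds, 3)

print("exploiter win rate vs. biased policy:", exploiter_win_rate(biased_policy))   # ~0.60
print("exploiter win rate vs. mixed policy: ", exploiter_win_rate(mixed_policy))    # ~0.33
```

The exploiter needs no sophistication at all, only an opponent whose behavior carries a detectable pattern.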
Key Takeaways
- Intelligence ≠ adversarial competence: Raw reasoning power doesn’t fix missing simulation depth
- Static pattern-matching fails in adaptive environments: Markets, negotiations, and legal battles involve agents who update based on your moves
- The RLHF trap: Training for helpful/polite outputs optimizes for one-shot approval, not adversarial robustness
- Experts see second-order effects: How will this be interpreted? What does it signal? How will they adapt?
- Prompting doesn’t solve the ontology problem: Models can’t distinguish “cooperative task” from “task that looks cooperative but will be evaluated adversarially”
- The causal knowledge gap: Adversarial expertise comes from outcomes that were never written down
Looking Ahead
The essay identifies four capabilities required for adversarial robustness: (1) detecting that a situation is strategic, (2) identifying relevant agents and their objectives, (3) simulating how agents interpret signals and adapt, and (4) choosing actions that remain good across plausible reactions.
Capabilities 2 through 4 can be addressed with sophisticated prompting. Capability 1, recognizing that you're in an adversarial situation at all, is the fundamental challenge: the model has no default ontology for distinguishing genuinely cooperative contexts from contexts that merely appear cooperative.
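A minimal sketch of the four capabilities as a decision loop follows. Every function body, agent, and score here is hypothetical; the essay's point is precisely that capability 1 cannot really be reduced to a keyword check or a clever prompt.

```python
# Hypothetical scaffold for the four capabilities. All names, agents, and scores are
# illustrative assumptions, not a real implementation.

from dataclasses import dataclass

@dataclass
class Agent:
    name: str
    objective: str

def detect_strategic(context: str) -> bool:
    """(1) The hard part: notice that someone will evaluate this output adversarially.
    A keyword check is only a placeholder for an ability LLMs lack by default."""
    return any(w in context.lower() for w in ("negotiation", "opposing", "counterparty"))

def identify_agents(context: str) -> list[Agent]:
    """(2) Who else is in the game, and what do they want?"""
    return [Agent("opposing counsel", "find exploitable concessions")]

def simulate_reaction(action: str, agent: Agent) -> float:
    """(3) How well does the action hold up once this agent reacts? Toy scores; lower = more exploitable."""
    scores = {"reveal full position": 0.2, "hedge every sentence": 0.5, "commit selectively": 0.8}
    return scores[action]

def choose_robust(actions: list[str], agents: list[Agent]) -> str:
    """(4) Pick the action whose worst-case reception is best, not the one that reads nicest."""
    return max(actions, key=lambda a: min(simulate_reaction(a, ag) for ag in agents))

context = "Draft a settlement offer; opposing counsel will read every word."
if detect_strategic(context):
    options = ["reveal full position", "hedge every sentence", "commit selectively"]
    print(choose_robust(options, identify_agents(context)))   # -> "commit selectively"
```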
For practitioners building AI systems, this suggests a clear boundary: LLMs excel in domains where outputs are evaluated on intrinsic quality. They struggle where outputs are evaluated by how adversaries will exploit them. Understanding this distinction isn’t about limiting AI—it’s about deploying it where it can actually succeed.
The trial lawyer isn’t being dismissive. She’s seeing something the founder genuinely can’t: in adversarial environments, looking correct is table stakes. The real job is being unexploitable.
Based on analysis of “Experts Have World Models. LLMs Have Word Models.” from Latent Space