Structured Context Engineering: What Actually Works for AI Agents

A 9,649-experiment study challenges conventional wisdom about how to feed AI agents structured data


How should you structure the context that AI agents consume? The answer might surprise you: it depends on your model, and the conventional wisdom might be wrong.

A new arXiv paper presents the largest systematic study of context engineering for AI agents working with structured data — and the findings challenge several commonly held assumptions.

The Core Insight

The research used SQL generation as a proxy for programmatic agent operations, running 9,649 experiments across:
– 11 different models
– 4 formats (YAML, Markdown, JSON, TOON)
– Schemas ranging from 10 to 10,000 tables

The conclusions challenge what many practitioners take as gospel:

1. Architecture Choice Is Model-Dependent

  • File-based context retrieval improves accuracy for frontier-tier models (Claude, GPT, Gemini): +2.7%, statistically significant (p=0.029); see the sketch after this list
  • But for open-source models, it shows mixed results: aggregate -7.7% (p<0.001)
  • There’s no universal best practice
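
To make the file-based pattern concrete: instead of pasting the full schema into the prompt, the agent keeps schema files on disk and searches them on demand. A minimal sketch in Python, assuming a hypothetical schemas/ directory with one file per table or domain (this is one plausible reading of the pattern, not the paper’s exact harness):

```python
import subprocess
from pathlib import Path

SCHEMA_DIR = Path("schemas")  # hypothetical: one YAML file per table or domain

def find_relevant_schemas(keyword: str) -> list[Path]:
    """Case-insensitive recursive grep over schema files (Unix grep).

    Returns only the files that mention the keyword, so the agent
    reads a handful of tables instead of the whole schema.
    """
    result = subprocess.run(
        ["grep", "-ril", keyword, str(SCHEMA_DIR)],
        capture_output=True, text=True,
    )
    return [Path(p) for p in result.stdout.splitlines()]

# Pull only the matching files into the model's context:
context = "\n\n".join(p.read_text() for p in find_relevant_schemas("invoice"))
```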

2. Format Doesn’t Matter (Much)

Surprisingly, format choice (YAML, JSON, Markdown, TOON) doesn’t significantly affect aggregate accuracy (χ² = 2.45, p = 0.484).

However, individual open-source models show format-specific sensitivities — so while the average is neutral, your specific model might care a lot.
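
For a feel of what varied across conditions, here is one small table rendered in each of the four formats. The paper’s exact serializations aren’t reproduced here; in particular, the TOON lines follow that format’s tabular syntax but should be treated as illustrative:

```
# YAML
table: orders
columns:
  - {name: id, type: INTEGER}
  - {name: total, type: DECIMAL}

# JSON
{"table": "orders",
 "columns": [{"name": "id", "type": "INTEGER"},
             {"name": "total", "type": "DECIMAL"}]}

# Markdown
| column | type    |
|--------|---------|
| id     | INTEGER |
| total  | DECIMAL |

# TOON
table: orders
columns[2]{name,type}:
  id,INTEGER
  total,DECIMAL
```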

3. Model Capability Dominates Everything

The 21 percentage point accuracy gap between frontier and open-source models dwarfs any format or architecture effect.

This is the most important finding: don’t obsess over context formatting until you’ve picked your model.

4. File-Native Agents Scale

File-native agents can scale to 10,000 tables through domain-partitioned schemas while maintaining high navigation accuracy. Context structure matters for scale, just not in the way people assumed.
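
The paper’s exact layout isn’t reproduced here, but a domain-partitioned schema might look something like this on disk, letting the agent narrow first to a domain and then to individual tables (all names hypothetical):

```
schemas/
├── sales/
│   ├── _domain.md      # one-paragraph summary of the domain
│   ├── orders.yaml
│   └── customers.yaml
├── finance/
│   ├── invoices.yaml
│   └── payments.yaml
└── ...                 # more domains, thousands of tables in total
```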

5. File Size Is a Poor Efficiency Predictor

Compact or novel formats can actually incur token overhead, driven by dense grep output and the model’s unfamiliarity with their patterns. The relationship between file size and runtime efficiency is more complex than expected.
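
One practical check: measure candidate renderings with the tokenizer you will actually deploy against, since character count and token count can diverge. A minimal sketch using OpenAI’s tiktoken package (the sample strings and the cl100k_base encoding are illustrative choices, not the paper’s setup):

```python
import json
import tiktoken  # pip install tiktoken

enc = tiktoken.get_encoding("cl100k_base")

# The same tiny schema in two renderings: fewer characters does not
# automatically mean fewer tokens once the tokenizer has its say.
samples = {
    "json": json.dumps({"table": "orders", "columns": ["id", "total"]}),
    "toon": "table: orders\ncolumns[2]: id,total",  # illustrative TOON-style text
}
for fmt, text in samples.items():
    print(f"{fmt}: {len(text)} chars, {len(enc.encode(text))} tokens")
```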

Why This Matters

These findings have practical implications:

  • Don’t optimize prematurely — Get your model choice right first
  • Test your specific setup — Generic advice may not apply to your combination; a minimal harness is sketched after this list
  • Frontier models are more forgiving — They’re less sensitive to context engineering choices
  • Open-source requires more care — Context optimization matters more for non-frontier models
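
Acting on “test your specific setup” can be as lightweight as a paired evaluation: run one question set through each candidate context configuration and compare accuracy. A minimal harness in Python, where run_agent and the benchmark questions are hypothetical stand-ins for your own agent and test set:

```python
def run_agent(question: str, config: dict) -> str:
    """Stub: replace with a real call to your agent under `config`."""
    return "SELECT ..."  # placeholder answer

def evaluate(configs: dict[str, dict], questions: list[tuple[str, str]]) -> None:
    """Score every context configuration on the same question set."""
    for name, config in configs.items():
        correct = sum(run_agent(q, config) == gold for q, gold in questions)
        print(f"{name}: {correct}/{len(questions)} correct")

evaluate(
    configs={
        "yaml-files": {"format": "yaml", "retrieval": "file"},
        "json-inline": {"format": "json", "retrieval": "inline"},
    },
    questions=[("How many orders shipped in May?", "SELECT COUNT(*) ...")],
)
```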

Key Takeaways

  • Model choice > context engineering — A better model beats better formatting
  • The “right” approach depends on your model — One size doesn’t fit all
  • Scale is achievable with proper architecture — 10,000 tables is tractable
  • Conventional wisdom needs testing — What everyone “knows” may be wrong

Looking Ahead

This research provides evidence-based guidance for deploying AI agents on structured systems. The main takeaway: treat architectural decisions as hypotheses to validate with your specific use case, not as universal truths.

The era of “spray and pray” context engineering is ending. The era of evidence-based context optimization is just beginning.


Based on arXiv paper 2602.05447: “Structured Context Engineering for File-Native Agentic Systems”
