The Anatomy of an AI Agent: Planning, Memory, and Tools

What separates a chatbot from an autonomous agent? It’s not magic—it’s architecture. Understanding these building blocks is essential for anyone building or working with AI systems today.
The Core Insight

Lilian Weng’s foundational blog post on LLM-powered autonomous agents provides the most elegant framework for understanding how AI agents actually work. The model is just the brain; the magic happens in three complementary systems: planning, memory, and tool use.
This isn’t theoretical handwaving. The architectures Weng describes (ReAct, Reflexion, Chain of Hindsight) are the direct ancestors of today’s coding agents like Claude Code and Codex. When your agent pauses to “think,” decomposes a problem into subtasks, or learns from a failed attempt, it’s implementing patterns documented in that post.
Why This Matters

Planning is where agents transform from clever autocomplete into actual problem-solvers. The key techniques:
- Task Decomposition: Breaking “build me an app” into “create file structure → implement authentication → add routes → write tests.” Chain of Thought prompting makes this explicit; a minimal sketch follows this list.
- Tree of Thoughts: Instead of following one reasoning path, explore multiple possibilities. Like a chess engine, the agent evaluates promising branches before committing.
- Self-Reflection: The Reflexion framework shows how agents can learn from failure within a session—storing “that approach didn’t work because X” and adjusting strategy.
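
A minimal sketch of that kind of decomposition, assuming a generic `llm()` completion call (a hypothetical stand-in for any chat-completion client, not a specific library):

```python
def llm(prompt: str) -> str:
    """Hypothetical stand-in: plug in your model client here."""
    raise NotImplementedError

def decompose(goal: str) -> list[str]:
    """Ask the model to break a goal into ordered subtasks."""
    prompt = (
        f"Goal: {goal}\n"
        "Think step by step and list the subtasks needed to reach the goal, "
        "one per line, in execution order."
    )
    # Parse one subtask per non-empty line, dropping any bullet markers.
    return [line.strip("- ").strip()
            for line in llm(prompt).splitlines() if line.strip()]

# e.g. decompose("build me an app") might yield:
# ["create file structure", "implement authentication", "add routes", "write tests"]
```
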
Memory maps directly to human cognition:
- Sensory memory → embedding representations
- Short-term memory → context window (finite, precious)
- Long-term memory → vector databases with fast approximate nearest-neighbor retrieval (FAISS, ScaNN, HNSW)
The vector store approach has become standard: embed information, store it externally, retrieve relevant chunks via similarity search. It’s how agents can “remember” entire codebases without stuffing them into context.
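
A minimal sketch of that embed-store-retrieve loop using FAISS; here `embed()` is a placeholder that returns random vectors, where a real system would call an embedding model:

```python
import numpy as np
import faiss  # pip install faiss-cpu

DIM = 384  # embedding dimension; depends on your embedding model

def embed(texts: list[str]) -> np.ndarray:
    """Placeholder: swap in a real embedding model here."""
    rng = np.random.default_rng(abs(hash(tuple(texts))) % 2**32)
    vecs = rng.standard_normal((len(texts), DIM)).astype("float32")
    faiss.normalize_L2(vecs)  # unit vectors: inner product == cosine similarity
    return vecs

chunks = ["def login(user): ...", "def logout(user): ...", "README: setup steps"]
index = faiss.IndexFlatIP(DIM)   # exact maximum inner product search
index.add(embed(chunks))         # store chunk embeddings outside the context window

# Retrieve the two most similar chunks for a query.
scores, ids = index.search(embed(["how do users sign in?"]), k=2)
for score, i in zip(scores[0], ids[0]):
    print(f"{score:.2f}  {chunks[i]}")
```
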
Tool Use is the force multiplier. An agent that can only generate text is limited to what’s in its weights. An agent that can call APIs, execute code, search the web, and interact with external systems is bounded only by the interfaces it can reach.
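
A sketch of the basic dispatch pattern behind tool use, assuming the model emits its chosen tool as JSON (the registry and both tools here are hypothetical):

```python
import json

# Hypothetical tool registry: the model picks a name, the agent dispatches.
TOOLS = {
    "search_web": lambda query: f"(top results for {query!r})",
    "run_python": lambda code: str(eval(code)),  # demo only; sandbox in practice
}

def dispatch(model_output: str) -> str:
    """Parse a model's tool call and execute it."""
    call = json.loads(model_output)  # e.g. {"tool": ..., "args": {...}}
    return TOOLS[call["tool"]](**call["args"])

print(dispatch('{"tool": "run_python", "args": {"code": "2 + 2"}}'))  # -> 4
```
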
Key Takeaways

- ReAct (Reasoning + Acting) is the pattern behind most modern agents: interleave Thought/Action/Observation loops until the task is complete (sketched after this list).
- Context windows are the new bottleneck. Techniques like MIPS (Maximum Inner Product Search), the operation behind the FAISS sketch above, let agents access vast knowledge bases without blowing their context budget.
- HuggingGPT showed the path: use the LLM as an orchestrator and dispatch to specialist models. This is essentially what multi-agent systems do today.
- Self-reflection dramatically improves results. Agents that can critique their own work and adjust strategy outperform those that can’t—even with the same base model.
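
A compact sketch of that ReAct loop, with `llm()` and `run_tool()` as hypothetical stand-ins rather than any specific library:

```python
# ReAct loop sketch: interleave Thought / Action / Observation until done.
def llm(transcript: str) -> str:
    """Hypothetical: return the model's next 'Thought: ... Action: ...' step."""
    raise NotImplementedError

def run_tool(action: str) -> str:
    """Hypothetical: execute the named tool and return its output."""
    raise NotImplementedError

def react(task: str, max_steps: int = 10) -> str:
    transcript = f"Task: {task}\n"
    for _ in range(max_steps):
        step = llm(transcript)           # model emits a Thought and an Action
        transcript += step + "\n"
        if "Action: finish" in step:     # model signals the task is complete
            return transcript
        observation = run_tool(step)     # execute the action against the world
        transcript += f"Observation: {observation}\n"
    return transcript                     # give up after max_steps
```
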
Looking Ahead

The frameworks in Weng’s post were research prototypes when published. Today they’re production systems. The API-Bank benchmark (53 tools, 568 API calls) seems quaint compared to what modern agents routinely handle.
But the architecture hasn’t fundamentally changed. If you understand planning, memory, and tool use, you understand how every AI agent works—from AutoGPT to the coding assistant in your IDE. The question isn’t whether these systems will become more capable, but how quickly.
The next frontier is what happens when agents get better at all three simultaneously. Planning that spans weeks, memory that rivals human knowledge workers, and tool use that encompasses any API or interface. We’re not there yet—but we can see it from here.
Based on analysis of “LLM Powered Autonomous Agents” by Lilian Weng (lilianweng.github.io, June 2023)