The Evolution of AI Agentic Patterns

By Mai Khoi TIEU (@tieukhoimai)
When ChatGPT went mainstream in late 2022, most of us interacted with large language models the same way: by typing questions and marveling at the answers. That interaction was the beginning of a discipline, and over the three years that followed it grew substantially in scope, rigor, and infrastructure. What started as an art of clever phrasing has evolved into full system architecture. This post traces that journey across three generations of thinking: Prompt Engineering, Context Engineering, and Harness Engineering.
Prompt engineering — "what should I say?"
From 2022 through most of 2024, the central challenge was figuring out how to talk to a model. LLMs are trained on vast corpora of human knowledge — books, code, papers, conversations — and that knowledge gets compressed into billions of parameters. If the knowledge is already inside the model, then the only variable left is how you phrase your request. Write the right words, in the right structure, and you unlock the right answer.
So we started doing the one thing we could do: we wrote to it. We typed prompts. The craft lived entirely in the prompt text — the exact words, structure, and instructions you handed to the model. Get it right, and the model performed beautifully. Get it wrong, and you got something generic or hallucinated.
And for a while, it worked remarkably well. Techniques like Chain-of-Thought ("let's think step by step"), few-shot examples, and role prompting produced genuinely impressive results. Prompt engineering became a discipline, then a job title, then almost a mythology.
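To see what that craft looked like in practice, here is a minimal sketch in Python of those three techniques combined into one prompt. The content and variable names are illustrative, not tied to any particular model or SDK:

```python
# A minimal sketch combining role prompting, a few-shot example, and a
# Chain-of-Thought trigger into one prompt string. All content is illustrative.

role = "You are a careful math tutor."  # role prompting

few_shot = (  # a worked example the model can imitate (few-shot prompting)
    "Q: A store sells pens at $2 each. How much do 3 pens cost?\n"
    "A: Let's think step by step. Each pen costs $2, so 3 * $2 = $6. "
    "The answer is $6."
)

question = "Q: A train travels 60 km/h for 2.5 hours. How far does it go?"
cot_trigger = "A: Let's think step by step."  # Chain-of-Thought cue

prompt = "\n\n".join([role, few_shot, question + "\n" + cot_trigger])
print(prompt)  # send this string to any chat-completion API
```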
"The hottest new programming language is English." — Andrej Karpathy, 2023.
But LLMs are frozen in time. Their knowledge ends at a training cutoff. They don't know what your company shipped last week, what its internal policy says, or what error your system threw five minutes ago. They hallucinate when asked about things outside their training. And no matter how cleverly you phrase a question, the model cannot answer accurately from knowledge it simply doesn't have. So we realized we needed to attach external knowledge to the model: we needed to feed it context.
Context engineering — "what info should I feed?"
Context engineering emerged around the end of 2023, driven by one uncomfortable realization: writing a better prompt wasn't enough if the model was operating blind. You needed to think carefully about what information you were loading into the context window — the full composition of everything the model sees before it generates a response.
And where does that context come from? Think about how agentic systems work — the model operates in a loop, and at each turn, more information gets assembled into the prompt:
- Conversation history — Previous turns in the dialogue
- Knowledge retrieval — Relevant documents pulled from external sources via retrieval-augmented generation (RAG)
- Tool results — Outputs from Model Context Protocol (MCP) tools the agent has called, including failures it can learn from
What changes at this layer is the nature of the prompt itself. It's no longer just a static instruction you write once — it becomes a context window composition: a dynamic assembly that gets rebuilt on every single turn. The prompt grows, shifts, and accumulates, becoming steadily harder to control.
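To make that concrete, here is a minimal sketch of per-turn context assembly. Every name in it (build_context, the retrieve callable, the bracketed markers) is hypothetical; real frameworks structure this differently, but the shape is the same:

```python
# A hypothetical per-turn context builder for an agent loop. The window is
# rebuilt from scratch on every turn rather than mutated in place.

def build_context(system_prompt: str,
                  history: list[str],
                  user_message: str,
                  retrieve,                  # callable: query -> relevant docs (RAG)
                  tool_results: list[str]) -> str:
    docs = retrieve(user_message)            # knowledge retrieval
    parts = [
        system_prompt,                                   # static instructions
        "\n".join(history),                              # conversation history
        "\n".join(f"[doc] {d}" for d in docs),           # retrieved documents
        "\n".join(f"[tool] {t}" for t in tool_results),  # incl. failures
        f"User: {user_message}",
    ]
    return "\n\n".join(p for p in parts if p)
```

The important property is that nothing persists between turns: each call reassembles the window from its sources, which is what makes the composition controllable in the first place.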
The problem is that context windows have a hard limit. And this is the challenge the whole field is wrestling with right now: how do you manage what goes in?
Techniques like compaction and summarization help — but they introduce their own problems. When a model summarizes its own conversation history, it becomes biased toward whatever it chose to remember. Important details get quietly dropped. This opens up a whole set of sub-problems: code-mode compaction, dynamic tool selection to keep your MCP context lean, RAG as a way to pull in only what's relevant on demand. All of it is essentially the same question: how do you keep the context window useful without letting it explode?
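As a sketch of what compaction can look like under the hood, here is one simple policy: keep the most recent turns verbatim and let the model summarize everything older. The token budget and the count_tokens / summarize helpers are hypothetical stand-ins:

```python
# A minimal compaction sketch. The lossy step is summarize(): the model
# decides what from the older history is worth remembering.

MAX_TOKENS = 8_000   # illustrative budget, not a real model's limit
KEEP_RECENT = 10     # always keep the latest turns verbatim

def compact(history: list[str], count_tokens, summarize) -> list[str]:
    if count_tokens("\n".join(history)) <= MAX_TOKENS:
        return history                       # still within budget: no-op
    old, recent = history[:-KEEP_RECENT], history[-KEEP_RECENT:]
    summary = summarize("\n".join(old))      # biased toward what the model kept
    return [f"[summary of earlier conversation] {summary}"] + recent
```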
Harness engineering — "what system should I build?"
Then, toward late 2025 going into 2026, a blog post started circulating that put a name to where things were heading. Mitchell Hashimoto — co-founder of HashiCorp — wrote My AI Adoption Journey, and in it he introduced the term harness engineering.
Every time the agent makes a mistake, change the system so that mistake structurally cannot recur.
Not the prompt. Not the context. The system.
This gave a name to what forward-thinking teams had been quietly building: Harness Engineering — the design of the full infrastructure surrounding the model. Tools, memory, sandboxes, orchestration, error recovery, evaluation loops, security guardrails.
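In stripped-down form, a harness loop might look like the sketch below. Every function here (call_model, run_tool, validate, log) is a hypothetical stand-in; the point is that the reliability lives in the loop around the model, not in the prompt:

```python
# A hypothetical harness step: guardrails before acting, sandboxed tool
# execution, error recovery via bounded retries, and logging for evaluation.

def harness_step(task: str, call_model, run_tool, validate, log):
    for _ in range(3):                        # error recovery: bounded retries
        action = call_model(task)             # the model proposes an action
        ok, reason = validate(action)         # security guardrail: check first
        if not ok:
            log(f"blocked: {reason}")         # feeds the evaluation loop
            task += f"\n[previous attempt rejected: {reason}]"
            continue
        try:
            return run_tool(action)           # run inside a sandbox
        except Exception as e:                # tool failure becomes context
            log(f"tool error: {e}")
            task += f"\n[tool failed: {e}]"
    raise RuntimeError("harness: retries exhausted")
```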
After that, you started hearing it everywhere — teams building serious AI products, LangChain shipping their Open Deep Research project on LangGraph, companies rethinking their entire infrastructure.
The rigor has quietly shifted from prompt text toward system architecture. The question is no longer "how do I write a good prompt?" — it's "what do I actually want to build?"
Each level wraps the previous one

This is the part worth sitting with: none of these generations made the previous one obsolete. A weak prompt inside a well-engineered harness still produces weak output. A perfectly crafted prompt with a bloated, poorly managed context window still hallucinates. And a beautifully composed context window with no surrounding system falls apart the moment something unexpected happens. The nesting is the point. Prompt text is the foundation. Context window composition is the layer that connects the model to the world. The harness is the system that makes it all reliable enough to ship.