Your AI Output Is Inconsistent Because You're Creating Implicit State

You're using a flagship model. You followed the tutorial. You wrote a decent prompt. So why does the output feel... off? And why can't you reproduce the good results you got last time?

The obvious answer is the model. Maybe you need GPT-5.4 instead of 4o. Maybe Claude Opus instead of Sonnet. So you upgrade. Sometimes it helps. But the inconsistency follows you. Better models, same frustrating pattern. That's a clue something deeper is going on.

Here's what's actually happening. Every message, every half-finished task, every abandoned idea you leave in a session — it all stays in the context window. The model sees all of it. And it uses all of it, whether you intended that or not. You're not giving it a prompt. You're giving it a polluted environment. Programmers call this implicit state — context that influences behavior without being explicitly declared or intended. And you're creating it without realizing it.

Polluted context forces the model to extract signal from noise — burning capacity on irrelevance. Precise context lets it focus entirely on the task.

The Day Everything Broke

Here's what happened to me recently. I was building a quantitative trading system. I did everything right upfront — discussed architecture with Claude, GPT, and Gemini, synthesized their input into a clean coding spec, then handed it to Codex to execute. It was working.

Then GPT flagged something important: our assumptions needed validation before full implementation. Smart advice. So I collaborated with Claude and GPT to produce a revised spec — tighter, more cautious, phased. Good thinking all around.

Then I made one fatal mistake. Instead of stopping Codex and starting a fresh session, I interrupted it mid-execution and said: "We changed the idea. Implement coding spec v2."

Codex didn't start over. It couldn't — the old spec was still alive in its context. So it did what any model would do with two conflicting realities in the same window: it tried to reconcile them. Compatibility layers appeared everywhere. Old code that should have been deleted got quietly reused. Features from spec v1 crept back in. The codebase became unreadable.

Six hours of cleanup. Nearly 2 billion tokens consumed. Not because Codex is a bad model. Because I handed it a polluted environment and expected clean output.

The Real Variable Isn't Intelligence

After the cleanup, I stopped asking which model to use. I started asking something different: what makes flagship models degrade, and what makes lightweight models outperform? The answer was hiding in plain sight. It was never about the model. It was always about what I handed the model — how concise, how precise, how unambiguous the context was.

The pattern became clear through observation. With a clean, explicit plan and well-decomposed tasks, GPT-mini or Claude Sonnet consistently outperformed flagship models on the same problem. And flagship models, given polluted context, consistently underperformed their lightweight alternatives.

The variable wasn't intelligence. It was context quality.

This is the principle I now call explicit by default: everything the model needs should be a named, visible, scoped artifact. But there's a sharp edge to this principle that most people miss — seemingly related context is often more dangerous than obviously unrelated context. Unrelated information gets ignored. Misleading-but-plausible information gets used. The model can't distinguish between context you intended and context you forgot was there. It treats everything as signal.

The goal isn't more context. It isn't even less context. It's precise context — nothing missing, nothing extra, nothing that could be mistaken for something it isn't.

Three Rules I Now Follow Without Exception

That realization crystallized into three rules I now follow without exception:

Rule 1

One task, one session.

Don't let unfinished context from Task A bleed into Task B. Even if they feel related. Especially if they feel related — that's when misleading context is most dangerous.

Rule 2

Plan first, always.

Never ask a model to think and execute simultaneously in the same prompt. Write the plan first, review it, then execute against it in a clean context. Your spec exists before your Codex session opens — never the other way around.

Rule 3

Treat context like a contract.

Everything in the context window is a promise to the model: "this is relevant." Only include what you'd genuinely stand behind. If you wouldn't put it in a formal spec, don't let it sit in your session.

The Bigger Picture

Here's what I've come to believe after all of this: a model performs well when its environment is clean and its goal is unambiguous. Every bit of context noise, every vague or conflicting instruction, produces unexpected results — wasted time, wasted tokens, and a codebase you don't recognize.

Working with an AI model is less like talking to a colleague and more like formulating an optimization problem. You need well-defined state, clear constraints, and an unambiguous objective. The model is the optimizer — extraordinarily powerful, but only as good as the problem you hand it.

You can't fix a poorly defined problem by upgrading the optimizer.

And that's why the most important skill in the AI-native coding era isn't prompt engineering, model selection, or knowing the latest tools. It's something much older and more fundamental: the discipline to make everything explicit before you ask anything of anyone.

There's a deeper layer to this — about how models allocate intelligence across different types of problems, and why task-model matching matters more than raw capability. But that's a story for next time.