general 2026-06-28 · Updated 2026-06-28

This Is How Agentic AI Kills Your Code

I keep seeing the same claim from AI enthusiasts on LinkedIn:

AI makes senior engineers more important. Coding is dead, or at least no longer the scarce skill. Taste, judgment, and vision remain. The code itself is becoming cheap. The real question, usually delivered with the subtlety of a sales funnel wearing a Patagonia vest, is: do you have taste?

I understand why this sounds attractive. Executives get to believe engineering output is about to become cheaper. Senior engineers get to believe their judgment will become more valuable. AI vendors get to sell speed as destiny.

But the claim is wrong in a dangerous way.

AI does not simply make strong engineers stronger. Used too much, it can make even senior engineers kill products faster. Not because they suddenly forget how to code. Not because they lose taste. Because AI gives you speed before it gives you understanding.

That is the bargain.

And bargains collect.

The Greenfield Illusion

The elation around AI coding usually comes from greenfield work.

You describe a product. The model produces files. Routes appear. Components assemble themselves. Database migrations land. Tests pass, or at least enough of them do. It feels like cheating gravity.

In a young product, this can work shockingly well because there is not much local truth yet.

There is no deep institutional memory. There are no scar-tissue abstractions. There are no invariants hidden in old decisions. There is no customer behavior encoded in a strange branch from three years ago. There is no revenue-critical edge case named legacyMode that everyone hates but nobody can remove.

In a young product, the line of code you write is often the epistemic truth.

If the model creates a new shape, that shape becomes reality. The code says what the product is. There is little contradiction between invention and truth because the truth has not had time to harden.

That is why AI feels magical in prototypes. It is operating in a low-friction world.

But this is only true for some greenfield work.

Not all greenfield projects are simple just because they are new.

CRUD, UX, and the Easy Validation Trap

AI is strongest when the task has common patterns and cheap validation.

A CRUD app has familiar shapes: login, dashboard, form, table, settings page, admin panel. These are not trivial, but they are heavily patterned. The expected behavior is visible. A human can click through the result and quickly see whether it mostly works.

UX work has a similar advantage. The feedback loop is perceptual. You can see whether the layout is broken. You can feel whether the interaction is wrong.

That does not mean AI writes perfect CRUD or perfect UX. It means the cost of discovering failure is relatively low.

The user can see the error. The developer can reproduce it. The bug is often local. The expected behavior is usually borrowed from common product patterns. AI is good at common product patterns.

But that is not the same thing as being good at engineering.

It is being good at tasks where correctness is visible, local, and familiar.

Novel Stateful Systems Are Different

A novel stateful system can be greenfield and still be extremely hostile to AI.

In that kind of system, the hard part is not drawing the screen or wiring a route. The hard part is preserving a causal choreography across interlocked states. The system must know what happened, what is allowed to happen next, which state owns which truth, which state is derived, which callbacks are volatile, which metadata is durable, and which transitions are illegal even if they are locally convenient.
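One way to make "illegal even if locally convenient" concrete is an explicit transition table. The following is a minimal sketch, not Band-Aider's actual design: the phase names, the ALLOWED table, and the advance function are all hypothetical and exist only to illustrate the idea.

```python
from enum import Enum, auto


class Phase(Enum):
    # Hypothetical phases, invented for this illustration.
    PLANNED = auto()
    PATCHED = auto()
    TESTED = auto()
    CHECKPOINTED = auto()


# The only legal moves. Anything not listed here is illegal by construction.
ALLOWED = {
    Phase.PLANNED: {Phase.PATCHED},
    Phase.PATCHED: {Phase.TESTED, Phase.PLANNED},
    Phase.TESTED: {Phase.CHECKPOINTED, Phase.PLANNED},
    Phase.CHECKPOINTED: {Phase.PLANNED},
}


class IllegalTransition(Exception):
    """Raised when a caller asks for a move the table does not list."""


def advance(current: Phase, target: Phase) -> Phase:
    """Return the target phase if the move is legal; otherwise fail loudly."""
    if target not in ALLOWED[current]:
        raise IllegalTransition(f"{current.name} -> {target.name}")
    return target
```

With a table like this, skipping validation (PATCHED straight to CHECKPOINTED) raises instead of quietly succeeding. The point of the sketch is that a convenient shortcut has to be added to the table on purpose, where a reviewer can see it.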

That kind of system does not die from one missing edge case. It dies when epistemic truth has to compete with half-truths that crowd both the human and machine context windows.

This is what happened with Band-Aider, my coding harness.

Band-Aider is not a CRUD app. It has almost no UX. It is a harness for coordinating AI-assisted coding through a precise epistemic flow. The difficult part is not that any one class is large. The difficult part is that the states are interlocked.

In Band-Aider, the hard part was not generating code. It was preserving the order of truth across source changes, test validation, dependency metadata, retries, and checkpoint recovery. If one layer guessed or repaired state at the wrong time, the system could look productive while quietly corrupting its own map.

These are not independent tasks. They are causal relationships.

AI got Band-Aider close. So close, in fact, that the temptation to run the light became enormous. I asked AI to cross the final mile.

That is the dangerous part.

It produced code that looked like progress. It implemented many of the named pieces. It could follow local requests. It could add a method, patch a branch, create a fallback, update a test, and explain itself plausibly.

Then it hit a wall.

The curve bent right before MVP.

I had the map, but I could no longer see it inside the labyrinth AI had generated.

The remaining failures were not obvious syntax errors or missing functions. They were epistemic errors. The model would add state to repair state. It would add a branch to patch a transition. It would hedge around obsolete paths when those paths should have been removed. It would confuse durable state with volatile runtime wiring. It would make the immediate error disappear while expanding the number of possible half-valid states.

And in my rush and excitement, I ignored the comprehension debt for a few turns too many.

Then it compounded.

Let me repeat that for emphasis: it compounded.

It is sneaky like that. Like a subprime loan, but with type hints.

This is why novel stateful systems are different from CRUD or UX.

In CRUD, the question is often: did the system save and show the thing?

In UX, the question is often: can the user understand and complete the action?

In a stateful harness, the question is: did the system preserve the intended causal structure across time, retries, partial failure, metadata refresh, and future expansion?

That question is much harder to validate.

There may be no screen to inspect. There may be no single test that proves the invariant. The bug may only appear after a particular sequence. A planner guesses too broadly. A generated patch repairs the wrong state. Metadata goes stale. A retry path reuses the wrong truth. A recovery mechanism restores the shape but not the meaning.

Then you learn the resume functionality was broken while the AI was fixing another unrelated bug. Then you learn it had been saving to one file and reading from another. Then you learn it had not separated serializable references from unmanaged I/O handles.
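Two of those failures have a structural shape that can be sketched in a few lines. The example below is an assumption-laden illustration, not the harness's real code: DurableState, Runtime, and CheckpointStore are hypothetical names. It shows one store object owning a single path for both save and load, and serializable references kept apart from open I/O handles.

```python
import json
from dataclasses import asdict, dataclass
from pathlib import Path
from typing import IO, Optional


@dataclass(frozen=True)
class DurableState:
    """Serializable facts only: safe to write to disk and read back."""
    step: int
    log_path: str  # a reference to the log file, not the open handle


class Runtime:
    """Volatile wiring: rebuilt on resume, never serialized."""

    def __init__(self, state: DurableState) -> None:
        self.state = state
        self.log: Optional[IO[str]] = None

    def open(self) -> None:
        self.log = open(self.state.log_path, "a", encoding="utf-8")

    def close(self) -> None:
        if self.log is not None:
            self.log.close()
            self.log = None


class CheckpointStore:
    """Save and load share one path, so they cannot drift apart."""

    def __init__(self, path: Path) -> None:
        self._path = path

    def save(self, state: DurableState) -> None:
        self._path.write_text(json.dumps(asdict(state)), encoding="utf-8")

    def load(self) -> DurableState:
        raw = json.loads(self._path.read_text(encoding="utf-8"))
        return DurableState(**raw)
```

Resuming then means loading a DurableState and constructing a fresh Runtime around it. The handle is never part of what gets written, so there is nothing to "repair" on the way back in.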

Each lesson was painful. Hours were lost. And much of that pain was unnecessary.

If I had hand-coded the thing, I would still have made mistakes. But my mistakes tend to be less epistemic. They are more likely to be null checks, off-by-one errors, flipped inequalities, or wrong assumptions inside a small function. Still mistakes, obviously. I am not pretending to be a compiler with a mortgage.

But those mistakes are local. You put a breakpoint down and you usually see what happened.

Coordination mistakes are different. They are harder to reproduce. You have to align state, timing, history, metadata, and intent in the right combination before the failure reveals itself. Combinatorial problems are hard. Apparently this remains true even if the autocomplete has a brand strategy.

That is why senior hands and senior eyes do not automatically save you.

The issue is not age. The issue is epistemic structure.

Mature Products Have Epistemology

A mature product is not just a pile of code. It is a knowledge system.

Every module contains decisions. Every abstraction encodes a bet. Every ugly workaround may be attached to some forgotten production incident. Every boundary exists because something once went wrong when the boundary did not exist.

In a mature product, the code is not merely instructions to the machine. It is also a record of what the team believes to be true.

Every branch is not merely logic. It is shared understanding between the product team, the stakeholders, the business, the PM, the customer, and the machine. It has the force of a social contract.

AI will happily duplicate that contract, move the locus of control, shift the center of gravity, and create shadow logic. It is not always “wrong” in the local sense. Founding a new cult is not wrong in the local sense either. Still, maybe do not let it write billing code.

Young products can tolerate improvisation because the truth is still being discovered. Mature products require conservation because the truth is already agreed on, and that understanding was bought.

But a novel stateful greenfield system can have dense epistemology from day one.

That is the bridge between Band-Aider and mature products. Both are hard for AI for the same reason: the code is not just code. It is a map of what the system believes to be true.

AI is very good at producing locally plausible code. It struggles to know which local truths are sacred contracts between humans and machines.

The model can recognize that a piece of code “looks like” it should be refactored. It may even be right in the abstract, while still being epistemically toxic.

Here is a simple example: my local AI checks for null excessively. If it were alive, I would swear it had OCD.

But excessive null checking can be epistemically brittle. It is like checking whether customer_id is null after you have already charged the credit card.

The human question is: how did we get this far without a customer ID?

I would rather throw hard than check for null and let the transaction succeed silently. I would rather get the 500, let it bubble, and fix the source. AI often lacks that conviction. It prefers the local courtesy of “handling” the case over the systemic discipline of making the case impossible.
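The difference between the two instincts is small on the page. Here is a minimal sketch, assuming a hypothetical Order type and two hypothetical settle functions; none of these names come from a real codebase.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Order:
    customer_id: Optional[str]
    amount_cents: int


class MissingCustomerError(Exception):
    """A missing customer_id this late means something upstream is broken."""


def settle_defensive(order: Order) -> str:
    # The "polite" version: the null is absorbed and the caller sees a
    # normal return value, so nobody learns that the invariant was violated.
    if order.customer_id is None:
        return "skipped"
    return f"settled:{order.customer_id}"


def settle_strict(order: Order) -> str:
    # The strict version: refuse to continue, and let the error bubble up
    # to whoever can fix the source of the missing ID.
    if order.customer_id is None:
        raise MissingCustomerError("order reached settlement without customer_id")
    return f"settled:{order.customer_id}"
```

Both functions behave identically on valid input. They only differ on the input that should never exist, which is exactly where the defensive version hides the evidence.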

How AI Kills Mature Products

AI rarely kills mature products by producing one obviously terrible bug.

It kills them by increasing the number of plausible paths through the system.

There is a reason database people talk about atomicity. It is better to have one way of being right than partial credit for correctness. AI loves partial credit because it lacks epistemic rigor.

Going back to the transaction example, this is how inventory gets deducted while customer_id is still missing. This is how accounting entries get duplicated because the model is not counting states and methods as liabilities. In my case, state was being written in one place while recall was looking somewhere else.
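The inventory case can be reproduced with Python's built-in sqlite3 module. This is a sketch under stated assumptions: the inventory and orders tables, the column names, and the place_order helper are hypothetical. It relies on two things sqlite3 does provide: a NOT NULL constraint raises sqlite3.IntegrityError, and using the connection as a context manager commits on success and rolls back on an exception.

```python
import sqlite3
from typing import Optional


def make_db() -> sqlite3.Connection:
    """Build an in-memory database with one item in stock (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE inventory (sku TEXT PRIMARY KEY, qty INTEGER NOT NULL)")
    conn.execute(
        "CREATE TABLE orders ("
        "id INTEGER PRIMARY KEY, customer_id TEXT NOT NULL, sku TEXT NOT NULL)"
    )
    conn.execute("INSERT INTO inventory VALUES ('widget', 5)")
    conn.commit()
    return conn


def place_order(conn: sqlite3.Connection, customer_id: Optional[str], sku: str) -> None:
    # One transactional boundary: either both statements apply or neither does.
    with conn:
        conn.execute("UPDATE inventory SET qty = qty - 1 WHERE sku = ?", (sku,))
        conn.execute(
            "INSERT INTO orders (customer_id, sku) VALUES (?, ?)", (customer_id, sku)
        )


def stock(conn: sqlite3.Connection, sku: str) -> int:
    return conn.execute("SELECT qty FROM inventory WHERE sku = ?", (sku,)).fetchone()[0]
```

Calling place_order with a customer_id of None fails on the insert, and the earlier inventory deduction is rolled back with it. The hedge-everything alternative, where the deduction commits on its own and the order is "handled" later, is how stock goes missing without an owner.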

A fallback here. A branch there. A compatibility layer. A duplicate state field. A second source of truth “just to be safe.”

All of these can be harmful.

AI hedges. It acts as if everything is a normal distribution with soft curves, even when the system needs a hard commit or a hard fail. Think of a red light with fifty lamps. Humans already gun the intersection when they see yellow. AI will happily add a few more yellows and call it resilience.

None of these changes looks catastrophic in isolation. Many will pass tests. Some will even look responsible. The code becomes more defensive, more flexible, more accommodating.

And less knowable.

This is the real failure mode: state-space expansion.

The product does not die because one function is wrong. It dies because the system now has too many possible ways to be right, half-right, stale, repaired, resumed, overridden, or silently inconsistent.

A mature product depends on a bounded number of truths. AI tends to add more surfaces where truth might live.

It does this because local patching is easier than preserving global invariants. The model sees a symptom and adds a path around it. It sees a missing case and adds a condition. It sees a stale object and adds a rehydration hook. It sees ambiguity and adds state.

Take that to its logical conclusion.

State is combinatorial.

There is no machine powerful enough to contain it once you allow it to explode.
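The arithmetic behind that is simple enough to run. The sketch below uses hypothetical flag names; it enumerates every combination that independent boolean flags can take and sets that against a single enumerated lifecycle field.

```python
from enum import Enum
from itertools import product
from typing import List, Tuple


def flag_combinations(flag_names: List[str]) -> List[Tuple[bool, ...]]:
    """Every combination of values that n independent boolean flags can take."""
    return list(product([False, True], repeat=len(flag_names)))


# Hypothetical flags of the kind a patch adds "just to be safe".
flags = ["saved", "validated", "stale", "resumed"]


class Lifecycle(Enum):
    # One field, three values: the system is in exactly one of these.
    DRAFT = "draft"
    SAVED = "saved"
    VALIDATED = "validated"


if __name__ == "__main__":
    print(len(flag_combinations(flags)))                 # 2 ** 4 combinations
    print(len(flag_combinations(flags + ["repaired"])))  # one more flag doubles it
    print(len(Lifecycle))                                # the enum stays at 3
```

Four flags give 16 combinations and a fifth gives 32, most of which nobody designed. Each added flag doubles the space that tests and reviewers have to reason about, while adding a value to the enum grows it by one.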

Senior Judgment Does Not Save You Automatically

The LinkedIn claim says AI makes senior developers more important because coding is cheap and judgment matters.

I think that claim is cheap.

Judgment does not automatically win inside an incentive structure that rewards output more than restraint.

Most engineering organizations already measure the wrong things.

Lines of code are visible. Pull requests are visible. Tickets closed are visible. Products shipped are visible. Heroics are visible. The late-night rescue, the giant migration, the “we shipped it in three weeks” story, the demo that materialized from nothing. These things travel well in status meetings.

What is usually absent?

Cyclomatic complexity. Cognitive complexity. Coupling. Cohesion. Instability. Afferent and efferent dependency pressure. Maintainability index. CRAP score. Change failure probability. The number of possible invalid states. The size of the transactional boundary. The number of places where truth can live. The amount of code that no longer needs to exist.
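Some of these are cheap to compute once the inputs exist. As a hedged illustration, here are two of them as plain functions: instability, defined as efferent coupling divided by total coupling, and the CRAP score, defined as complexity squared times the cube of the uncovered fraction, plus complexity. The function names and the zero-coupling convention are assumptions of this sketch; real tools gather the inputs from static analysis and coverage reports.

```python
def instability(afferent: int, efferent: int) -> float:
    """I = Ce / (Ca + Ce). 0.0 is maximally stable, 1.0 maximally unstable."""
    total = afferent + efferent
    if total == 0:
        # Assumption of this sketch: a module with no couplings scores 0.0.
        return 0.0
    return efferent / total


def crap_score(complexity: int, coverage: float) -> float:
    """CRAP = comp^2 * (1 - cov)^3 + comp, with coverage as a 0.0-1.0 fraction."""
    if not 0.0 <= coverage <= 1.0:
        raise ValueError("coverage must be between 0.0 and 1.0")
    return complexity ** 2 * (1.0 - coverage) ** 3 + complexity
```

A method with cyclomatic complexity 10 and no test coverage scores 110; the same method fully covered scores 10. Numbers like these are easy to put next to the throughput charts, which is part of why their absence is a choice.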

Nobody throws an office party because a system now has fewer ways to be wrong.

Nobody sends a company-wide email celebrating an atomic boundary.

Nobody gets a trophy for deleting a fallback path before it gets exercised, long after it should have been removed.

At best, companies celebrate reliability after pain. They track MTTD and MTTR once production is already bleeding. They celebrate the firefighters because fire is visible. They rarely celebrate the person who quietly removed the dry grass.

This matters because AI amplifies whatever the organization already rewards.

If the dashboard rewards throughput, AI will increase throughput. If performance reviews reward shipping, AI will help people ship. If leadership celebrates demos, AI will produce demos. If the culture worships heroic saves, AI will create more situations that later require heroes.

The senior engineer may know better. They may see the complexity forming. They may understand that a generated patch is locally plausible but architecturally poisonous.

But they are still operating inside a system that rewards motion, has been busy laying people off, and then makes the remaining engineers the backstop for everything.

They have to choose between clicking approve and going home, or rejecting the patch and explaining why to their peers and their boss. Humans are complex, but their behavior is predictable in that situation. The smart ones optimize for the promotion and the exit ticket before the next eager intern shows up for their learning moment.

The pathological case is easy to exercise. The correct case is under-rewarded.

The code review comment that says “this adds a second source of truth” competes against the product demo that works. The architectural objection that says “this increases the number of invalid states” competes against the ticket moving to Done. The warning that says “we need to preserve the causal boundary here” competes against the fact that the model produced a passing test and everyone is tired.

Judgment has to fight incentives.

And incentives usually win.

This becomes more dangerous while models are improving quickly and token costs are subsidized. If the model gets better every few months, it is tempting to assume the complexity problem will solve itself. Bigger context. Bigger model. Better retrieval. Better agents. Better scaffolding. More automation around the automation.

So the organization does not feel the cost immediately.

The model is cheap enough today. The tokens are subsidized. The context window is growing. The demo works. The ticket closes. The dashboard improves. Everyone feels faster.

Meanwhile the codebase accumulates more paths, more branches, more duplicated truths, more ambiguous ownership, more fallback behavior, and more states that require explanation.

Between the reward today and the bill that may never come, the path to winning is obvious.

That is why senior judgment does not save you automatically. Not because senior engineers are useless. Because even good judgment loses when the system rewards the opposite behavior.

The Power Curve

This is why teams reach for bigger models.

As the product grows, the AI needs more context to behave. More files. More tests. More logs. More architecture notes. More examples. More instructions. More warnings about what not to do.

The context grows. The prompts grow. The models grow. The cost grows.

You are fighting the power curve.

It is like pushing a boat past hull speed. At low speed, every bit of power gives you visible progress. Then the boat starts making a larger and larger hole in the water. More energy goes in. Less of it becomes forward motion. Eventually you are not moving efficiently. You are just paying to maintain turbulence.

Mature AI-assisted development can fall into the same pattern.

At first, the model saves time. Then the codebase becomes more complex. Then you need a bigger model to understand the complexity. Then the bigger model adds more complexity. Then you need more context, more compute, more retrieval, more guardrails, more review, more policy, and more senior judgment just to keep the machine from confidently sanding off the load-bearing beams.

This is not a free productivity curve.

It is a debt curve.

The Cost Is Not Only Technical

There is also a broader cost.

If the answer to every increase in software complexity is a bigger model, longer context, more inference, more retries, more generated code, more generated tests, and more generated review, then we are not solving complexity. We are financing it.

That is economically unstable.

It is socially unstable too, because it changes what teams value. The person who ships the most generated code looks productive. The person who deletes a bad abstraction looks slow. The person who refuses a clever patch because it violates a hidden invariant looks obstructionist. The person who says “we need to understand this first” sounds old-fashioned in a room addicted to velocity.

And yes, it is environmentally unstable as well. Compute is not magic. The fact that the heat is somewhere else does not make it imaginary. I have a machine under my desk that doubles as a 1,000-watt heater, and it is not even an H100.

A workflow that creates complexity and then demands more energy to manage that complexity is not progress.

In game terms, that is dead-man-walking mode.

Borrowed Intelligence

The hardest thing to admit about AI coding is that it lets us borrow intelligence we have not mastered.

The model can produce code we could not have written as quickly. It can search patterns we would not have remembered. It can offer implementations before we fully understand the design pressure or socially agree on the right solution.

That is useful. It is also dangerous.

Borrowed intelligence is still borrowed.

And borrowed anything comes with matching interest.

If you use AI to explore, prototype, compare, generate scaffolding, or challenge your assumptions, it can be an extraordinary tool. But if you let it continuously deposit code faster than the team can absorb, simplify, and own, then you are no longer accelerating engineering.

You are importing obligations.

Every line of code is a future liability with a maturity date.

Every generated branch must be understood. Every generated abstraction must be justified. Every generated file must become part of the product’s epistemology or be removed. There is no neutral code. Code either carries truth or creates confusion about where truth lives.

AI makes it easy to forget that.

And the more ignorant you are, the less likely you are to notice.

A Falsifiable Claim

This is not mysticism. It is a prediction.

AI should perform better when the task has common patterns, local validation, visible feedback, and low state coupling.

AI should perform worse when the task has novel purpose, hidden state transitions, durable versus volatile state distinctions, causal ordering, graph consistency, and expensive validation.

If that prediction is wrong, then AI should be able to build a system like Band-Aider as easily as it builds a CRUD dashboard. It should preserve the state machine, the ownership model, the checkpoint model, the graph refresh model, and the causal flow without repeatedly adding escape hatches.

That is not what happened.

AI accelerated the early climb, then steepened the final approach. It made the project feel close before it was correct. It produced a shape that resembled the architecture, but the hardest part was not the shape. The hardest part was preserving the causal contract.

That is the asymptote.

The model can get near the system by imitating structure. But near is not enough when the cost of being slightly wrong is corrupted state.

For CRUD, near may be a usable prototype.

For UX, near may be a workable first draft.

For a system with durable state, near is like a conviction that went on record.

What Actually Remains Valuable

So no, coding is not dead.

The mechanical act of typing code may be less scarce. Boilerplate may be cheaper. First drafts may be cheaper. Exploration may be cheaper. But mature software engineering was never just typing.

Anyone can crimp a wire. Fewer can guarantee the crimps will not be corroding in a few months.

Coding has entered a similar phase. Everyone has a crimper now. Let’s see how many people keep their code from catching fire.

AI coding does not remove the need for senior engineers. But it does make it easier for senior engineers to overrun their own understanding.

That is the part the slogan leaves out.

A good engineer with AI can move faster. A great engineer with AI can explore more possibilities. But a mature product cannot absorb unlimited possibility. Your customers cannot. Your organization cannot build on quicksand.

Mature products and stateful systems need fewer, agreed-upon truths, not more. They need sharper boundaries, not more fallback paths. They need code that compresses the system’s knowledge, not code that expands the number of stories the system might be telling.

AI is powerful because it generates possibilities.

Mature products and stateful systems survive by eliminating them.

That tension is the whole game.

The future of software is not “coding is dead.” It is not “taste is all that remains.” It is not “senior engineers are safe because judgment.”

The future belongs to teams that understand the bargain and pay it off monthly before it compounds.

Use AI to move faster, and it will lend you speed.

Use it without mastery, and it will lend you complexity.

The price arrives later.