In the loop

Human in the loop. You hear it everywhere now. It sounds careful. It’s doing the opposite of careful.

Read it slowly. There is a loop. The loop is running. The human is in it — slotted inside, like a part. The loop owns the work. The human owns the disclaimer.

That’s just the wrong way round.


Where the phrase comes from

It’s an aviation phrase. An autopilot flies the plane, the pilot sits inside that loop, and the pilot can take over when something goes wrong. Honest words for the situation, because the autopilot really is flying.

We’ve grabbed those words and pasted them onto knowledge work, where they are not honest. They quietly say: the AI is doing the job, the person is the safety net. For almost any work worth doing, that’s not what’s happening. And it shouldn’t be.

Nobody said “human in the loop of GCC”

Software engineers, of all people, should be the most suspicious of this phrase.

We have lived this exact situation before. A tool comes along that produces enormous amounts of code without a human typing it. The output is plausible, structured, often correct, sometimes wildly wrong in ways the engineer has to catch.

That tool is the compiler.

A compiler takes one representation and produces another. It writes machine code so the engineer doesn’t have to. By volume, the compiler writes vastly more code than the engineer does: a one-line program in a high-level language can become thousands of instructions on the way down. The compiler is, in a strict sense, the most prolific code generator in the building.

And yet nobody — not once, in seventy years — has said human in the loop of the compiler. The phrase would have been absurd. The compiler produces code. The engineer does the work. Everyone understood that immediately. There was never a debate.

It goes wider than compilers. Assemblers, linkers, code generators, scaffolding tools, IDE refactors, type checkers, autocompletion, framework templates, ORM scaffolds, copy-paste from Stack Overflow at three in the morning, the colleague who reaches over and types it for you because watching you was too painful. All of these produce code, sometimes a lot of it. None of them are doing the engineering. The engineering is upstream of the code: deciding what to build, deciding the shape, making the trade-offs, knowing where it breaks, owning what ships.

AI writing code is the next entry in that list. A higher-level compiler. It takes intent and produces implementation. That’s a useful, big, real tool. But it’s a tool. The work is still upstream.

And you can push that as far as you want. The case still holds.

Give the AI a sentence and let it produce a whole feature. Give it a paragraph and let it ship a service. Give it a Friday and let it rewrite the entire module from a doc you wrote on Monday. The ratio of input to output keeps climbing. The structure of the situation does not move.

Someone decided that was the feature to build. Someone said what it had to do, and what it didn’t. Someone will know whether it’s right when they look at it, and own it when it isn’t. Someone gets paged when it falls over on a Saturday. That whole stack of decisions and accountability is the work, and the AI is not in it. The AI is on the other side of the line — the implementation side, where compilers have lived for seventy years and nobody got confused.

The ratio is just bigger now. That’s all that changed. A one-line C program becomes a thousand machine instructions and nobody calls the engineer the supervisor of the compiler. A one-paragraph spec becomes a five-thousand-line feature and suddenly the language is human in the loop. Same shape. Same line in the same place. We just got dazzled by the size of the right-hand side.

So why does this one feel different? Because the output looks like the work. Compiler output looks like binary, so nobody’s confused. AI output looks like the artefact an engineer would have produced. Same shape, same file, same diff. Looking at it, you feel like you’re looking at the work.

You’re not. You’re looking at what got produced. It’s the most successful Turing test we’ve ever run in production — and it isn’t really about whether the machine is intelligent. It’s about whether you can still tell what the diff was for.

Those are different things. They have always been different things. We just had a reliable way to tell them apart, and AI temporarily took it away from us.

What good AI use actually looks like

An engineer has a real job in front of them. Ship the feature. Fix the bug. Migrate the service. Investigate the latency spike.

They run their process. Along the way they reach for AI. Maybe it writes a function. Maybe it explains an unfamiliar library. Maybe it generates a piece of boilerplate they would otherwise have generated themselves. Maybe — and this is the part that throws people — it writes the entire feature. From a spec the engineer wrote, against an architecture the engineer chose, against trade-offs the engineer decided, with tests the engineer shaped and read, in a stack the engineer can read fluently enough to spot the bad bits in seconds. At the end of it they have something they shipped. They know what’s in it. They could redraw the design on a whiteboard from memory. They would defend it in a code review without flinching. They will own the pager.

The work was the engineer’s. AI was a tool inside their loop — however large the tool got. That’s AI in the loop.

Human in the loop describes a different system entirely. The AI is making the choices, and the engineer is checking what came out.

Same words on the slide. Very different machine underneath.

Why the framing matters

The framing decides what gets built.

If the AI owns the loop, you build for throughput. A queue of AI-generated PRs. An engineer at the end with an approve button. A target for how many they merge an hour.

The engineer becomes the slow part. So they stop reading carefully. So they press approve. So the bugs that slip through are exactly the ones a tired engineer at the end of a long queue won’t catch.

This isn’t a hot take. Aviation has studied it for forty years and given it a name: automation complacency. Drop a human at the end of a system that mostly works, and they stop checking properly. They get bad at the job by design. You haven’t added oversight. You’ve added the appearance of oversight, which is a different product entirely. [1]

If the engineer owns the loop, you build differently. The engineer is doing the work. AI is one of the tools they use, same as a linter or a debugger or git. There is no gate at the end, because the verification is the work — they wrote it, they understand it, they stand behind it.

There is a quieter, worse thing the phrase is doing. It’s acting as a shield.

Companies put human in the loop into their policies. Regulators put it into their frameworks. Vendors put it into their sales decks. It works exactly like the small print at the end of a supplement ad — consult your doctor — a phrase that absorbs the risk while preserving the thing that needed the disclaimer.

The audit asks whether the phrase is in the policy. The phrase is in the policy. The audit passes. (Nobody asks the engineer. The engineer is busy.)

Whether the engineer is actually checking anything is not the test. The test is whether the words are there. Human in the loop is on track to become the new we take privacy seriously — a sentence that exists to absorb criticism, so the underlying thing no longer has to be true.

AI in the loop doesn’t shield anything. The engineer used a tool. They still own the code. They still get paged when it goes wrong. Which is exactly why it’s harder to sell upward, and exactly why it’s the honest one.

The whiteboard test

Take an engineer to a whiteboard. Ask them to draw the system. Then ask why this way and not the other way. Ask where it might break. Ask what they’d change first if the requirements moved tomorrow.

The engineer who did the work walks you through it. The trade-offs come out without effort, because they were the things being traded off. They know where they hesitated, what they almost did instead, why they didn’t. They can tell you which parts they’re proud of and which parts they want to come back to. They can tell you the bit they’d defend hardest and the bit they expect to lose an argument about.

The engineer who processed AI output can’t do this. They know what got merged. They don’t know why it got shaped that way, because they didn’t shape it. They gesture at the AI’s choices the way a junior gestures at the senior who made the call. They didn’t do the engineering. They processed it. The way a turnstile processes commuters.

This isn’t the code-review test. Code review can be passed by reading. The whiteboard test can’t — because the question isn’t what is this, it’s why this and not something else. The why lives in the decisions. Decisions don’t travel with the diff.

When the original phrase is fine

Are there systems where human in the loop is the right description? Yes. Content moderation at scale. Fraud flagging. Anomaly review in a monitoring pipeline. Cases where a model really is running the stream, and a person occasionally lands on a flagged item to make a call.

Even there, the better teams know about the complacency problem and design around it. Not long approve/reject queues but escalation — most events handled automatically, a smaller number routed to a person who has time to actually think, with tracking on how often the person overturns the system. The person isn’t in the loop. They’re the appeal court.
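
A rough sketch of that escalation shape, assuming a model that reports a label and a confidence score for each event. `Event`, `EscalationRouter` and the 0.9 threshold are hypothetical names and values picked for illustration, not taken from any real moderation or fraud system.

```python
from dataclasses import dataclass, field


@dataclass
class Event:
    event_id: str
    model_label: str         # what the model decided, e.g. "allow" or "block"
    model_confidence: float  # 0.0 to 1.0, as reported by the model


@dataclass
class EscalationRouter:
    # Hypothetical cut-off: below this confidence, a person makes the call.
    confidence_threshold: float = 0.9
    auto_handled: int = 0
    reviewed: int = 0
    overturned: int = 0
    review_queue: list = field(default_factory=list)

    def route(self, event: Event) -> str:
        """Handle confident events automatically; escalate the rest."""
        if event.model_confidence >= self.confidence_threshold:
            self.auto_handled += 1
            return "auto"
        self.review_queue.append(event)
        return "escalated"

    def record_review(self, event: Event, human_label: str) -> None:
        """Record the person's decision and whether it overturned the model."""
        self.review_queue.remove(event)
        self.reviewed += 1
        if human_label != event.model_label:
            self.overturned += 1

    def overturn_rate(self) -> float:
        """Share of reviewed events where the person disagreed with the model."""
        return self.overturned / self.reviewed if self.reviewed else 0.0


if __name__ == "__main__":
    router = EscalationRouter()
    events = [Event("e1", "allow", 0.99), Event("e2", "block", 0.55)]
    for event in events:
        router.route(event)
    # The person disagrees with the model on the one escalated event.
    router.record_review(events[1], "allow")
    print(router.overturn_rate())  # prints 1.0
```

The overturn rate is the number worth watching. If it sits near zero for months, either the model is very good on the hard cases or the reviewer has stopped reviewing, and the number alone won’t tell you which.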

For the rest — for building software — the original framing is just wrong. The work was the engineer’s all along. AI joined the loop. The loop didn’t change hands.

What to say instead

Try the longer sentence. AI in the loop of what we already do.

It’s slower. It tends to settle a room. The questions that follow are about the work, instead of about the apparatus around the work.

Which is the only conversation worth having.


The phrase will probably stick. Phrases do. But you can hear, in any meeting, which version a person actually believes. The ones who say human in the loop and mean it are describing a system where the model is the protagonist. The ones who flinch when they say it are describing one where the engineer still is.

The second group ships better software. And when something goes wrong at 3am, they know what to say.

Footnotes

  1. The reference is Parasuraman and Manzey (2010) on automation-induced complacency. Air France 447 is the loud public example: pilots, after hours of an autopilot that mostly works, were the worst possible people to take over the second it didn’t. The finding generalises wherever a human is dropped in as the last line of defence on a system that mostly works. Mostly is the trap. The human is there for the cases where it doesn’t, and is, on average, terrible at handling them, because they’ve spent the previous six hours watching it succeed. The aviation industry’s solution, broadly, was to have pilots hand-fly more and lean on the automation less. We are about to discover whether the software industry’s solution is the same one or the opposite one. Place your bets.