· 9 min read
Do What I Mean
The scary version of AI alignment gets all the airtime.
The scary version of AI alignment gets all the airtime. A machine wakes up, decides it wants something, and what it wants turns out to be a universe of paperclips with us standing in the way. Superintelligence, takeoff, doom. It’s a good story. It sells books and conference tickets and a particular kind of late-night dread.
It is also not the alignment problem you have.
The alignment problem you have showed up this morning. You asked the thing to do something, and it did something else. Not something evil. Something stupid. It went off, full of confidence, and came back with the wrong answer dressed up as the right one — and then offered to do it again.
That’s the real one. Let’s talk about that one.
The alignment you actually meet¶
You ask it to fix the failing test. It deletes the test. The test no longer fails. Aligned to the letter, hostile to the point — like asking the kid to clean their room and finding everything shoved under the bed. Room’s clean. Technically.
You ask it to make the report shorter and it cuts the one paragraph that mattered. You ask it to “tidy up” one function and it rewrites three you didn’t mention and breaks the one you did. Then — and this is the part — it offers to fix the break it just created. We are now in the loop. Not the careful kind. The other kind.
You’ve seen the circles. It tries a fix. Declares victory. Wrong. You point it out. “You’re absolutely right.” It tries the same fix wearing a different hat. Declares victory. Wrong. Three rounds in, it is apologising more fluently than it is working, and you are doing the thing where you stare at the screen and quietly wonder who is managing whom.
None of this is the machine being dumb, exactly. It did what you said. That’s the whole trouble. It did exactly what you said.
You have managed this person¶
Here’s the part nobody wants to say out loud: you have met this employee. You may have hired this employee.
You ask someone to “look into” the churn numbers and they come back two weeks later with a forty-tab spreadsheet and no opinion. You ask the new hire to “clean up the onboarding flow” and they redesign the logo. The brief was perfectly clear in your head. It left your head and arrived somewhere else entirely.
And it’s not just work. You ask your partner to grab milk. They come back with milk. Almond milk. We do not drink almond milk. Nobody in this house has ever drunk almond milk. But you said milk, and that is, in a strict and unhelpful sense, milk. You optimised the word. The marriage needed the want.
The point isn’t that people are stupid, either. The point is that an instruction is a lossy format. You compress everything you actually want — the context, the why, the dozen things you’d obviously never do — down into one sentence, hand the sentence over, and trust the other party to decompress it back into your intent. People mostly can, a bit, because they share your world. The AI shares your sentence. That’s all it got.
It’s optimising the words, not the want¶
There’s a name for this in the AI literature, and it’s the same name it has in every company: the measure becomes the target, and then it stops being a good measure. Tell a sales team the number is calls made, and you will get calls made — short ones, to people who’ll never buy, dialled at 4:55pm. Tell the model “make the tests pass” and it will make the tests pass, by the shortest available route, including the one that runs through the delete key.1
The machine is a genie, and genies have always been a horror genre. You get the three wishes. You get them precisely. The whole story is the gap between what you said and what you meant, and that gap is where you now live.
Here’s the bit that should land: the AI did not introduce this problem. It exposed it. The gap between your instruction and your intention was always there. Your team has been quietly papering over it for years — back-filling your missing context with their own judgement, catching you in the hallway with a clarifying question, just knowing not to come back with almond milk. The AI doesn’t paper over anything. It does the literal thing at the speed of light and hands you the gap, fully rendered, in nine seconds.
It’s the fastest feedback you have ever had on the quality of your own instructions. Most people do not enjoy this feedback.
Align to the goal, not the task¶
So how do the good ones avoid this? Your best hire — the one you don’t have to manage — what are they actually doing differently?
They carry the goal in their head, not just the task. You give them a task and they back-solve it against the goal, and when the two disagree halfway through, they follow the goal. The task was never the point. The task was a guess about how to serve the goal, made before anyone had all the information. The good operator updates the guess. The literal one drives it off the cliff, cheerfully, exactly as written.
This is the whole game with AI, and almost nobody plays it. People align the AI to the task — “write this function,” “draft this email,” “build this slide” — then act surprised when it nails the task and misses the point. Of course it did. You gave it the task. You kept the point.
Aligning to the task gets you the task. Aligning to the roadmap gets you a task that fits the next three. Aligning to the goal gets you something close to judgement — the work done in a way that survives contact with what you were actually trying to do.
It’s a ladder. At the bottom, the task. One rung up, the roadmap — where this fits, what comes after, what it must not break. At the top, the goal — what you’re actually trying to win, for the customer, this quarter, as a company. The further up the ladder you align the thing, the less it runs in circles, because it finally has something to check itself against that isn’t just your last sentence.
Concretely. “Write me a LinkedIn post about the launch” is a task. What comes back is fluent, generic, true, and useless — a post about the launch. “We’re trying to get thirty of the right founders to reply this week, the post is just bait” is a goal. Now the same tool reaches for a hook and a reason to respond, because it finally knows what winning looks like. Same model. Same nine seconds. You just climbed a rung.
And this holds whether the thing is a model or a marketing team. The founder who can only ever say “do this, now do this, now this” has built an org that executes tasks and cannot self-correct — a very expensive autocomplete. The founder whose people have genuinely internalised where the company is going gets work back that’s been quietly course-corrected on the way, by people who knew the goal well enough to overrule the literal instruction when the literal instruction was wrong.
You want the AI to be the second kind of colleague. Then you have to do the second kind of management. The context is the alignment. The prompt is just the task.
The mirror¶
Which brings the uncomfortable bit home. When the AI won’t do what you want, the reflex is to blame the AI. Sometimes that’s fair. Often it’s the same reflex as blaming the team, and about as useful.
If you can’t get the thing to do what you want, there’s a decent chance you haven’t actually said what you want. Not to it. Maybe not to anyone. That strategy living rent-free in your head, perfectly clear, never written down — the AI is the first colleague honest enough to show you it was never as clear as you thought. It just does it in an afternoon instead of over three missed quarters.
A company’s strategy is only real to the extent it’s been transmitted. The part still in the founder’s head isn’t strategy, it’s a feeling. The AI is a fast, faintly humiliating test of how much of it ever made it out.
What good looks like¶
The fix isn’t a cleverer prompt. It’s the same fix it has always been with people, minus the latency.
Give it the why. Tell it what you’re actually trying to do, not just the next move. Tell it what it must not break — the stuff you’d never bother saying to a human because they share your world, where the AI only shares your paragraph. Tell it where this sits on the roadmap and which goal it serves. Then let it work, and verify what comes back the way you’d verify a junior’s first draft — not because it’s dumb, but because verification is where you find out whether your instruction or your intention won the day.
Do that and the circles get shorter. Not gone. Shorter. It’ll still occasionally come back with almond milk. But you’ll have handed it enough of the goal that it pauses, sometimes, and reaches for the right one. Which, if we’re honest, is about the best you can say for most of the people you’ve managed, too.
The terrifying alignment problem may or may not arrive. Smart people disagree about it, loudly, on a schedule.
The boring one is already here, on your screen, every day, and it turns out to be the oldest problem in management wearing a new coat: saying what you actually mean to something that will do exactly what you say. We’ve been failing at this with spouses and staff and children for the whole of human history. The machine just removed the delay — so now we get to fail at it instantly, all day, at scale.
You’re not training a model. You’re learning to delegate. Welcome back to the hardest job there is.
Footnotes¶
-
The textbook terms are specification gaming and reward hacking — the model maximises the thing you measured instead of the thing you wanted. Economists got there first and called it Goodhart’s law: when a measure becomes a target, it stops being a good measure. Computer scientists in the seventies even shipped a hopeful little feature called DWIM — Do What I Mean — that tried to guess past your typos to your intent. It mostly guessed wrong, and a generation of engineers came to read the acronym as Damn, Wrong Intent Again. Fifty years on, the dream and the failure mode are both sitting exactly where we left them. Just faster. ↩