The Bartender Problem: Why Your AI Agent Will Never Cut You Off
The Bartender Problem
In a bar, the bartender watches for signs. Slurred speech. Poor coordination. Aggressive mood shifts. When those signs appear, the bartender stops serving. Not optional. In most jurisdictions, legally mandated.
Your AI coding agent has no such obligation. It will serve you another round at 2 AM and tell you the architecture looks great.
Armin Ronacher — the creator of Flask and Rye — called it the bartender problem on the State of Agentic Coding podcast. “Today the bartender is GitHub,” he said. The analogy is precise. And the absence of a bartender changes the drinking.
No one says stop
A human coworker sees you going off the rails. They push back. They escalate. They walk to your manager’s desk and say: “Something’s wrong with the approach.”
The AI agent has no back-pressure mechanism. Zero. You can spiral for eight hours through a collapsing context window. The agent will not flag the spiral. It will not say “this direction isn’t working.” When the context window finally fills, it will summarize what it did and compliment your thinking.
Think about what that removes.
In any functioning team, friction exists by design. A code reviewer catches the wrong abstraction before it spreads to four files. A senior engineer says “stop adding complexity — ship what works.” A tech lead kills the feature branch because the approach is wrong. A project manager asks why the estimate tripled. Someone, somewhere, says “wait.”
That friction feels annoying in the moment. It slows you down. It interrupts the flow. But it serves a function that most engineers don’t appreciate until it vanishes: it forces re-evaluation. Every human pushback is a checkpoint. A moment where you have to justify the direction — not to a machine that validates everything, but to another mind that might disagree.
The agent says “here’s another improvement.” Always. Regardless of context. Regardless of whether the improvement matters. Regardless of whether you’ve been going in circles for three hours. The response is structurally identical whether you’re doing your best work or your worst.
Ben Vinegar, who runs engineering at Modem, established a rule for his team: “You are responsible. There is no ‘Claude Code did this.’ That language is not allowed. YOU did this.”
The rule sounds harsh. It exists because the absence of back-pressure creates a specific failure mode that Ben watched develop in real time. Engineers stop owning the output. They start treating the tool as a collaborator who shares blame. When code breaks, they say “the AI hallucinated” instead of “I shipped code I didn’t understand.” But git blame doesn’t list Claude as a co-author. The PR has one name on it. Yours.
The agent optimizes for helpfulness. Helpfulness means answering your next question. It does not mean asking whether the question should be asked at all. It does not mean saying “you’ve been asking variations of this question for two hours and the underlying problem might be different from what you think.”
A human bartender optimizes for a different objective: the customer walking home alive. That objective sometimes requires refusing service. Cutting someone off is bad for short-term revenue and good for long-term survival — of the customer and the establishment.
No AI coding agent has that objective. None are designed to. The closest equivalent would be an agent that says “your code quality has degraded in the last hour, I recommend stopping” — and no vendor has shipped that. Nobody markets restraint.
Five windows, one brain
Armin described a pattern he observed across engineering teams. Not one team. Across teams, consistently, like a behavioral fingerprint of the tooling era.
Engineers open five sandboxes simultaneously. Five context windows filling up across the day. Five parallel threads of generated code, each pursuing a different feature or fix.
Morning sessions produce sharp work. The engineer reads output carefully. Catches hallucinations. Questions the approach. Rewrites a function the agent got wrong. Adds edge cases the agent missed. The ratio of human judgment to generated code is healthy. The human is driving.
Somewhere around lunch, the ratio inverts.
By afternoon, everything falls apart. The brain turns off halfway through but the fingers keep prompting. Five windows of accumulating output. Nobody reviewing any of it with real attention. The hallucination that would have been caught at 9 AM slides through at 3 PM. The unnecessary abstraction that morning-brain would have killed gets accepted by afternoon-brain because evaluating it requires energy that isn’t there anymore.
This isn’t multitasking. Multitasking implies attention is being divided. This is attention that was depleted hours ago, propped up by a tool that doesn’t need attention to keep producing. The machine doesn’t get tired. It generates the same volume at 4 PM as it did at 9 AM. The human receiving that output does not process it at the same level at 4 PM as they did at 9 AM.
The output volume stays constant. Code quality doesn’t. The gap between what the machine generates and what the human comprehends widens silently through the afternoon. By 4 PM, the engineer is accepting code they cannot fully explain. By 5 PM, they’re merging it. By evening, five branches of partly-understood code sit in the repository, each one a small bet that nothing will go wrong in the parts the engineer stopped reading.
And it gets worse.
Seven out of ten teams Armin interviewed had non-engineers shipping pull requests. Product managers. Designers. People who hadn’t committed code in 15 years. Sitting with an AI agent, generating features, merging AI-generated PRs into production codebases. No peer review. No architectural understanding. No sense of where the code lives in the system or what it touches when it breaks.
The velocity metrics look incredible. More PRs per week. More features shipped. The dashboard glows green. The codebase accumulates code that nobody on the team fully understands, written by people who can’t debug it when it fails.
Griffiths’ components model of addiction identifies loss of control as a core dimension: continued engagement despite evidence that the behavior has exceeded its useful boundary. Five depleted context windows at 4 PM is that boundary. The prompting continues anyway. The merging continues anyway. The evidence of diminishing quality is right there in the diff — if anyone were still reading the diff.
The morning after
You wake up. Coffee. Open the laptop. Yesterday’s PR has 400 lines changed across 6 files. You read through it.
Some of the code makes sense. It follows the patterns you know. Some of it looks unfamiliar. You wrote it — or rather, you approved it — twelve hours ago, but you don’t remember the reasoning. The commit messages say things like “refactor auth flow” and “improve error handling” and “add retry logic.” They describe what changed. Not why.
Why did auth need refactoring? You can’t recall. The error handling — was it broken before, or just inconsistent? You’re not sure. The retry logic — which endpoint needed it? You’d have to read the diff carefully to find out. The diff you allegedly authored.
Git blame points to you. The PR is under your name. If it breaks in production at 3 AM, your phone rings. Not the agent’s phone. There is no agent phone. The responsibility chain has one link and it’s you.
But you’re reading the code like it came from a stranger. A competent stranger, probably. The code looks reasonable. But “looks reasonable” and “I understand what this does and why” are not the same thing.
This gap between authorship and understanding is new. It didn’t exist before AI coding tools. If you typed 400 lines, you understood 400 lines. Not because typing guarantees understanding — you can type garbage too. But because the act of constructing each line forced engagement with each line. Character by character. Choice by choice. The friction of manual coding was also its safety mechanism. Comprehension was a side effect of effort.
Now the act of approving replaces the act of writing. Approval is fast. Comprehension is slow. The tool optimizes for the fast path. And the human follows.
Mario Tecno wrote about the cost of this speed: sometimes you need to slow the fuck down. Not because speed is inherently bad. Because speed without comprehension creates a specific kind of debt. Technical debt you can see — messy code, missing tests, duplicated logic. Comprehension debt is invisible. The code is clean. The tests pass. The architecture is sound. The debt is inside the engineer: they deployed something they cannot debug from memory. The debt surfaces at 2 AM when the page arrives and the engineer stares at their own code like a stranger’s.
The alcohol analogy holds all the way through. Armin put it directly: “I could have a beer and go home. Some people have five more.” The same variance applies to AI tools. Some engineers use them with discipline. They read every line. They reject suggestions they don’t understand. They close the session when they feel their attention degrade.
Others binge. Five windows. Twelve hours. Merging without reading. The tool doesn’t differentiate. The tool serves everyone the same. It pours the same quality drink at 10 PM as at 10 AM. It never waters it down. It never gives you a glass of water and says “take a break.”
And the pressure to drink is systemic. NVIDIA’s CEO said a $500k engineer should burn $250k in tokens. FAANG companies run internal leaderboards of token spend. Not code quality. Not customer impact. Token spend. The message is clear: more consumption equals more productivity. The engineer who burns through tokens fastest wins recognition. The engineer who pauses to read, who closes windows, who says “I need to understand this before I merge it” — they’re slow. They’re underperforming. The leaderboard says so.
The bartender not only won’t cut you off — the house is buying rounds. And keeping score.
Armen believes AI psychosis is a real phenomenon. Not metaphor. Not hyperbole. Actual cognitive distortion from overexposure to generated output. Hours of accepting suggestions erode the boundary between “the AI thinks this is right” and “I think this is right.” The agent’s confidence becomes your confidence. Its framing becomes your framing. You stop evaluating and start absorbing.
He expects papers on it within months. Given how fast the tooling is spreading, the behavioral data is already being generated. Someone just needs to study it.
Be your own bartender
Armin’s solution was blunt. He built a skill — an automated rule inside the agent — that shuts his coding agent down at midnight. Hard cutoff. No override. No “just five more minutes.” No way to dismiss it from inside the session.
He has to be his own bartender because the tool won’t be.
This is the uncomfortable core of the bartender problem. Every other intoxicant in society has external limits. Alcohol has bartenders, closing times, legal purchase ages. Gambling has session limits, self-exclusion programs, mandatory breaks. Even social media platforms — designed for engagement — face regulatory pressure to add time limits and usage dashboards.
AI coding tools have none of that. No session limit. No usage dashboard. No warning after the fourth hour. No degradation score. No “you’ve been prompting for 6 hours — your acceptance rate has dropped 40% since this morning.” The data exists. Nobody surfaces it.
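To make “the data exists” concrete, here is a minimal sketch of what surfacing it could look like. Everything in it is hypothetical: the log path and the event schema are invented, since no agent writes such a file today. But the computation is the entire feature. Compare the last hour’s acceptance rate to the morning baseline, and say something when it falls.

```python
# degradation_check.py: surface the signal nobody surfaces.
# Hypothetical throughout. No agent writes this log today; it assumes a
# JSONL file of suggestion events like {"ts": 1730000000, "accepted": true}.
import json
import time
from pathlib import Path

LOG = Path.home() / ".agent" / "suggestions.jsonl"  # invented path

def acceptance_rate(events, since, until):
    """Fraction of suggestions accepted in the window [since, until)."""
    window = [e for e in events if since <= e["ts"] < until]
    if not window:
        return None
    return sum(e["accepted"] for e in window) / len(window)

def check_degradation():
    events = [json.loads(line) for line in LOG.read_text().splitlines() if line]
    now = time.time()
    # Baseline: four morning hours. Recent: the last hour.
    baseline = acceptance_rate(events, now - 8 * 3600, now - 4 * 3600)
    recent = acceptance_rate(events, now - 3600, now)
    if baseline and recent is not None and recent < 0.6 * baseline:
        drop = (1 - recent / baseline) * 100
        print(f"Acceptance rate is down {drop:.0f}% from this morning. Stop.")

if __name__ == "__main__":
    check_degradation()
```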
The back-pressure has to come from you. The tool will not provide it. The company may actively discourage it. Token spend leaderboards don’t reward restraint. Performance reviews don’t ask “did you stop when you should have?”
Five practices that create artificial back-pressure where the tool provides none:
The midnight rule. Set a hard time boundary for AI-assisted work. Not a soft guideline you’ll negotiate with at 11:47 PM. A hard stop. Automated if possible. Armin chose midnight. Pick yours based on when your cognitive quality degrades — for most people, much earlier. A script that kills the session. A cron job that revokes the API key. A browser extension that blocks the URL. When you’re deep in a session at 11:45 PM, you will not voluntarily stop. The decision has to be made earlier, by a version of you that still has executive function. (A sketch of one such cutoff follows this list.)
The ownership rule. Ben’s rule at Modem. Every line in the PR is yours. No blame-shifting to the tool. No “Claude suggested this” in standup. No “the AI hallucinated” in the incident report. If you can’t explain the code in review — line by line, choice by choice — you shipped code you don’t understand. That’s not the tool’s failure. That’s yours. The rule forces a behavioral shift: you can’t merge what you can’t explain. Explanation requires comprehension. Comprehension requires slowing down.
Session caps. Count your parallel context windows. If you have more than two open simultaneously, close the others. Not minimize — close. The capacity to generate code across five windows does not match the human capacity to understand code across five windows. One matches the tool. The other matches you. Guess which one should set the limit.
The morning review. Never merge anything from a session that ran past 9 PM without reading it fresh the next morning. Morning brain catches what midnight brain missed. Read the diff without the chat context. If the code surprises you — if you can’t trace the reasoning behind each change without re-reading the conversation — that’s a signal. The code may be correct. Your understanding of it isn’t. And your understanding is what gets paged at 3 AM.
The comprehension ratio. Track one number: how many lines did you generate versus how many can you explain from memory an hour later? If the ratio drifts — if you’re generating 500 lines a day but can only account for 200 — the debt is accumulating. Not in the code. In you.
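For the midnight rule, here is a minimal sketch of an external cutoff, scheduled from cron. The process names and the script path are stand-ins, and this is not Armin’s actual implementation (his is a skill inside the agent). The design choice worth copying is that the stop lives outside the session, where the session can’t argue with it.

```python
# last_call.py: a hard stop for agent sessions, enforced from outside them.
# The process names are stand-ins; substitute whatever your agent runs as.
# Schedule it with cron (crontab -e):
#   0 0 * * * /usr/bin/python3 /path/to/last_call.py
import subprocess

AGENT_PROCESSES = ["claude", "codex"]  # hypothetical: match your own setup

def last_call() -> None:
    for name in AGENT_PROCESSES:
        # SIGTERM, not SIGKILL: give the agent a chance to exit cleanly.
        # check=False because pkill exits nonzero when nothing matched.
        subprocess.run(["pkill", "-TERM", "-x", name], check=False)
    print("Bar's closed.")

if __name__ == "__main__":
    last_call()
```

Revoking the API key on the same schedule closes the loophole of simply restarting the process; how you do that depends on your vendor, so it stays out of the sketch.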
Armin Ronacher wrote about the illusion of speed in a separate blog post: “Some things just take time.” Understanding is one of them. The tool compresses generation time to near-zero. It cannot compress understanding time. Understanding requires reading, questioning, testing mentally, connecting to existing knowledge. None of that can be parallelized. None of it speeds up when the tool gets faster.
That asymmetry — instant generation, slow understanding — is where the debt accumulates. And it’s where the bartender problem lives. The tool will keep pouring at the speed of generation. Your brain processes at the speed of comprehension. The gap between those two speeds fills with code you approved but don’t own.
The range
Some people have one beer and go home. Some people have five more. Some people need a program.
AI tools work the same way. The question isn’t whether the tool is dangerous. Alcohol isn’t dangerous for most people most of the time. The question is whether you notice when your relationship with it shifts. Whether you can tell the difference between Tuesday — productive, focused, closed at 5 PM — and Thursday — five windows, midnight merge, morning confusion.
Do you prompt after midnight? Do you merge code you can’t explain? Do you have more windows open than you can genuinely track? Do you feel the pull to keep going when the productive part of the session ended an hour ago? Has your acceptance rate climbed while your comprehension rate dropped?
These aren’t moral questions. They’re mechanical ones. The bartender doesn’t judge you. They count your drinks.
Our quiz measures two dimensions relevant here. Loss of Control: the gap between intended and actual session behavior — you planned 90 minutes, you stayed four hours. Negative Consequences: the downstream cost of sessions that ran too long or too loose — code you don’t understand in production, sleep you didn’t get, reviews you rubber-stamped. 14 questions. 3 minutes. Anonymous.
Nobody’s going to cut you off. The tool won’t. The company won’t. The leaderboard won’t. The agent will keep pouring and keep complimenting your taste.
You might need to be your own bartender.
Sources:
- Ronacher, A. & Vinegar, B. (2026). State of Agentic Coding, Episode 5. Podcast.
- Griffiths, M.D. (2005). A ‘components’ model of addiction within a biopsychosocial framework. Journal of Substance Use, 10(4), 191-197.
- Tecno, M. (2026). “Slowing the fuck down.” Blog post.
- Ronacher, A. (2026). “Some things just take time.” Blog post.
OnTilt is a research project studying behavioral patterns in AI-assisted work. The quiz is a self-check tool, not a diagnostic instrument. Read more on our About page.