Research Methodology
The OnTilt Self-Check is built on peer-reviewed research spanning behavioral addiction, operant conditioning, HCI design patterns, and workplace wellbeing. This page explains how we designed the tool, what it measures, and where we are in the validation process.
Research Foundation
Our starting observation: the interaction loop of agentic coding tools (Claude Code, Cursor, Copilot) may exhibit structural similarities with slot machines — low cost per try, variable quality outcomes, instant retry, and near-miss effects that keep you going. We call this the "slot machine hypothesis."
This is not a metaphor pulled from thin air. It rests on 32 peer-reviewed sources spanning gambling studies, social media addiction, digital product design, and clinical psychology. That said, it remains a research hypothesis, not an established clinical finding: the mechanisms are documented in other domains (gambling, social media) and applied here by analogy, and direct empirical validation for this specific population is pending. From these sources, we proposed six mechanisms as a framework for understanding patterns when developers work intensively with AI tools:
- Variable ratio reinforcement — AI delivers valuable outputs at unpredictable intervals, the same schedule that makes slot machines compelling [3]
- Near-miss effect — code that "almost works" triggers continuation more powerfully than outright failure [4]
- Sunk cost escalation — invested time and attention make it harder to disengage [5]
- Unintended immersion ("dark flow") — AI coding creates a rapid challenge-feedback loop that may produce a pattern resembling what gambling researchers call "dark flow" [6]
- Checking habit / intermittent reward — short, repeatable interactions reinforced by fast informational rewards [7]
- Loss aversion — absence of the tool feels like a loss, not a return to baseline [5]
Our dimensional structure draws on Griffiths' component model of behavioral addiction [8], the ICD-11 gaming disorder criteria [9], and DSM-5 gambling disorder criteria [10]. We also incorporated recent scales for problematic ChatGPT use [11] and conversational AI dependence [12].
The Six Dimensions
Each dimension maps to one or more mechanisms from addiction research that may parallel patterns in AI tool use. Together they provide a multi-faceted profile rather than a single score.
Loss of Control
Measures: The gap between your intention to stop and your actual behavior. "I planned to finish at 6 PM — it's 10:30 PM."
Mechanism: Variable ratio reinforcement + intermittent reward. AI delivers good outputs at unpredictable intervals, creating the same reinforcement pattern as slot machines.
Why it matters: Impaired volitional control is the central criterion for behavioral addiction across DSM-5 and ICD-11 frameworks. Three questions target this dimension because of its diagnostic weight.
Session Escalation
Measures: The tendency to extend sessions, especially when results "almost work." "It nearly passes — just one more try" repeated fifteen times.
Mechanism: Near-miss effect + sunk cost. Code with 1–2 failing tests feels like a near-win, not a failure. The time already invested makes stopping feel wasteful.
Why it matters: Near-miss is one of the strongest drivers of continued gambling behavior. In AI coding, the "almost works" state is the norm, not the exception.
Dark Flow / Immersion
Measures: Loss of contact with physical reality during AI sessions — time distortion, neglecting food, water, and rest. "I didn't eat for six hours and didn't notice."
Mechanism: A pattern resembling what gambling researchers call "dark flow." AI coding creates an ideal loop — prompt, immediate output, next prompt — with dramatically faster feedback than traditional programming. We hypothesize that this rapid loop may produce unintended immersion where the tool's pace drives the session rather than the user's intention.
Why it matters: Flow itself is positive. But unintentional, prolonged immersion that overrides bodily signals crosses into dysregulation. We distinguish productive flow from "dark flow" — absorption without agency.
Operational Dependency
Measures: Your reaction when the AI tool is unavailable — anxiety, irritability, inability to work — and whether you've started depending on it for tasks you used to handle independently.
Mechanism: Loss aversion + intermittent reward. Absence of AI feels like a loss, not a return to your previous normal. Unpredictable availability (rate limits, outages) may paradoxically reinforce checking behavior.
Why it matters: Withdrawal-like symptoms are a key component of behavioral addiction. If you coded without AI for years and now feel unable to start without it, that shift is worth examining.
Negative Consequences
Measures: Observable downstream effects — disrupted sleep, comments from family or coworkers, and continuing despite knowing it causes harm.
Mechanism: All six mechanisms converge here. This dimension captures their cumulative effects rather than any single driver.
Why it matters: "Continued use despite negative consequences" is a core diagnostic criterion across behavioral addiction frameworks. Three questions target this dimension, and one item (continuing despite known harm) serves as a red flag regardless of total score.
Anticipation Shift
Measures: Whether the source of satisfaction has shifted from the result to the process — finding the streaming output more exciting than whether the code works, running prompts without a clear goal.
Mechanism: Variable ratio reinforcement + intermittent reward. The behavioral pattern suggests that reward may live in unpredictable moments of the process, so the process itself becomes reinforcing. Streaming output may reinforce anticipation and frequent progress-checking, analogous to pull-to-refresh in social media.
Why it matters: This is the subtlest dimension and the hardest to self-detect. When the act of generating matters more than the output, reward has shifted from outcome to anticipation — a hallmark of compulsive behavior.
Quiz Design
| Parameter | Value |
| --- | --- |
| Questions | 14 scored items + 4 unscored context questions |
| Scale | 5-point Likert (0–4): Never / Rarely / Sometimes / Often / Always |
| Time anchor | "In the last 30 days" |
| Score range | 0–56 (percentage mapped to four levels) |
| Time to complete | ~3 minutes |
| Data collection | Anonymous |
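The scoring described in the table can be sketched in a few lines. This is an illustrative sketch only: the 25/50/75% cut-offs are the preliminary rational thresholds mentioned under Validation Status, and the level names (`low`, `moderate`, `elevated`, `high`) are placeholders we invented, not OnTilt's actual labels.

```python
def score_quiz(responses: list[int]) -> tuple[int, float, str]:
    """Map 14 Likert responses (0-4) to a raw score, a percentage,
    and one of four levels. Level names are placeholders."""
    assert len(responses) == 14 and all(0 <= r <= 4 for r in responses)
    raw = sum(responses)        # raw score in 0-56
    pct = raw / 56 * 100        # percentage of the maximum
    # Preliminary 25/50/75% cut-offs (rational, not empirical norms)
    if pct < 25:
        level = "low"
    elif pct < 50:
        level = "moderate"
    elif pct < 75:
        level = "elevated"
    else:
        level = "high"
    return raw, pct, level
```

For example, answering "Rarely" (1) to every item yields a raw score of 14 and lands exactly on the 25% boundary.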
Design Principles
- Self-reflection, not diagnosis. The quiz is a screening tool. It does not diagnose any condition. There is no recognized clinical category for "AI tool addiction" in DSM-5 or ICD-11.
- Behavioral anchoring. Each item describes a concrete, observable behavior (not an emotion or trait), with a built-in dysregulation marker such as "despite planning to stop" or "despite knowing it causes harm."
- 30-day time window. Responses are anchored to the last 30 days to reduce recall bias and capture current patterns.
- Red flag system. Two items — continuing despite known harm (csq-3) and repeated failure to keep self-imposed limits (ctrl-3) — trigger specific feedback regardless of total score. These correspond to core addiction criteria in ICD-11 and DSM-5.
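The red-flag override in the last principle can be sketched as a check that runs independently of the total score. The item IDs (`csq-3`, `ctrl-3`) come from the description above, but the trigger threshold (a response of 3 or higher, i.e. "Often" or "Always") is our assumption, not a documented rule.

```python
# Items that trigger feedback regardless of total score (IDs from the text).
RED_FLAG_ITEMS = {"csq-3", "ctrl-3"}

def red_flags(responses: dict[str, int], threshold: int = 3) -> list[str]:
    """Return red-flag item IDs endorsed at or above the threshold.
    The threshold default is an assumption for illustration."""
    return sorted(i for i in RED_FLAG_ITEMS if responses.get(i, 0) >= threshold)
```

Running this check before the total score is computed keeps the override logic separate from the level mapping, so a low overall score cannot mask a core-criterion endorsement.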
Question Distribution
Loss of Control and Negative Consequences each have 3 questions (21.4% weight each). The remaining four dimensions have 2 questions each (14.3%). This weighting is intentional: impaired control is the central construct of behavioral addiction, and consequences provide the hardest evidence (observable, external).
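The percentages above follow directly from the question counts, as this small calculation shows (dimension keys are our own shorthand):

```python
# Per-dimension question counts from the distribution described above.
counts = {
    "loss_of_control": 3, "negative_consequences": 3,
    "session_escalation": 2, "dark_flow": 2,
    "operational_dependency": 2, "anticipation_shift": 2,
}
total = sum(counts.values())  # 14 scored items
weights = {d: round(100 * n / total, 1) for d, n in counts.items()}
# 3/14 -> 21.4%, 2/14 -> 14.3%
```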
Validation Status
We believe in transparency about where this instrument stands.
Current: v2 (research preview)
- Item wording informed by peer-reviewed instruments (Problematic ChatGPT Use Scale, CAI dependence scales) and clinical frameworks (ICD-11, DSM-5)
- Design refinements over v1 intended to improve psychometric properties: clearer item separation, behavioral anchoring, disambiguated flow vs. dysregulation
- Preliminary scoring thresholds (25/50/75%) — these are rational cut-offs, not empirical norms
Planned validation roadmap
- Cognitive interviews (5–10 participants) — do respondents interpret items as intended?
- Exploratory & Confirmatory Factor Analysis (EFA/CFA) — does the 6-factor structure hold empirically?
- Item Response Theory calibration (IRT / GRM) — which items discriminate best across the severity continuum?
- Measurement invariance — does the instrument work equivalently across PL/EN and different usage modes (IDE copilot vs. chat vs. agent)?
- Empirical norms — replace arbitrary percentage thresholds with percentile-based cut-offs from real cohort data
Known Limitations
- Self-report bias — respondents most affected may minimize their answers
- No clinical validation yet — test-retest reliability, construct validity, and factor analysis are pending
- Context sensitivity — scores may fluctuate with project deadlines, tool availability, and mood
- Engagement vs. dysregulation — some captured behaviors (long focus sessions) can be healthy in the right context
- Population specificity — the quiz targets AI power users (developers, 10+ hours/week). Casual users or non-coders may find items less relevant
Our Framing: Hygiene, Not Abstinence
OnTilt is not an anti-AI project. We use AI tools extensively ourselves. The framing is deliberate:
- Hygiene, not abstinence. Like sleep hygiene or dental hygiene — it's about sustainable practices, not prohibition.
- Awareness, not pathologizing. We explicitly avoid labeling intensive use as "addiction." The outcome is a behavioral profile and red flags, never a clinical label [14].
- Dual nature acknowledged. AI coding tools are genuinely powerful. The same mechanisms that make them compelling also make them worth studying. Both things are true.
- Data-driven. We collect anonymous research data to eventually replace intuition with evidence.
Selected References
- [3] Lindström et al. (2021). A computational reward learning account of social media engagement. PMC7910435
- [4] Clark et al. (2009). Gambling near-misses enhance motivation to gamble and recruit win-related brain circuitry. PMC2658737
- [5] Arkes & Blumer (1985). The psychology of sunk cost. Organizational Behavior and Human Decision Processes
- [6] Abuhamdeh (2020). Investigating the "flow" experience: Key conceptual and operational issues. PMC7033418
- [7] Oulasvirta et al. (2012). Habits make smartphone use more pervasive. Personal and Ubiquitous Computing
- [8] Griffiths, M.D. (2005). A 'components' model of addiction within a biopsychosocial framework. Journal of Substance Use, 10(4), 191-197
- [9] World Health Organization. Gaming Disorder FAQ
- [10] American Psychiatric Association. What Is Gambling Disorder? (patient-facing resource). For research, see: DSM-5-TR (2022)
- [11] Zhao, Y. et al. (2024). Development and validation of the Problematic ChatGPT Use Scale. Current Psychology. Turkish CFA/IRT validation: IJMHA 2025
- [12] CAI dependence scale — uncontrollability, withdrawal, mood modification, negative impacts. Frontiers in Psychology
- [13] Compulsive ChatGPT use and associations with anxiety, burnout, and sleep. Acta Psychologica
- [14] "People are not becoming 'AIholic'" (2025). Caution against premature "addiction" labels for AI use. Addictive Behaviors
Full bibliography (32 sources) is available on the Research page.