Research Methodology
The OnTilt Self-Check is built on peer-reviewed research spanning behavioral addiction, operant conditioning, HCI design patterns, and workplace wellbeing. This page explains how we designed the tool, what it measures, and where we are in the validation process.
Research Foundation
Our starting observation: the interaction loop of agentic coding tools (Claude Code, Cursor, Copilot) may exhibit structural similarities with slot machines — low cost per try, variable quality outcomes, instant retry, and near-miss effects that keep you going. We call this the "slot machine hypothesis."
This is not a metaphor pulled from thin air. It rests on 32 peer-reviewed sources spanning gambling studies, social media addiction, digital product design, and clinical psychology. That said, it remains a research hypothesis, not an established clinical finding: the mechanisms are documented in other domains (gambling, social media) and applied here by analogy, and direct empirical validation for this specific population is pending. From these sources, we proposed six mechanisms as a framework for understanding patterns when developers work intensively with AI tools:
- Variable ratio reinforcement — AI delivers valuable outputs at unpredictable intervals, the same schedule that makes slot machines compelling [3]
- Near-miss effect — code that "almost works" triggers continuation more powerfully than outright failure [4]
- Sunk cost escalation — invested time and attention make it harder to disengage [5]
- Unintended immersion ("dark flow") — AI coding creates a rapid challenge-feedback loop that may produce a pattern resembling what gambling researchers call "dark flow" [6]
- Checking habit / intermittent reward — short, repeatable interactions reinforced by fast informational rewards [7]
- Loss aversion — absence of the tool feels like a loss, not a return to baseline [5]
Our dimensional structure draws on Griffiths' component model of behavioral addiction [8], the ICD-11 gaming disorder criteria [9], and DSM-5 gambling disorder criteria [10]. We also incorporated recent scales for problematic ChatGPT use [11] and conversational AI dependence [12].
The Six Dimensions
Each dimension maps to one or more mechanisms from addiction research that may parallel patterns in AI tool use. Together they provide a multi-faceted profile rather than a single score.
Loss of Control
Measures: The gap between your intention to stop and your actual behavior. "I planned to finish at 6 PM — it's 10:30 PM."
Mechanism: Variable ratio reinforcement + intermittent reward. AI delivers good outputs at unpredictable intervals, creating the same reinforcement pattern as slot machines.
Why it matters: Impaired volitional control is the central criterion for behavioral addiction across DSM-5 and ICD-11 frameworks. Three questions target this dimension because of its diagnostic weight.
Session Escalation
Measures: The tendency to extend sessions, especially when results "almost work." "It nearly passes — just one more try" repeated fifteen times.
Mechanism: Near-miss effect + sunk cost. Code with 1–2 failing tests feels like a near-win, not a failure. The time already invested makes stopping feel wasteful.
Why it matters: Near-miss is one of the strongest drivers of continued gambling behavior. In AI coding, the "almost works" state is the norm, not the exception.
Dark Flow / Immersion
Measures: Loss of contact with physical reality during AI sessions — time distortion, neglecting food, water, and rest. "I didn't eat for six hours and didn't notice."
Mechanism: A pattern resembling what gambling researchers call "dark flow." AI coding creates an ideal loop — prompt, immediate output, next prompt — with dramatically faster feedback than traditional programming. We hypothesize that this rapid loop may produce unintended immersion where the tool's pace drives the session rather than the user's intention.
Why it matters: Flow itself is positive. But unintentional, prolonged immersion that overrides bodily signals crosses into dysregulation. We distinguish productive flow from "dark flow" — absorption without agency.
Operational Dependency
Measures: Your reaction when the AI tool is unavailable — anxiety, irritability, inability to work — and whether you've started depending on it for tasks you used to handle independently.
Mechanism: Loss aversion + intermittent reward. Absence of AI feels like a loss, not a return to your previous normal. Unpredictable availability (rate limits, outages) may paradoxically reinforce checking behavior.
Why it matters: Withdrawal-like symptoms are a key component of behavioral addiction. If you coded without AI for years and now feel unable to start without it, that shift is worth examining.
Negative Consequences
Measures: Observable downstream effects — disrupted sleep, comments from family or coworkers, and continuing despite knowing it causes harm.
Mechanism: All six mechanisms converge here. This dimension captures their cumulative effects rather than any single driver.
Why it matters: "Continued use despite negative consequences" is a core diagnostic criterion across behavioral addiction frameworks. Three questions target this dimension, and one item (continuing despite known harm) serves as a red flag regardless of total score.
Anticipation Shift
Measures: Whether the source of satisfaction has shifted from the result to the process — finding the streaming output more exciting than whether the code works, running prompts without a clear goal.
Mechanism: Variable ratio reinforcement + intermittent reward. The behavioral pattern suggests that reward may live in unpredictable moments of the process, so the process itself becomes reinforcing. Streaming output may reinforce anticipation and frequent progress-checking, analogous to pull-to-refresh in social media.
Why it matters: This is the subtlest dimension and the hardest to self-detect. When the act of generating matters more than the output, reward has shifted from outcome to anticipation — a hallmark of compulsive behavior.
Quiz Design
| Parameter | Value |
| --- | --- |
| Questions | 14 scored items + 4 unscored context questions |
| Scale | 5-point Likert (0–4): Never / Rarely / Sometimes / Often / Always |
| Time anchor | "In the last 30 days" |
| Score range | 0–56 (percentage mapped to four levels) |
| Time to complete | ~3 minutes |
| Data collection | Anonymous |
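The scoring described in the table can be sketched in a few lines. This is an illustrative sketch only: the 25/50/75% cut-offs are the preliminary rational thresholds mentioned under Validation Status, and the level names (`low`, `moderate`, `elevated`, `high`) are placeholders we invented, not OnTilt's actual labels.

```python
def score_quiz(responses: list[int]) -> tuple[int, float, str]:
    """Map 14 Likert responses (0-4) to a raw score, a percentage,
    and one of four levels. Level names are placeholders."""
    assert len(responses) == 14 and all(0 <= r <= 4 for r in responses)
    raw = sum(responses)        # raw score in 0-56
    pct = raw / 56 * 100        # percentage of the maximum
    # Preliminary 25/50/75% cut-offs (rational, not empirical norms)
    if pct < 25:
        level = "low"
    elif pct < 50:
        level = "moderate"
    elif pct < 75:
        level = "elevated"
    else:
        level = "high"
    return raw, pct, level
```

For example, answering "Rarely" (1) to every item yields a raw score of 14 and lands exactly on the 25% boundary.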
Design Principles
- Self-reflection, not diagnosis. The quiz is a screening tool. It does not diagnose any condition. There is no recognized clinical category for "AI tool addiction" in DSM-5 or ICD-11.
- Behavioral anchoring. Each item describes a concrete, observable behavior (not an emotion or trait), with a built-in dysregulation marker such as "despite planning to stop" or "despite knowing it causes harm."
- 30-day time window. Responses are anchored to the last 30 days to reduce recall bias and capture current patterns.
- Red flag system. Two items — continuing despite known harm (csq-3) and repeated failure to keep self-imposed limits (ctrl-3) — trigger specific feedback regardless of total score. These correspond to core addiction criteria in ICD-11 and DSM-5.
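The red-flag override in the last principle can be sketched as a check that runs independently of the total score. The item IDs (`csq-3`, `ctrl-3`) come from the description above, but the trigger threshold (a response of 3 or higher, i.e. "Often" or "Always") is our assumption, not a documented rule.

```python
# Items that trigger feedback regardless of total score (IDs from the text).
RED_FLAG_ITEMS = {"csq-3", "ctrl-3"}

def red_flags(responses: dict[str, int], threshold: int = 3) -> list[str]:
    """Return red-flag item IDs endorsed at or above the threshold.
    The threshold default is an assumption for illustration."""
    return sorted(i for i in RED_FLAG_ITEMS if responses.get(i, 0) >= threshold)
```

Running this check before the total score is computed keeps the override logic separate from the level mapping, so a low overall score cannot mask a core-criterion endorsement.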
Question Distribution
Loss of Control and Negative Consequences each have 3 questions (21.4% weight each). The remaining four dimensions have 2 questions each (14.3%). This weighting is intentional: impaired control is the central construct of behavioral addiction, and consequences provide the hardest evidence (observable, external).
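The percentages above follow directly from the question counts, as this small calculation shows (dimension keys are our own shorthand):

```python
# Per-dimension question counts from the distribution described above.
counts = {
    "loss_of_control": 3, "negative_consequences": 3,
    "session_escalation": 2, "dark_flow": 2,
    "operational_dependency": 2, "anticipation_shift": 2,
}
total = sum(counts.values())  # 14 scored items
weights = {d: round(100 * n / total, 1) for d, n in counts.items()}
# 3/14 -> 21.4%, 2/14 -> 14.3%
```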
Validation Status
We believe in transparency about where this instrument stands.
Current: v2 (research preview)
- Item wording informed by peer-reviewed instruments (Problematic ChatGPT Use Scale, CAI dependence scales) and clinical frameworks (ICD-11, DSM-5)
- Design refinements over v1 intended to improve psychometric properties: clearer item separation, behavioral anchoring, disambiguated flow vs. dysregulation
- Preliminary scoring thresholds (25/50/75%) — these are rational cut-offs, not empirical norms
Planned validation roadmap
- Cognitive interviews (5–10 participants) — do respondents interpret items as intended?
- Exploratory & Confirmatory Factor Analysis (EFA/CFA) — does the 6-factor structure hold empirically?
- Item Response Theory calibration (IRT / GRM) — which items discriminate best across the severity continuum?
- Measurement invariance — does the instrument work equivalently across PL/EN and different usage modes (IDE copilot vs. chat vs. agent)?
- Empirical norms — replace arbitrary percentage thresholds with percentile-based cut-offs from real cohort data
Known Limitations
- Self-report bias — respondents most affected may minimize their answers
- No clinical validation yet — test-retest reliability, construct validity, and factor analysis are pending
- Context sensitivity — scores may fluctuate with project deadlines, tool availability, and mood
- Engagement vs. dysregulation — some captured behaviors (long focus sessions) can be healthy in the right context
- Population specificity — the quiz targets AI power users (developers, 10+ hours/week). Casual users or non-coders may find items less relevant
Our Framing: Hygiene, Not Abstinence
OnTilt is not an anti-AI project. We use AI tools extensively ourselves. The framing is deliberate:
- Hygiene, not abstinence. Like sleep hygiene or dental hygiene — it's about sustainable practices, not prohibition.
- Awareness, not pathologizing. We explicitly avoid labeling intensive use as "addiction." The outcome is a behavioral profile and red flags, never a clinical label [14].
- Dual nature acknowledged. AI coding tools are genuinely powerful. The same mechanisms that make them compelling also make them worth studying. Both things are true.
- Data-driven. We collect anonymous research data to eventually replace intuition with evidence.
Selected References
- [3] Lindström et al. (2021). A computational reward learning account of social media engagement. PMC7910435
- [4] Clark et al. (2009). Gambling near-misses enhance motivation to gamble and recruit win-related brain circuitry. PMC2658737
- [5] Arkes & Blumer (1985). The psychology of sunk cost. Organizational Behavior and Human Decision Processes
- [6] Abuhamdeh (2020). Investigating the "flow" experience: Key conceptual and operational issues. PMC7033418
- [7] Oulasvirta et al. (2012). Habits make smartphone use more pervasive. Personal and Ubiquitous Computing
- [8] Griffiths, M.D. (2005). A 'components' model of addiction within a biopsychosocial framework. Journal of Substance Use, 10(4), 191-197
- [9] World Health Organization. Gaming Disorder FAQ
- [10] American Psychiatric Association. What Is Gambling Disorder? (patient-facing resource). For research, see: DSM-5-TR (2022)
- [11] Zhao, Y. et al. (2024). Development and validation of the Problematic ChatGPT Use Scale. Current Psychology. Turkish CFA/IRT validation: IJMHA 2025
- [12] CAI dependence scale — uncontrollability, withdrawal, mood modification, negative impacts. Frontiers in Psychology
- [13] Compulsive ChatGPT use and associations with anxiety, burnout, and sleep. Acta Psychologica
- [14] "People are not becoming 'AIholic'" (2025). Caution against premature "addiction" labels for AI use. Addictive Behaviors
Full bibliography (32 sources) is available on the Research page.