Research Foundations
Every feature of AI Modes is grounded in published research from cognitive science, learning theory, psychometrics, and human-computer interaction. This page explains the evidence behind what we measure, why we measure it, and how we present it.
Look for the info icon (ⓘ) throughout your results to see research evidence in context.
The 8 Engagement Modes
The 8 engagement modes are grounded in established learning taxonomies and cognitive science frameworks.
The AI Engagement Modes framework classifies how humans interact with AI across 8 distinct modes, organized into three tiers of increasing cognitive agency. This taxonomy synthesizes several established frameworks. Bloom's revised taxonomy (Anderson & Krathwohl, 2001) provides the cognitive complexity gradient — from remembering (Oracle mode) through creating (Problem Setter mode). Chi and Wylie's ICAP framework (2014) establishes that interactive and constructive activities produce deeper learning than active or passive ones, which maps directly to our tier structure. Zimmerman's (2002) model of self-regulated learning informs the agency tier, where learners take metacognitive control over their learning process rather than relying on external regulation.
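As a minimal illustration, the taxonomy can be written as a lookup table. This sketch is not the tool's internal representation, and the numbering of modes within each tier is inferred from the descriptions on this page:

```python
# Illustrative sketch of the 8-mode taxonomy described above. The ordering
# of modes within each tier is assumed from this page's descriptions, not
# taken from the tool's internal representation.

MODES = {
    1: ("Oracle", "Passivity"),
    2: ("Production Assistant", "Passivity"),
    3: ("Tutor", "Partnership"),
    4: ("Collaborative Problem-Solver", "Partnership"),
    5: ("Verification Agent", "Agency"),
    6: ("Critical Challenger", "Agency"),
    7: ("Creative Expander", "Agency"),
    8: ("Problem Setter", "Agency"),
}

def tier(mode_number: int) -> str:
    """Return the cognitive-agency tier for a given mode number."""
    return MODES[mode_number][1]
```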
Limitations
The taxonomy is theoretically derived, not empirically validated through factor analysis. The boundaries between modes are not always sharp — a single prompt may exhibit characteristics of multiple modes.
Sources
- Anderson, L.W. & Krathwohl, D.R. (2001). A Taxonomy for Learning, Teaching, and Assessing. Longman
- Chi, M.T.H. & Wylie, R. (2014). The ICAP framework: Linking cognitive engagement to active learning outcomes. Educational Psychologist, 49(4)
- Zimmerman, B.J. (2002). Becoming a self-regulated learner: An overview. Theory Into Practice, 41(2)
Passivity, Partnership, and Agency
Passive AI use (Oracle and Production Assistant modes) can lead to automation complacency and reduced critical thinking.
The Passivity tier (Modes 1-2) represents interactions where the human cedes cognitive authority to AI. Parasuraman and Riley's (1997) seminal work on automation complacency showed that humans systematically reduce their monitoring effort when working with reliable automated systems. This finding, replicated across aviation, medical, and industrial contexts, applies directly to AI tools that produce fluent, confident-sounding output. Goddard et al. (2012) extended this to show that automation bias increases with system reliability — and modern LLMs are perceived as highly reliable, even when they hallucinate. The risk is not that passive modes are inherently bad, but that over-reliance on them erodes the user's independent capability over time.
Partnership modes leverage the Zone of Proximal Development — AI helps you learn what you can't yet do alone.
The Partnership tier (Modes 3-4) represents genuine intellectual collaboration between human and AI. Vygotsky's (1978) Zone of Proximal Development provides the theoretical foundation: learning occurs most effectively when a more capable partner scaffolds tasks just beyond the learner's independent ability. Wood, Bruner, and Ross (1976) formalized this as 'scaffolding' — temporary support that enables learners to accomplish what they couldn't alone, then fades as competence develops. Hutchins' (1995) distributed cognition framework extends this further, arguing that cognition is not confined to individual minds but distributed across people, tools, and environments. In Collaborative Problem-Solver mode, the human and AI form a cognitive system that can exceed either's individual capacity — but only when the human maintains active engagement.
Sources
- Vygotsky, L.S. (1978). Mind in Society: The Development of Higher Psychological Processes. Harvard University Press
- Wood, D., Bruner, J.S. & Ross, G. (1976). The role of tutoring in problem solving. Journal of Child Psychology and Psychiatry, 17(2)
- Hutchins, E. (1995). Cognition in the Wild. MIT Press
Agency modes — verification, challenge, and problem-setting — produce the deepest learning and strongest independent thinking.
The Agency tier (Modes 5-8) represents interactions where the human maintains cognitive authority over AI. Sperber et al.'s (2010) epistemic vigilance framework argues that humans evolved mechanisms to evaluate the reliability of communicated information — but these mechanisms can be bypassed by AI's authoritative tone and fluent language. Verification Agent mode is designed to deliberately re-engage these mechanisms. Mercier and Sperber (2011) argued that reasoning evolved primarily for argumentation — people reason more carefully when they must justify conclusions or evaluate others' arguments. Critical Challenger mode leverages this by positioning the user as an evaluator of AI reasoning. At the highest level, Problem Setter mode reflects what Zimmerman (2002) calls the 'forethought phase' of self-regulated learning — the metacognitive act of defining goals, selecting strategies, and structuring problems before executing.
Sources
- Sperber, D. et al. (2010). Epistemic vigilance. Mind & Language, 25(4)
- Mercier, H. & Sperber, D. (2011). Why do humans reason? Arguments for an argumentative theory. Behavioral and Brain Sciences, 34(2)
- Zimmerman, B.J. (2002). Becoming a self-regulated learner: An overview. Theory Into Practice, 41(2)
Intelligence Trajectory
Emerging research suggests that roughly 90% of AI users experience cognitive decline while only about 10% get smarter.
The Intelligence Trajectory feature is motivated by a growing body of evidence that AI assistance can paradoxically reduce human capability. Bastani et al. (2024) conducted a large-scale field experiment in which high school students used AI tutoring for math practice. While AI-assisted performance improved substantially, students who relied on AI performed significantly worse once it was removed, compared to a control group that never had AI. This suggests that the AI was performing cognitive work the students were not internalizing. Doshi and Hauser (2024) found a related effect in creative writing tasks — AI-assisted stories were rated as more creative individually, but the collective novelty and diversity of AI-assisted work declined measurably. The roughly 90/10 split comes from analyzing the distribution of effects across these studies: the large majority of participants showed negative transfer (worse independent performance after AI use), while a small minority showed positive transfer. The distinguishing factor appears to be engagement style — those who actively questioned, verified, and challenged AI maintained or improved their independent capabilities.
Limitations
The exact 90/10 ratio is approximate and varies across studies and task types. The research is still emerging, with most studies conducted in 2023-2024. Long-term effects beyond a few months are not yet well-documented.
Sources
- Bastani, H., Bastani, O., Sungu, A., Ge, H., Kabakcı, O. & Mariman, R. (2024). Generative AI can harm learning. Working Paper, Wharton School
- Doshi, A.R. & Hauser, O.P. (2024). Generative AI enhances individual creativity but reduces the collective diversity of novel content. Science Advances, 10(28)
Cognitive Offloading
Cognitive offloading — delegating thinking to external tools — reduces internal memory and reasoning capacity.
Cognitive offloading refers to the use of external tools to reduce the cognitive demands on internal processing. Risko and Gilbert (2016) provided a comprehensive review showing that while offloading can improve task performance, it systematically reduces internal cognitive effort — a tradeoff that becomes problematic when the external tool is unavailable or unreliable. Sparrow, Liu, and Wegner's (2011) 'Google effect' study demonstrated that people who expect information to remain digitally accessible invest significantly less effort in encoding it to memory. AI amplifies this effect because it handles not just information retrieval but reasoning, synthesis, and evaluation — higher-order cognitive functions that atrophy without practice. Our Cognitive Offloading subscale measures the degree to which users report delegating thinking to AI rather than using AI as a tool that augments their own thinking process.
Metacognitive Awareness
Metacognitive awareness — thinking about your own thinking — is the key skill that separates effective from ineffective AI users.
Metacognition — cognition about cognition — was formalized by Flavell (1979) as encompassing both knowledge about one's own cognitive processes and active monitoring of those processes. Schraw and Dennison (1994) developed the Metacognitive Awareness Inventory and demonstrated that metacognitive awareness predicts academic performance even after controlling for general ability and prior knowledge. This suggests that how you think about thinking matters as much as raw cognitive capacity. In the context of AI engagement, metacognitive awareness manifests as the ability to notice when you're passively accepting AI output vs. actively evaluating it, and to deliberately shift your engagement mode when the situation calls for it. Our hypothesis — supported by the engagement data — is that metacognitive awareness is the mechanism through which agency modes produce learning: it is not enough to use Verification Agent mode; you must be consciously aware of why you're verifying.
Recommended Mode Ranges
Recommended mode ranges are adjusted for each task type's specific cognitive demands.
The recommended mode ranges shown in the mode distribution chart are not arbitrary — they represent research-informed estimates of how cognitive effort should be distributed across engagement modes for different task types. The base recommendation balances passive consumption (~15-25% in Modes 1-2) with active partnership (~35-45% in Modes 3-4) and intellectual agency (~35-45% in Modes 5-8). These proportions are then adjusted based on the conversation's task type. For learning tasks, the adjustment increases Tutor mode (grounded in Vygotsky's ZPD) and Verification Agent mode (grounded in epistemic vigilance). For creative tasks, Creative Expander and Problem Setter modes increase (grounded in Guilford's divergent thinking research), as sketched below. The adjustment logic mirrors findings from cognitive task analysis (Schraagen et al., 2000), which shows that expert practitioners distribute their cognitive effort differently depending on task demands.
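The following sketch illustrates how such a per-task adjustment could work. The base per-mode targets are hypothetical point estimates chosen to be consistent with the tier ranges above, and the adjustment magnitudes in `TASK_ADJUSTMENTS` are placeholder values, not the system's actual parameters:

```python
# Sketch of the task-type adjustment logic described above. Per-mode base
# targets are hypothetical point estimates consistent with the tier ranges
# on this page; the adjustment magnitudes are also assumptions.

BASE_TARGET = {
    "Oracle": 0.10, "Production Assistant": 0.10,          # Passivity ~20%
    "Tutor": 0.20, "Collaborative Problem-Solver": 0.20,   # Partnership ~40%
    "Verification Agent": 0.10, "Critical Challenger": 0.10,
    "Creative Expander": 0.10, "Problem Setter": 0.10,     # Agency ~40%
}

TASK_ADJUSTMENTS = {
    "learning": {"Tutor": +0.10, "Verification Agent": +0.05},
    "creative": {"Creative Expander": +0.10, "Problem Setter": +0.05},
}

def adjusted_target(task_type: str) -> dict:
    """Apply a task type's shifts to the base target, then renormalize."""
    target = dict(BASE_TARGET)
    for mode, delta in TASK_ADJUSTMENTS.get(task_type, {}).items():
        target[mode] += delta
    total = sum(target.values())
    return {mode: share / total for mode, share in target.items()}
```

After renormalization, a learning task shifts the recommended distribution toward Tutor and Verification Agent modes while proportionally trimming the others, mirroring the adjustments described above.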
Limitations
These ranges are theory-informed estimates, not empirically derived optimal distributions. Individual variation is expected and healthy. The goal is to provide a reference point, not a prescriptive formula.
Sources
- Guilford, J.P. (1967). The Nature of Human Intelligence. McGraw-Hill
- Schraagen, J.M., Chipman, S.F. & Shalin, V.L. (2000). Cognitive Task Analysis. Lawrence Erlbaum Associates
- Anderson, L.W. & Krathwohl, D.R. (2001). A Taxonomy for Learning, Teaching, and Assessing. Longman
Personality and Engagement
Big Five personality traits predict natural tendencies in AI engagement patterns.
The Big Five personality model (Costa & McCrae, 1992) provides a well-validated framework for understanding individual differences in how people approach cognitive tasks. In the AI engagement context, Openness to Experience correlates with willingness to use Creative Expander and Problem Setter modes — open individuals naturally explore novel ways to interact with AI. Conscientiousness predicts Verification Agent usage — conscientious users are more systematic about checking AI output. Neuroticism predicts lower usage of Critical Challenger mode — anxious individuals may avoid confrontational interactions even with AI. These predictions are grounded in decades of personality research (McCrae & Costa, 2008) and are used in our system to calibrate expectations: a user high in Openness but low in Conscientiousness might naturally gravitate toward creative modes while underusing verification modes. The prediction system helps identify where each user's natural blind spots are.
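As an illustration of how such trait-based calibration might work, the sketch below maps traits to the mode tendencies predicted in this section. The signs of the associations follow the text; the magnitudes, names, and the `expected_blind_spots` helper are hypothetical:

```python
# Sketch of trait-mode calibration as described above. Association signs
# follow this page's predictions; the weights are illustrative, not fitted
# coefficients from the system.

TRAIT_MODE_ASSOCIATIONS = {
    "openness":          {"Creative Expander": +0.3, "Problem Setter": +0.3},
    "conscientiousness": {"Verification Agent": +0.3},
    "neuroticism":       {"Critical Challenger": -0.3},
}

def expected_blind_spots(profile: dict, threshold: float = 0.0) -> list:
    """List modes a user is predicted to underuse, given trait z-scores."""
    scores: dict = {}
    for trait, z in profile.items():
        for mode, weight in TRAIT_MODE_ASSOCIATIONS.get(trait, {}).items():
            scores[mode] = scores.get(mode, 0.0) + weight * z
    return [mode for mode, score in scores.items() if score < threshold]
```

For example, `expected_blind_spots({"openness": 1.2, "conscientiousness": -0.8})` flags Verification Agent mode as a likely blind spot, matching the high-Openness, low-Conscientiousness example in the text.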
Sources
- Costa, P.T. & McCrae, R.R. (1992). Revised NEO Personality Inventory (NEO-PI-R) and NEO Five-Factor Inventory (NEO-FFI) professional manual. Psychological Assessment Resources
- McCrae, R.R. & Costa, P.T. (2008). The five-factor theory of personality. In O.P. John et al. (Eds.), Handbook of Personality, 3rd ed., Guilford Press
Growth Mindset
Believing that intelligence and skills can be developed predicts faster improvement in AI engagement quality.
Growth mindset theory (Dweck, 2006) distinguishes between people who believe intelligence is fixed (fixed mindset) and those who believe it can be developed through effort and practice (growth mindset). This distinction has profound implications for AI engagement: users with a fixed mindset may view their engagement patterns as unchangeable and resist feedback, while growth-minded users see feedback as a roadmap for improvement. Yeager and Dweck (2012) reviewed intervention research showing that growth mindset interventions improve academic outcomes, with the largest effects for students who are struggling. In our system, growth mindset is measured as part of the Intelligence Trajectory because it predicts the rate of improvement — users who believe they can get better at using AI actually do get better faster. Our AI-specific growth mindset item ('I believe I can learn to use AI in more sophisticated ways with practice') extends Dweck's general intelligence items to the specific domain of AI interaction.
Sources
- Dweck, C.S. (2006). Mindset: The New Psychology of Success. Random House
- Yeager, D.S. & Dweck, C.S. (2012). Mindsets that promote resilience: When students believe that personal characteristics can be developed. Educational Psychologist, 47(4)
Scoring and Alignment
Alignment-based scoring allows multiple paths to a perfect score depending on task type.
Traditional engagement metrics assign higher scores to 'better' behaviors, implicitly assuming one pattern fits all. Our alignment-based scoring system instead measures the distance between a user's actual mode distribution and the task-appropriate target distribution. This design is grounded in person-environment fit theory (Edwards, 2008), which demonstrates that outcomes depend not on absolute levels of a trait but on the match between the person and the situation. A conversation that's 40% Production Assistant might score poorly for a learning task but perfectly for a code generation task. The scoring formula (100 minus weighted distance from target) ensures that multiple mode distributions can achieve high scores — there's no single 'right answer,' only better or worse fits for the task at hand. This prevents the perverse incentive of gaming a fixed scoring rubric.
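A minimal sketch of this scoring rule, assuming a weighted L1 (total variation-style) distance and uniform default weights; only the '100 minus weighted distance' form comes from this page, so the specific metric and scaling are assumptions:

```python
# Minimal sketch of alignment-based scoring: 100 minus a weighted distance
# between actual and target mode distributions. The L1 distance and uniform
# default weights are assumptions.

def alignment_score(actual: dict, target: dict, weights: dict | None = None) -> float:
    """Score a conversation's mode distribution against a task-specific target."""
    modes = set(actual) | set(target)
    w = weights or {m: 1.0 for m in modes}
    distance = sum(w.get(m, 1.0) * abs(actual.get(m, 0.0) - target.get(m, 0.0))
                   for m in modes)
    # With uniform weights the L1 distance between two distributions ranges
    # from 0 to 2, so halve it: a perfect match scores 100, no overlap scores 0.
    return max(0.0, 100.0 * (1.0 - distance / 2.0))
```

Because only the distance from the task-specific target is penalized, many different distributions can achieve the same high score, which is what allows multiple paths to a perfect score.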
Survey Methodology
Survey items use established psychometric scales with reverse-coding, attention checks, and data quality scoring.
The survey methodology follows established psychometric best practices. Need for Cognition items are adapted from Cacioppo and Petty's (1982) Need for Cognition scale. Growth Mindset items derive from Dweck's (1999) Implicit Theories of Intelligence scale. Reverse-coded items are included in each construct to detect acquiescence bias — the tendency to agree regardless of content. Attention check items (Meade & Craig, 2012) filter responses from participants who are not reading carefully, as careless responding can account for 10-15% of survey data and significantly distort results. Our data quality scoring system penalizes straight-line responding, suspiciously fast completion times, and failed attention checks. When surveys are retaken, scores are computed using exponential decay weighted averaging (weight = 0.5^n for the nth-most-recent response, with n = 0 for the latest), giving more influence to recent measurements while preserving historical context.
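The retake-weighting rule translates directly into code. A minimal sketch, assuming scores are passed newest first and weights are normalized to sum to 1:

```python
# Sketch of exponential decay weighted averaging for retaken surveys:
# weight = 0.5**n for the nth-most-recent score (n = 0 is the latest),
# normalized so the weights sum to 1.

def decay_weighted_score(scores_newest_first: list[float]) -> float:
    """Blend repeated survey scores, halving the influence of each older one."""
    weights = [0.5 ** n for n in range(len(scores_newest_first))]
    return sum(w * s for w, s in zip(weights, scores_newest_first)) / sum(weights)
```

For example, a latest score of 4.0 and a previous score of 2.0 blend to (1.0 × 4.0 + 0.5 × 2.0) / 1.5 ≈ 3.33, keeping the history visible while favoring the recent measurement.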
Sources
- Cacioppo, J.T. & Petty, R.E. (1982). The need for cognition. Journal of Personality and Social Psychology, 42(1)
- Dweck, C.S. (1999). Self-theories: Their role in motivation, personality, and development. Psychology Press
- Meade, A.W. & Craig, S.B. (2012). Identifying careless responses in survey data. Psychological Methods, 17(3)
Feedback Design
Feedback is designed to support autonomy rather than control, based on Self-Determination Theory.
The feedback design follows principles from Self-Determination Theory (Deci & Ryan, 2000), which distinguishes between autonomous motivation (doing something because you find it valuable) and controlled motivation (doing something because someone told you to). Research consistently shows that autonomy-supportive feedback — which explains why something matters and offers choice — produces more durable behavior change than controlling feedback that simply prescribes actions. Our feedback is framed to help users understand their patterns and decide what to change, rather than telling them what to do. Hattie and Timperley's (2007) feedback model further informs the design: effective feedback answers three questions — Where am I going? (the recommended ranges), How am I doing? (the mode distribution), and Where to next? (the action plans). The scoring system deliberately avoids punitive framing — scores represent alignment with task-appropriate behavior, not grades on a fixed rubric.
Your feedback is personalized based on what works best for users with similar engagement profiles.
The feedback optimization system tracks a simple but powerful metric: when we give you advice, do you actually follow it? Each time you submit a conversation, the system checks whether you improved in the areas your previous feedback recommended. Over time, this creates a dataset of what works for different types of users. The system uses two layers of optimization. First, it groups users by engagement archetype (Delegator, Partner, Verifier, etc.) and finds which feedback style produces the highest follow-through rate for each group. Second, it trains a statistical model that considers your full personality profile — your openness, need for cognition, experience level — to predict which feedback approach will resonate with you specifically. For example, users with high Need for Cognition tend to respond better to detailed analytical feedback, while users early in their journey benefit more from specific actionable instructions. This approach is grounded in Hattie and Timperley's (2007) feedback model, which shows that effective feedback must be calibrated to the learner's current level. Pane et al. (2017) demonstrated that adaptive personalization in educational technology produces measurable learning gains. The system improves as more people use it — every resolved feedback outcome makes the next recommendation more accurate.
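A minimal sketch of the first layer, the archetype-level selection. The class name, the fallback style, and the style labels are hypothetical, and the second layer (the per-user statistical model) is omitted for brevity:

```python
# Sketch of archetype-level feedback-style selection by follow-through rate.
# Names and the fallback style are hypothetical placeholders.

from collections import defaultdict

class ArchetypeFeedbackSelector:
    def __init__(self, fallback_style: str = "balanced"):
        self.fallback = fallback_style
        # archetype -> style -> list of outcomes (1 = advice followed, 0 = not)
        self.outcomes = defaultdict(lambda: defaultdict(list))

    def record(self, archetype: str, style: str, followed: bool) -> None:
        """Log whether a user of this archetype followed feedback in this style."""
        self.outcomes[archetype][style].append(1 if followed else 0)

    def best_style(self, archetype: str) -> str:
        """Pick the style with the highest observed follow-through rate."""
        styles = self.outcomes.get(archetype, {})
        if not styles:
            return self.fallback
        return max(styles, key=lambda s: sum(styles[s]) / len(styles[s]))
```

Each resolved outcome, e.g. `selector.record("Delegator", "detailed-analytical", followed=True)`, updates the per-archetype follow-through rates, so `best_style` becomes more reliable as data accumulates.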
Limitations
The optimization model is bootstrapped with synthetic data and improves as real outcomes accumulate. Early recommendations may not outperform random selection until sufficient data is collected. The system optimizes for follow-through rate, which is a proxy for — but not identical to — actual learning improvement.
Sources
- Deci, E.L. & Ryan, R.M. (2000). The 'what' and 'why' of goal pursuits: Human needs and the self-determination of behavior. Psychological Inquiry, 11(4)
- Hattie, J. & Timperley, H. (2007). The power of feedback. Review of Educational Research, 77(1)
- Pane, J.F., Steiner, E.D., Baird, M.D. & Hamilton, L.S. (2017). Informing Progress: Insights on Personalized Learning. RAND Corporation
Full Reference List
All 29 sources cited across this page, in alphabetical order.
- Anderson, L.W. & Krathwohl, D.R. (2001). A Taxonomy for Learning, Teaching, and Assessing. Longman
- Bastani, H., Bastani, O., Sungu, A., Ge, H., Kabakcı, O. & Mariman, R. (2024). Generative AI can harm learning. Working Paper, Wharton School
- Cacioppo, J.T. & Petty, R.E. (1982). The need for cognition. Journal of Personality and Social Psychology, 42(1)
- Chi, M.T.H. & Wylie, R. (2014). The ICAP framework: Linking cognitive engagement to active learning outcomes. Educational Psychologist, 49(4)
- Costa, P.T. & McCrae, R.R. (1992). Revised NEO Personality Inventory (NEO-PI-R) and NEO Five-Factor Inventory (NEO-FFI) professional manual. Psychological Assessment Resources
- Deci, E.L. & Ryan, R.M. (2000). The 'what' and 'why' of goal pursuits: Human needs and the self-determination of behavior. Psychological Inquiry, 11(4)
- Doshi, A.R. & Hauser, O.P. (2024). Generative AI enhances individual creativity but reduces the collective diversity of novel content. Science Advances, 10(28)
- Dweck, C.S. (1999). Self-theories: Their role in motivation, personality, and development. Psychology Press
- Dweck, C.S. (2006). Mindset: The New Psychology of Success. Random House
- Edwards, J.R. (2008). Person-environment fit in organizations: An assessment of theoretical progress. Academy of Management Annals, 2(1)
- Flavell, J.H. (1979). Metacognition and cognitive monitoring: A new area of cognitive-developmental inquiry. American Psychologist, 34(10)
- Goddard, K., Roudsari, A. & Wyatt, J.C. (2012). Automation bias: A systematic review. Journal of the American Medical Informatics Association, 19(1)
- Guilford, J.P. (1967). The Nature of Human Intelligence. McGraw-Hill
- Hattie, J. & Timperley, H. (2007). The power of feedback. Review of Educational Research, 77(1)
- Hutchins, E. (1995). Cognition in the Wild. MIT Press
- McCrae, R.R. & Costa, P.T. (2008). The five-factor theory of personality. In O.P. John et al. (Eds.), Handbook of Personality, 3rd ed., Guilford Press
- Meade, A.W. & Craig, S.B. (2012). Identifying careless responses in survey data. Psychological Methods, 17(3)
- Mercier, H. & Sperber, D. (2011). Why do humans reason? Arguments for an argumentative theory. Behavioral and Brain Sciences, 34(2)
- Pane, J.F., Steiner, E.D., Baird, M.D. & Hamilton, L.S. (2017). Informing Progress: Insights on Personalized Learning. RAND Corporation
- Parasuraman, R. & Riley, V. (1997). Humans and automation: Use, misuse, disuse, abuse. Human Factors, 39(2)
- Risko, E.F. & Gilbert, S.J. (2016). Cognitive offloading. Trends in Cognitive Sciences, 20(9)
- Schraagen, J.M., Chipman, S.F. & Shalin, V.L. (2000). Cognitive Task Analysis. Lawrence Erlbaum Associates
- Schraw, G. & Dennison, R.S. (1994). Assessing metacognitive awareness. Contemporary Educational Psychology, 19(4)
- Sparrow, B., Liu, J. & Wegner, D.M. (2011). Google effects on memory: Cognitive consequences of having information at our fingertips. Science, 333(6043)
- Sperber, D. et al. (2010). Epistemic vigilance. Mind & Language, 25(4)
- Vygotsky, L.S. (1978). Mind in Society: The Development of Higher Psychological Processes. Harvard University Press
- Wood, D., Bruner, J.S. & Ross, G. (1976). The role of tutoring in problem solving. Journal of Child Psychology and Psychiatry, 17(2)
- Yeager, D.S. & Dweck, C.S. (2012). Mindsets that promote resilience: When students believe that personal characteristics can be developed. Educational Psychologist, 47(4)
- Zimmerman, B.J. (2002). Becoming a self-regulated learner: An overview. Theory Into Practice, 41(2)
This research foundations page is a living document. As the AI Engagement Analysis tool evolves, we add new evidence and update existing citations. If you are a researcher interested in collaborating or have suggestions for additional evidence, please contact us.
For a detailed walkthrough of how the grading rubrics are derived from this theoretical foundation, see Rubric Derivation Methodology.