Rubric Derivation Methodology
How the grading rubrics used in aimodes.ai are derived, validated, and revised.
This document is the theoretical anchor for every task-type rubric in the system. The rubrics are not instructor intuition cast as numbers — they are falsifiable theoretical claims derived from cognitive science, education research, and the AI Engagement Typology (AIT). Each rubric's per-mode target proportion and recommended mode path can be traced back through this methodology to a specific theoretical justification.
Primary author: Mark Keith (BYU), with David Wood and Clay Posey. Version: 0.1 (draft, 2026-04-22). Status: methodology sections (1–3, 5, 6) and per-task derivations (Section 4) drafted. Downstream uses: grading spec (/6-Apps/aimodes-grading-spec/), MyEducator integration, the research program, and the aimodes.ai public /research/rubric-theory page.
Executive summary
The AI Engagement Typology classifies user-AI interactions into eight modes grouped across three epistemic tiers (Passivity, Partnership, Agency). Each task a student completes in aimodes.ai is graded against a target mode distribution — the proportion of the conversation that should be spent in each mode — and a recommended mode path — the order in which modes should first appear.
The central methodological claim of this document is that target distributions and mode paths are not arbitrary. They are derived from a four-step procedure combining:
- Task decomposition — breaking the task into its constituent cognitive operations (Anderson & Krathwohl, 2001);
- Mode mapping — matching each cognitive operation to the mode whose behavioral signature serves that operation;
- Cognitive-load weighting — proportioning modes by the relative effort and time each operation demands (Sweller, 1988);
- Expertise calibration — adjusting the distribution by the student's expertise level using scaffolding theory (Vygotsky, 1978) and the expertise reversal effect (Kalyuga, Ayres, Chandler, & Sweller, 2003).
The resulting rubrics are theory-driven in v0.1, with no empirical weighting from student data. Section 6 describes the ML pipeline that will convert them into empirically-calibrated rubrics once sufficient data is collected — moving from theoretical_v1 to empirical_v1 over roughly two semesters of operation.
1. Theoretical foundation
1.1 Why modes are cognitive operations, not behaviors
The eight AIT modes are not behaviors in the operant sense. Each mode specifies a cognitive relationship between user and AI — who is holding the reasoning, who is holding the knowledge, who is holding the evaluation — and the user's observable messages are downstream evidence of that relationship. Treating modes as cognitive operations rather than surface behaviors matters for rubric derivation: a "target distribution" is a claim about what cognitive work the task requires, not what words the student should type.
Theoretical grounding here rests on three traditions:
- Hammond's Cognitive Continuum Theory (CCT) (Hammond, 1988, 1996). Cognition operates on a continuum from intuitive (fast, pattern-based, low-effort) to analytic (deliberate, rule-based, high-effort). AIT modes differ systematically in where they position the user on this continuum: Oracle and Production Assistant anchor the intuitive pole; Verification, Critical Challenger, and Problem Setter anchor the analytic pole; Tutor and Collaborative Problem-Solver are quasi-rational middle positions that can drift either way depending on execution.
- Dual-process theories of cognition (Evans & Stanovich, 2013; Kahneman, 2011). System 1 and System 2 thinking correspond approximately to the Passivity and Agency tiers. Partnership tier engagement recruits both systems, which is why it is neither always better nor always worse than either pole — it is appropriate for tasks where the student is actively learning structure they do not yet possess.
- Bloom's revised taxonomy (Anderson & Krathwohl, 2001). Each mode maps to a Bloom's level, summarized in AI_ENGAGEMENT_MODEL.md. The mapping is not bijective — some modes span two levels — but it provides the cognitive-operation vocabulary that step 1 of the derivation procedure (task decomposition) uses.
1.2 Why tier structure matters for rubrics
The three tiers are not a hierarchy. "More Agency" is not "better." The framework measures fit-to-task, not elevation. A factual lookup is best served by Oracle; a concept you already understand is best served by Verification; a poorly-posed research question is best served by Problem Setter. The target distribution for a task is the theoretically-correct mix for that task, not a student's developmental target.
This is the core reason a task rubric cannot be derived by asking "what's the best way to use AI?" It must be derived by asking "what cognitive operations does this task require, and which modes serve them?"
1.3 Why expertise shifts the distribution
Expertise is the single largest moderator of what mode-mix best serves a given task. Three traditions support this:
- Chase & Simon (1973) and later work on expertise (Ericsson, Krampe, & Tesch-Römer, 1993; Ericsson, 2006) show that experts have chunked, structured knowledge in their domain that novices lack. Novices learning the same material benefit from scaffolded, explanation-heavy engagement (Tutor); experts derive little from the same engagement and benefit instead from challenge and reframing (Critical Challenger, Problem Setter).
- The Dreyfus & Dreyfus (1980) five-stage model of skill acquisition predicts qualitatively different interaction patterns at each stage. Novices operate on explicit rules; experts operate on situational pattern recognition. The rubric framework compresses Dreyfus's five stages to three tiers (Beginner / Intermediate / Expert) for tractability.
- Expertise reversal effect (Kalyuga, Ayres, Chandler, & Sweller, 2003; Kalyuga, 2007) demonstrates that instructional techniques optimal for novices actively harm expert learners, and vice versa. Applied to AI engagement: a heavy Tutor/Oracle pattern that supports a novice's understanding may undermine an expert's analytical processing by offloading cognitive work the expert needs to do themselves to stay calibrated.
The practical implication: every target distribution must be tier-adjusted. A single universal distribution per task is theoretically indefensible.
2. Derivation methodology
The procedure for deriving a task-type rubric has five steps. This procedure is applied to all task types in Section 4 (26 core task types plus the path-design capstone); the output is a structured rubric file conforming to the schema in aimodes-grading-spec/task-types/README.md.
Step 1 — Decompose the task into cognitive operations
Using Bloom's revised taxonomy as the operation vocabulary, list every cognitive operation the task requires, at a grain size where each operation is served by 1–2 modes. Operations are verbs: remember, understand, apply, analyze, evaluate, create, plus meta-cognitive operations (reframe, verify, expand).
Example (task: "Research and fact-check a historical claim"):
- Understand the claim's context and scope (Bloom: Understand)
- Locate sources (Bloom: Remember/Apply)
- Evaluate source credibility (Bloom: Evaluate)
- Integrate evidence into a position (Bloom: Analyze)
- Stress-test the position against counter-evidence (Bloom: Evaluate)
- Reframe if evidence contradicts (Bloom: Create, meta-level)
Step 2 — Map operations to modes
For each operation, identify the mode(s) whose behavioral signature (per the canonical classification guide) serves that operation. Most operations map to one dominant mode and one secondary mode.
Example (continuing):
- Understand → Tutor (3), Oracle (1) secondary
- Locate sources → Oracle (1), Collaborative (4) secondary
- Evaluate credibility → Verification (5), Critical Challenger (7) secondary
- Integrate evidence → Collaborative (4), Creative Expander (6) secondary
- Stress-test → Critical Challenger (7)
- Reframe → Problem Setter (8)
Step 3 — Weight by cognitive load and time
Each operation has a relative cognitive load (how much thinking it demands) and a relative time cost (how long it takes in the conversation). These two dimensions combine into a weight. Cognitive load is estimated from Cognitive Load Theory (Sweller, 1988, 1994) using the intrinsic / extraneous / germane tripartite framework (Sweller, van Merriënboer, & Paas, 1998) — intrinsic load (inherent to the task), extraneous load (imposed by presentation), and germane load (schema construction). Time cost is estimated from task-family averages observed in pilot data and adjusted downward when instructional scaffolding compresses time.
The weight w_i for operation i is (load_i × time_i) / Σ(load_k × time_k). The base target distribution for mode m is the sum of w_i over all operations whose dominant mode is m, plus half the weight for operations whose secondary mode is m.
This is the most contested step in the derivation. Estimates of cognitive load and time are not directly observable; they are expert judgments informed by the literature and pilot data. Section 5 (validation) and Section 6 (ML pipeline) are how these estimates are checked and revised.
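To make the weighting arithmetic concrete, here is a minimal sketch in TypeScript (the codebase's language). The Operation shape and function name are illustrative, not taken from the codebase; the load and time values feeding it are the expert judgments described above.

```typescript
// Sketch of Step 3: weight operations by load x time, then roll the
// weights up into a base mode distribution. Names are illustrative.
interface Operation {
  load: number;       // relative cognitive load (expert judgment)
  time: number;       // relative time cost in the conversation
  dominant: number;   // dominant mode, 1-8
  secondary?: number; // optional secondary mode, 1-8
}

function baseDistribution(ops: Operation[]): number[] {
  const raw = ops.map((op) => op.load * op.time);
  const total = raw.reduce((a, b) => a + b, 0);
  const dist = new Array(8).fill(0);
  ops.forEach((op, i) => {
    const w = raw[i] / total; // w_i = (load_i x time_i) / sum(load_k x time_k)
    dist[op.dominant - 1] += w;
    if (op.secondary !== undefined) dist[op.secondary - 1] += w / 2;
  });
  // Secondary half-weights push the sum above 1.0; renormalize so the
  // base distribution sums to 1.0 before Step 4 is applied.
  const sum = dist.reduce((a, b) => a + b, 0);
  return dist.map((p) => p / sum);
}
```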
Step 4 — Apply expertise calibration
The base distribution from Step 3 is adjusted by tier-specific multipliers. The current multipliers in src/lib/engagement/theoretical-distributions.ts are stated below for reference; Section 3 derives them.
| Mode | Beginner | Intermediate | Expert |
|---|---|---|---|
| 1 — Oracle | 1.4 | 0.8 | 0.3 |
| 2 — Production | 1.2 | 1.0 | 0.6 |
| 3 — Tutor | 1.5 | 1.0 | 0.5 |
| 4 — Collaborative | 1.0 | 1.2 | 1.0 |
| 5 — Verification | 0.7 | 1.1 | 1.4 |
| 6 — Creative | 0.6 | 1.2 | 1.3 |
| 7 — Challenge | 0.4 | 1.0 | 1.6 |
| 8 — Problem Setter | 0.3 | 0.9 | 1.8 |
The tier-adjusted distribution is base_m × multiplier_m, normalized to sum to 1.0. The mode path is also tier-adjusted: Beginners get Tutor (3) prepended if not already first; Experts get Problem Setter (8) prepended if not already first.
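A minimal sketch of the Step 4 calibration, transcribing the multiplier table above; the function name is illustrative, and src/lib/engagement/theoretical-distributions.ts remains the canonical source of the values.

```typescript
type Tier = "beginner" | "intermediate" | "expert";

// Multipliers from the Step 4 table, indexed by mode 1-8.
const MULTIPLIERS: Record<Tier, number[]> = {
  beginner:     [1.4, 1.2, 1.5, 1.0, 0.7, 0.6, 0.4, 0.3],
  intermediate: [0.8, 1.0, 1.0, 1.2, 1.1, 1.2, 1.0, 0.9],
  expert:       [0.3, 0.6, 0.5, 1.0, 1.4, 1.3, 1.6, 1.8],
};

function tierAdjust(base: number[], tier: Tier): number[] {
  const scaled = base.map((p, m) => p * MULTIPLIERS[tier][m]);
  const sum = scaled.reduce((a, b) => a + b, 0);
  return scaled.map((p) => p / sum); // renormalize to 1.0
}

// Example: the §4.2 base vector under Beginner multipliers lands within
// about a point of the published Beginner row once rounded (the published
// tables also apply the Step 5 floor rule to near-zero modes).
// tierAdjust([0.15, 0.05, 0.30, 0.15, 0.25, 0.05, 0.03, 0.02], "beginner")
// ≈ [0.19, 0.05, 0.41, 0.14, 0.16, 0.03, 0.01, 0.01]
```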
Step 5 — Normalize and sanity-check
The tier-adjusted distribution is normalized to sum to 1.0. Three sanity checks are applied before the rubric is finalized:
- Floor check. No mode's proportion is below 0.02 unless the operation set for the task genuinely excludes that mode. A proportion below 0.02 signals the mode is absent rather than rare, and should be represented as 0 explicitly.
- Ceiling check. No mode exceeds 0.50 unless the task is single-mode by design (e.g., a drill that is intentionally Tutor-heavy). A ceiling violation usually means the operation set is too narrow.
- Tier sanity. The Passivity / Partnership / Agency tier proportions are computed. For a Beginner version of any non-drill task, Partnership tier should be ≥ 0.40; for an Expert version, Agency tier should be ≥ 0.50. Violations are evidence the multipliers haven't fully propagated.
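The three checks above are mechanical enough to express as code. A sketch, assuming the distribution is an 8-element array of proportions; the isDrill flag and function name are illustrative:

```typescript
type Tier = "beginner" | "intermediate" | "expert";

function sanityCheck(dist: number[], tier: Tier, isDrill = false): string[] {
  const issues: string[] = [];
  dist.forEach((p, i) => {
    // Floor check: below 0.02 means "absent", which should be an explicit 0.
    if (p > 0 && p < 0.02)
      issues.push(`mode ${i + 1}: ${p.toFixed(3)} below floor; set to 0 or widen the operation set`);
    // Ceiling check: above 0.50 is allowed only for single-mode-by-design tasks.
    if (p > 0.5 && !isDrill)
      issues.push(`mode ${i + 1}: ${p.toFixed(3)} above ceiling; operation set likely too narrow`);
  });
  const partnership = dist[2] + dist[3];                 // modes 3-4
  const agency = dist[4] + dist[5] + dist[6] + dist[7];  // modes 5-8
  // Tier sanity: the Section 3 multipliers should have propagated.
  if (tier === "beginner" && !isDrill && partnership < 0.4)
    issues.push(`Beginner Partnership tier ${partnership.toFixed(2)} < 0.40`);
  if (tier === "expert" && agency < 0.5)
    issues.push(`Expert Agency tier ${agency.toFixed(2)} < 0.50`);
  return issues;
}
```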
3. Expertise calibration framework
3.1 Why three levels, not five
Dreyfus & Dreyfus (1980) propose five stages: novice, advanced beginner, competent, proficient, expert. Our framework compresses this to three. The compression is justified by two considerations:
- Measurement granularity. With currently-available data (N=91 pilot, N=331 Wave 1 survey), five levels produce sparse cells and unstable tier-specific distributions. Three levels produce distributions that are estimable with current sample sizes.
- Pedagogical interpretability. Instructors and students can meaningfully self-assess into three levels with relatively high agreement; five levels produce much more measurement error at the boundaries. This matters because current aimodes assignments ask students to self-report their expertise level per task.
A future version of the framework (v2.0) may move to five levels once Wave 2 and Study 2+3 data collection enables it. That transition would require re-deriving every task rubric because tier boundaries shift.
3.2 Why these multipliers
The multipliers in the table in Step 4 are derived from four constraints:
- Monotonicity constraint. Within each mode, the multiplier moves monotonically across tiers in a direction consistent with the cognitive-relationship theory. Passivity modes (1, 2) decrease with expertise; Agency modes (5–8) increase. Partnership modes (3, 4) do not follow a strict monotone: Tutor (3) declines with expertise (experts need less explanation); Collaborative (4) stays approximately flat because collaboration is useful across levels, differing only in who contributes what.
- Tier-sum constraint. Applied to a task whose base distribution has 30% Passivity, 40% Partnership, 30% Agency, the Beginner multipliers produce approximately 42% Passivity, 46% Partnership, 22% Agency before normalization (pulling toward Passivity + Partnership). The Expert multipliers produce approximately 10% Passivity, 31% Partnership, 75% Agency before normalization — normalization brings this to 9% / 27% / 64%. This is theoretically consistent with the expertise reversal literature (Kalyuga et al., 2003).
- Zero-avoidance constraint. No multiplier is zero; the minimum is 0.3 (Problem Setter for Beginners). This preserves the possibility that a Beginner could use Problem Setter productively — the framework does not forbid any mode at any level, it only specifies what's typical.
- Anchor-matching constraint. The multipliers were set so that applying them to a baseline task where the target is uniform (0.125 per mode) produces approximately the Wave 1 observed distributions by self-reported expertise level (N=331). This is a weak constraint — Wave 1 data is a self-report survey, not behavioral data — but it keeps the multipliers within plausibility.
3.3 Limitations acknowledged in v0.1
- Self-report expertise is noisy. Students' self-reported tier is only approximately correct. Future versions will infer tier from conversation behavior using the predictor.ts module, with the self-report as a prior.
- Task-independent multipliers are a simplification. Some tasks may warrant task-specific multipliers (e.g., creative writing may shift less with expertise than mathematical proof construction does). v0.1 assumes task-independent multipliers; Section 6 (ML pipeline) is how we test that assumption.
- Three-level compression loses information. Students on the boundary between two levels are miscalibrated by construction. The model is robust to this in aggregate but biased for individuals.
- Path rules are theory-driven, not yet validated. §3.4 specifies tier-specific path transformations (anchor prepend, side-cycle interleave, tail collapse). The transformations are theoretically motivated but not yet tested against observed student paths at scale. Wave 2 / Study 2+3 path data will validate or revise.
3.4 Path derivation by tier
The base mode path captures the canonical sequence of cognitive operations a task requires. Distributions tell us how often each mode appears; paths tell us in what order. Two students who land on identical mode distributions but in different sequences are doing different work — order encodes dependency between operations. Tier-adjusted paths therefore need their own derivation rules, not just inherited multipliers from the distribution side. Three rules apply at the path level.
Rule 1 — Tier-anchor prepend. A Beginner path prepends Tutor (Mode 3) at position 0 if not already first. An Expert path prepends Problem Setter (Mode 8) at position 0 if not already first. Intermediate paths inherit the base path unchanged.
The asymmetry has theoretical grounding. Beginners lack the schemas needed to act productively in the task domain (Sweller, 1994); Tutor at the start delivers the missing instructional scaffolding before any other operation can succeed. Experts already hold the schemas; for them, the binding constraint is whether the framing of the task is correct. Problem Setter at the start lets them interrogate the frame before committing effort. Reversing the prepend (Tutor for Experts, Problem Setter for Beginners) would harm both — the expertise-reversal effect (Kalyuga et al., 2003) for Experts, and meta-cognitive overload of premature framing for Beginners.
Rule 2 — Tier-characteristic side-cycles. Beginners interleave Tutor as a recurring side-cycle throughout the path, not only at position 0. After every two consecutive non-Tutor operations, the Beginner path returns to Mode 3 to consolidate before continuing. The grounding is cognitive load theory (Sweller, 1988): Beginners spend working memory encoding task structure itself, leaving little capacity for new content unless it is periodically chunked. Schnotz & Kürschner (2007) describe this as the germane-load offload pattern. A Beginner path of [3, 8, 6, 4, 7, 2, 5] therefore renders in practice as [3, 8, 6, 3, 4, 7, 3, 2, 5] — Tutor as connective tissue, not just opening move.
Experts have an analogous side-cycle in Verification (Mode 5). After every two consecutive non-Verification operations, the Expert path returns to Mode 5. Expert paths do not end with verification; they verify continuously, in tight loops with each generative step. This pattern is already noted inline in §4.10 (research) and §4.6 (exam-prep) but generalizes: an Expert path of [8, 6, 4, 7, 2, 5] is more accurately [8, 6, 5, 4, 7, 5, 2, 5]. Intermediates have no characteristic side-cycle; they execute the base path with one anchor prepend (if needed) and minimal interruption.
Side-cycle suppression rules. Two suppressions keep the rule from over-firing: (a) no insertion at the final position of the path (a side-cycle after the last operation has no consolidating effect), and (b) no insertion when the next operation in the base path is already the side-cycle mode (the consolidation will happen anyway). The counter resets to zero whenever the side-cycle mode appears, whether inserted or native.
Rule 3 — Beginner cycle collapse. When the base path ends with a repeated cycle of length k appearing two or more times consecutively (e.g., A → B → A → B), the Beginner path truncates to a single instance of that cycle (A → B). Beginners are still consolidating the first iteration when the task time-box closes; a second iteration is unrealistic under the working-memory budget that defines the tier. Intermediate and Expert paths preserve the repetition. This rule generates the truncated tier paths visible in §4.5 (study), §4.6 (exam-prep), and §4.10 (research).
Empty-path tasks. All three rules are suppressed when the base path is empty. Baseline (§4.1), fitness-check (§4.12), and the two catch-all tasks (§4.25, §4.26) are deliberately free-form; prescribing a tier-specific sequence on top of an empty base would manufacture structure that doesn't exist in the task design.
Application order. When a task's base path is non-empty: (1) collapse repeated tail cycles for Beginner only, (2) prepend the tier anchor (3 for Beginner, 8 for Expert) if not already first, (3) interleave the tier side-cycle (3 for Beginner, 5 for Expert) after every two consecutive non-side-cycle operations. Intermediate paths apply only step 2 implicitly (no anchor needed because Intermediate inherits the base unchanged).
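Because the three rules compose mechanically, a reference implementation is short. The sketch below reproduces the §3.4 worked examples; function names are illustrative, and the spec package's encoding is authoritative.

```typescript
type Mode = 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8;
type Tier = "beginner" | "intermediate" | "expert";

// Rule 3: collapse a repeated tail cycle (Beginner only).
function collapseTailCycle(path: Mode[]): Mode[] {
  for (let k = 1; k <= Math.floor(path.length / 2); k++) {
    const tail = path.slice(-k).join(",");
    const prev = path.slice(-2 * k, -k).join(",");
    if (tail === prev) {
      // Drop one instance of the cycle, then re-check for further repeats.
      return collapseTailCycle(path.slice(0, path.length - k));
    }
  }
  return path;
}

// Rule 1: prepend the tier anchor if it is not already first.
function prependAnchor(path: Mode[], anchor: Mode): Mode[] {
  return path[0] === anchor ? path : [anchor, ...path];
}

// Rule 2: insert the tier side-cycle after every two consecutive
// non-side-cycle operations, applying both suppression rules.
function interleaveSideCycle(path: Mode[], side: Mode): Mode[] {
  const out: Mode[] = [];
  let run = 0; // consecutive non-side-cycle ops; resets whenever `side` appears
  for (let i = 0; i < path.length; i++) {
    out.push(path[i]);
    run = path[i] === side ? 0 : run + 1;
    const atEnd = i === path.length - 1;               // suppression (a)
    const nextIsSide = !atEnd && path[i + 1] === side; // suppression (b)
    if (run === 2 && !atEnd && !nextIsSide) {
      out.push(side);
      run = 0;
    }
  }
  return out;
}

// Application order from the text: collapse (Beginner only), then anchor,
// then side-cycle. Empty base paths suppress all rules.
function tierPath(base: Mode[], tier: Tier): Mode[] {
  if (base.length === 0 || tier === "intermediate") return base;
  return tier === "beginner"
    ? interleaveSideCycle(prependAnchor(collapseTailCycle(base), 3), 3)
    : interleaveSideCycle(prependAnchor(base, 8), 5);
}

// Worked examples from §3.4:
// tierPath([3, 8, 6, 4, 7, 2, 5], "beginner") -> [3, 8, 6, 3, 4, 7, 3, 2, 5]
// tierPath([8, 6, 4, 7, 2, 5], "expert")      -> [8, 6, 5, 4, 7, 5, 2, 5]
```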
Validation implications. Path rules are checked alongside distribution rules. Specifically: any non-empty Beginner path must start with Mode 3, must contain at least one additional Mode 3 entry at position ≥ 2 if the post-collapse path length is ≥ 4, and must not contain a tail repetition. Any non-empty Expert path must start with Mode 8 and should contain Mode 5 at multiple positions if the path length is ≥ 4. The spec package's task-types/README.md codifies these as machine-checkable rules.
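A sketch of those checks in the same representation; the exact machine-checkable encoding in task-types/README.md is authoritative. The text's "must" rules are reported as hard failures and its "should" rule as a soft one.

```typescript
// Sketch of the §3.4 path validation rules for tier-adjusted paths.
function validateTierPath(path: number[], tier: "beginner" | "expert"): string[] {
  const issues: string[] = [];
  if (path.length === 0) return issues; // empty paths are exempt
  if (tier === "beginner") {
    if (path[0] !== 3) issues.push("must start with Mode 3");
    if (path.length >= 4 && !path.slice(2).includes(3))
      issues.push("needs an additional Mode 3 at position >= 2");
    // Tail-repetition check: the last k entries must not repeat the k before them.
    for (let k = 1; k <= Math.floor(path.length / 2); k++) {
      if (path.slice(-k).join() === path.slice(-2 * k, -k).join())
        issues.push(`retains a repeated tail cycle of length ${k}`);
    }
  } else {
    if (path[0] !== 8) issues.push("must start with Mode 8");
    if (path.length >= 4 && path.filter((m) => m === 5).length < 2)
      issues.push("should contain Mode 5 at multiple positions (soft rule)");
  }
  return issues;
}
```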
4. Per-task derivations
This section derives the target mode distribution, base mode path, and tier-adjusted variants for each of the task types defined in src/lib/engagement/task-types.ts — 27 derivations in all, including the path-design capstone (§4.27). Each derivation follows the five-step procedure from Section 2 and applies the expertise multipliers from Section 3.
4.0 How to read these tables
Each derivation presents:
- Task description — a one-sentence characterization of what the student is doing.
- Cognitive operations — the discrete operations the task requires, mapped to Bloom's revised taxonomy.
- Mode mapping and base distribution — the proportion of the task-relevant cognitive work served by each of the 8 modes, summing to 1.00.
- Tier-adjusted distributions — the base distribution multiplied by the Section 3 expertise multipliers and renormalized. Values shown in the per-task tables are rounded to the nearest whole percent for readability; per-row sums may differ from 100% by ≤ 2 percentage points due to rounding. The precise normalized values (sum exactly 1.000) live in aimodes-grading-spec/task-types/*.yaml — that spec package is the authoritative numerical source for downstream implementers. Methodology tables here are the human-readable presentation.
- Tier-adjusted mode paths — the recommended sequence of modes per the path derivation rules in §3.4. Beginner paths prepend Tutor (Mode 3), interleave Tutor side-cycles after every two non-Tutor operations, and collapse repeated tail cycles. Expert paths prepend Problem Setter (Mode 8) and interleave Verification (Mode 5) side-cycles after every two non-Verification operations. Intermediate paths inherit the base path with no modification. Paths shown in the per-task tables are post-transformation; the base path is shown only on the Base row.
Distribution notation in text is {M1, M2, M3, M4, M5, M6, M7, M8}. Mode paths are written A → B → C.
4.1 task-baseline — Baseline Assessment (Book 1, Ch 1)
Task. The student has a free-form conversation with AI on any topic they choose, so the framework can measure their natural engagement pattern before any instruction.
Cognitive operations. None prescribed — the measurement goal is diagnostic. Students bring whatever operations they would bring outside the course.
Base distribution and rationale. Because this task measures natural behavior rather than prescribes it, the base target mirrors the modal pattern we would expect from a population of students approximating untrained novices. Drawing on pilot and Wave 1 data and the broader literature on novice AI use (cognitive offloading; Risko & Gilbert, 2016), Passivity modes dominate: {.30, .20, .15, .10, .05, .10, .05, .05}. The student is not graded against this target — the baseline is used only as a reference pattern for comparison across subsequent tasks.
| Mode | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | Path |
|---|---|---|---|---|---|---|---|---|---|
| Base | 30 | 20 | 15 | 10 | 5 | 10 | 5 | 5 | (none) |
| Beginner | 38 | 22 | 20 | 9 | 3 | 5 | 2 | 1 | (none) |
| Intermediate | 25 | 20 | 15 | 12 | 6 | 12 | 5 | 5 | (none) |
| Expert | 12 | 16 | 10 | 13 | 9 | 17 | 11 | 12 | (none) |
Notes. Because baseline has no prescribed path, the path-adjustment rules (prepend Tutor for Beginner, Problem Setter for Expert) do not apply. The table is still shown to illustrate what pattern a student of that tier would be expected to produce naturally, for comparison to their actual submission. Scoring for the baseline task is pass/fail on completion; the tier-adjusted distributions are diagnostic, not evaluative.
4.2 task-how-ai-works — Understanding AI (Book 1, Ch 2)
Task. The student runs a calibration test to identify where AI output is reliable versus unreliable, producing a short reflection on what they learned.
Cognitive operations.
- Understand a claim or explanation AI provides (Bloom: Understand)
- Apply a test or verification procedure to that claim (Apply, Evaluate)
- Identify discrepancy or error (Evaluate)
- Synthesize a personal heuristic for when to trust AI output (Analyze, Create)
Base distribution and rationale. This is a Tutor-plus-Verification task. The student must understand what AI is claiming before they can evaluate it; verification dominates once the claim is understood. Minor Collaborative (Mode 4) work happens when the student reasons together with AI about edge cases. Problem Setter is not useful here — the frame is given ("test AI's reliability"), not to be questioned. Base: {.15, .05, .30, .15, .25, .05, .03, .02}. Path: 3 → 5.
| Mode | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | Path |
|---|---|---|---|---|---|---|---|---|---|
| Base | 15 | 5 | 30 | 15 | 25 | 5 | 3 | 2 | 3 → 5 |
| Beginner | 19 | 6 | 41 | 14 | 16 | 3 | 1 | 0 | 3 → 5 |
| Intermediate | 12 | 5 | 29 | 17 | 27 | 6 | 3 | 2 | 3 → 5 |
| Expert | 5 | 3 | 17 | 17 | 40 | 7 | 6 | 4 | 8 → 3 → 5 |
Notes. Expert path prepends Problem Setter even though the base task gives the frame, because expert students should be interrogating which calibration claims are worth testing before spending time testing them. This is the expertise reversal effect applied to meta-cognition (Kalyuga, 2007).
4.3 task-quick-answers — Oracle Audit (Book 1, Ch 3)
Task. The student completes two rounds: Round 1 is natural Oracle-heavy behavior; Round 2 applies an "Oracle Filter" (three-question check before accepting any AI answer). Students produce both transcripts plus a comparison reflection.
Cognitive operations.
- Ask for factual information (Remember)
- Evaluate whether the answer is plausible (Evaluate)
- Verify against another source or own reasoning (Evaluate)
- Reflect on the shift between rounds (Analyze)
Base distribution and rationale. The task is intentionally split-personality: Round 1 expects heavy Oracle, Round 2 expects heavy Verification. The aggregate base distribution balances across both rounds, with Verification (M5) dominant because the learning happens in Round 2. Light Collaborative (M4) and Critical Challenger (M7) capture students who go beyond the minimum to push back on AI answers. Base: {.15, .10, .15, .15, .25, .05, .10, .05}. Path: 1 → 5 (reflecting the round-to-round shift).
| Mode | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | Path |
|---|---|---|---|---|---|---|---|---|---|
| Base | 15 | 10 | 15 | 15 | 25 | 5 | 10 | 5 | 1 → 5 |
| Beginner | 22 | 12 | 23 | 16 | 18 | 3 | 4 | 2 | 3 → 1 → 5 |
| Intermediate | 12 | 10 | 15 | 18 | 27 | 6 | 10 | 4 | 1 → 5 |
| Expert | 4 | 6 | 8 | 15 | 35 | 6 | 16 | 9 | 8 → 1 → 5 |
Notes. Beginner path prepends Tutor because a novice benefits from understanding why the Oracle Filter works before applying it. Expert path prepends Problem Setter because experts should question whether the filter's three canonical questions are even the right ones for their domain.
4.4 task-learning — Concept Deep Dive (Book 1, Ch 4)
Task. The student picks a concept they don't understand and uses AI to build understanding through Socratic dialogue, then produces a one-paragraph teach-back written from memory.
Cognitive operations.
- Frame what is unknown about the concept (Analyze, meta-level)
- Receive scaffolded explanation (Understand)
- Restate concept in own words (Understand, Apply)
- Self-test at multiple Bloom's levels (Evaluate)
- Challenge own first interpretation (Evaluate)
Base distribution and rationale. Tutor (M3) dominates because the core work is learning. Collaborative (M4) is heavy because the student must contribute reasoning to avoid passive reception. Verification (M5) is substantial because self-testing is required. Problem Setter (M8) appears modestly because the student is asked to question their framing of the concept before asking AI. Base: {.10, .02, .35, .25, .20, .03, .02, .03}. Path: 8 → 3 → 7 → 5.
| Mode | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | Path |
|---|---|---|---|---|---|---|---|---|---|
| Base | 10 | 2 | 35 | 25 | 20 | 3 | 2 | 3 | 8 → 3 → 7 → 5 |
| Beginner | 13 | 2 | 47 | 22 | 13 | 2 | 1 | 1 | 3 → 8 → 3 → 7 → 5 |
| Intermediate | 8 | 2 | 33 | 28 | 21 | 3 | 2 | 3 | 8 → 3 → 7 → 5 |
| Expert | 3 | 1 | 20 | 29 | 32 | 4 | 4 | 6 | 8 → 3 → 5 → 7 → 5 |
Notes. Beginner path places Tutor first (default for beginners); Problem Setter becomes accessible only after initial understanding. The Expert path needs no anchor prepend because Problem Setter is already first; only the Verification side-cycle is added. This rubric is a re-derivation of the existing concept-deep-dive file in the codebase; the base vector matches.
4.5 task-studying — Study Sprint (Book 1, Ch 5)
Task. Across three spaced sessions, the student uses AI for retrieval practice, spaced review, and interleaving, producing three transcripts plus a memory-change reflection.
Cognitive operations.
- Attempt recall before receiving information (Remember)
- Check accuracy of recall (Evaluate)
- Generate practice problems for self-study (Apply — Tutor per canonical definition)
- Interleave concepts across sessions (Analyze)
- Notice and describe forgetting (Evaluate, meta-level)
Base distribution and rationale. Tutor (M3) dominates because retrieval and practice-problem generation both classify as Tutor under the canonical rules. Verification (M5) is substantial because students check their own answers. Problem Setter (M8) is non-trivial because the student must interrogate what they need to study. Minor Production Assistant (M2) for formatting flashcards. Base: {.10, .08, .40, .07, .20, .02, .02, .11}. Path: 3 → 5 → 3 → 5 (retrieval-verification cycles across sessions).
| Mode | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | Path |
|---|---|---|---|---|---|---|---|---|---|
| Base | 10 | 8 | 40 | 7 | 20 | 2 | 2 | 11 | 3 → 5 → 3 → 5 |
| Beginner | 13 | 9 | 55 | 6 | 13 | 1 | 1 | 3 | 3 → 5 |
| Intermediate | 8 | 8 | 40 | 8 | 22 | 2 | 2 | 10 | 3 → 5 → 3 → 5 |
| Expert | 3 | 5 | 23 | 8 | 32 | 3 | 4 | 22 | 8 → 3 → 5 → 3 → 5 |
Notes. The Beginner path collapses the repeated retrieval-verification cycle per Rule 3 (§3.4); Intermediate and Expert preserve the repetition from the base path. Expert path prepends Problem Setter because experts should question their knowledge gaps before generating practice material. Re-derivation of the existing study-sprint rubric; base vector matches.
4.6 task-exam-prep — Exam Prep Simulation (Book 1, Ch 6)
Task. The student runs a four-stage exam-prep workflow: (1) map territory + self-assess, (2) practice exam, (3) reasoning review, (4) stress test with Critical Challenger. Plus a reflection comparing stress-test insights to practice-exam insights.
Cognitive operations.
- Self-assess confidence on topics (Evaluate, meta-level)
- Receive scaffolded practice problems (Apply — Tutor)
- Attempt problems without answers (Apply)
- Review reasoning on right and wrong answers (Evaluate)
- Invite adversarial questions targeting weakest areas (Evaluate)
Base distribution and rationale. Tutor (M3) is dominant for the practice and review stages; Critical Challenger (M7) is heavily weighted for the stress-test stage — this is what distinguishes this task from plain studying. Verification (M5) captures the reasoning-review stage. Low Production Assistant because exam prep should be student-generated reasoning, not AI-generated answers. Base: {.05, .03, .30, .07, .20, .03, .25, .07}. Path: 3 → 5 → 7 → 5.
| Mode | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | Path |
|---|---|---|---|---|---|---|---|---|---|
| Base | 5 | 3 | 30 | 7 | 20 | 3 | 25 | 7 | 3 → 5 → 7 → 5 |
| Beginner | 8 | 4 | 50 | 8 | 16 | 2 | 11 | 2 | 3 → 5 → 7 → 3 → 5 |
| Intermediate | 4 | 3 | 29 | 8 | 22 | 4 | 24 | 6 | 3 → 5 → 7 → 5 |
| Expert | 1 | 2 | 14 | 6 | 26 | 4 | 36 | 12 | 8 → 3 → 5 → 7 → 5 |
Notes. Expert path prepends Problem Setter because experts should interrogate what kind of exam this is before deciding what stress test is appropriate. Re-derivation of existing exam-prep-simulation rubric.
4.7 task-writing — The AI Writing Partner (Book 1, Ch 7)
Task. Write an analytical essay using AI across the full writing workflow: understand the assignment, build the argument, draft with ownership boundaries, critically challenge the draft, and verify citations.
Cognitive operations.
- Interrogate the assignment frame (Analyze, meta-level)
- Construct an argument (Create)
- Draft prose paragraphs (Apply, Create)
- Revise under adversarial challenge (Evaluate)
- Verify factual claims and citations (Evaluate)
Base distribution and rationale. Writing is the most mode-diverse task in the book — this is why it is the canonical mode-fluency task. Production Assistant (M2) is substantial because students are supposed to use AI for craft-level revision within boundaries, but it is not dominant. Collaborative (M4) and Critical Challenger (M7) carry equal weight because argument-building and revision are co-dominant phases. Creative Expander (M6) supports divergent thinking about approaches. Verification (M5) is essential given AI's hallucination risk on citations. Problem Setter (M8) is non-trivial because the assignment-framing step is load-bearing. Base: {.05, .15, .05, .20, .15, .15, .15, .10}. Path: 8 → 6 → 4 → 7 → 2 → 5.
| Mode | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | Path |
|---|---|---|---|---|---|---|---|---|---|
| Base | 5 | 15 | 5 | 20 | 15 | 15 | 15 | 10 | 8 → 6 → 4 → 7 → 2 → 5 |
| Beginner | 9 | 22 | 9 | 25 | 13 | 11 | 7 | 4 | 3 → 8 → 6 → 3 → 4 → 7 → 3 → 2 → 5 |
| Intermediate | 4 | 14 | 5 | 22 | 16 | 17 | 14 | 8 | 8 → 6 → 4 → 7 → 2 → 5 |
| Expert | 1 | 8 | 2 | 17 | 18 | 17 | 21 | 16 | 8 → 6 → 5 → 4 → 7 → 5 → 2 → 5 |
Sum note: the Expert row's scaled values sum to approximately 116% before normalization; the displayed values are post-normalization, rounded to whole percents.
Notes. Beginner path prepends Tutor because a first-time essay writer needs to understand the assignment type (synthesis vs. evaluation vs. application) before framing can even happen. Expert distribution heavily shifts away from Production Assistant — expert writers write their own prose and use AI for challenge, not generation. This supports the Five Principles section of the book chapter (writing ownership spectrum).
4.8 task-problem-solving — Problem Sets (Book 1, Ch 8)
Task. The student works through 3+ analytical problems with AI as thinking partner, identifies patterns across problems, and articulates the underlying principle.
Cognitive operations.
- Attempt problem before asking AI (Apply)
- Receive guided discovery hints, not answers (Understand, Apply)
- Iterate reasoning collaboratively (Analyze)
- Verify solution through alternative method (Evaluate)
- Identify generalizable principle across problems (Analyze, Create)
Base distribution and rationale. Collaborative Problem-Solver (M4) is dominant because the task is explicitly "think with AI" rather than "get answer from AI." Tutor (M3) is substantial because guided-discovery interactions classify as Tutor. Verification (M5) is essential to check reasoning, not just answers. Low Production because problem work should show the student's reasoning, not AI's solution. Base: {.08, .05, .25, .30, .20, .02, .05, .05}. Path: 8 → 4 → 5 → 7.
| Mode | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | Path |
|---|---|---|---|---|---|---|---|---|---|
| Base | 8 | 5 | 25 | 30 | 20 | 2 | 5 | 5 | 8 → 4 → 5 → 7 |
| Beginner | 11 | 6 | 36 | 29 | 14 | 1 | 2 | 2 | 3 → 8 → 4 → 3 → 5 → 7 |
| Intermediate | 6 | 5 | 24 | 34 | 21 | 2 | 5 | 4 | 8 → 4 → 5 → 7 |
| Expert | 2 | 3 | 13 | 32 | 29 | 3 | 8 | 9 | 8 → 4 → 5 → 7 |
Notes. Beginner path correctly prepends Tutor because a student who does not yet have the problem-type schemas needs instruction before guided discovery works (expertise reversal — Kalyuga et al., 2003).
4.9 task-brainstorming — Reframing & Ideation (Book 1, Ch 9)
Task. Starting from a vague problem, the student generates three distinct problem framings, chooses one with rationale, and produces a range of candidate solutions within the chosen frame.
Cognitive operations.
- Question the framing of the problem (Analyze, meta-level)
- Generate divergent framings (Create)
- Generate divergent solutions within a frame (Create)
- Stress-test favored ideas (Evaluate)
- Converge on a direction with reasoning (Evaluate)
Base distribution and rationale. Creative Expander (M6) is dominant because divergent generation across options is the defining operation. Problem Setter (M8) is heavy because three-framing exercises are meta-level work. Critical Challenger (M7) supports the stress-test phase. Base: {.03, .03, .05, .12, .05, .35, .20, .17}. Path: 8 → 6 → 7 → 6 (reframe → generate → challenge → re-expand).
| Mode | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | Path |
|---|---|---|---|---|---|---|---|---|---|
| Base | 3 | 3 | 5 | 12 | 5 | 35 | 20 | 17 | 8 → 6 → 7 → 6 |
| Beginner | 6 | 6 | 12 | 18 | 5 | 32 | 12 | 8 | 3 → 8 → 6 → 3 → 7 → 6 |
| Intermediate | 2 | 3 | 5 | 13 | 5 | 39 | 19 | 14 | 8 → 6 → 7 → 6 |
| Expert | 1 | 1 | 2 | 9 | 5 | 34 | 24 | 23 | 8 → 6 → 5 → 7 → 6 |
Notes. The Beginner distribution is more balanced than in other tasks because a novice brainstormer needs some Tutor-style scaffolding on what divergent thinking looks like before they can produce it independently. The Expert version is Problem Setter + Creative + Challenger dominant — this is the shape of expert ideation.
4.10 task-research — Research & Fact-Checking (Book 1, Ch 10)
Task. The student researches a claim or question using AI for source discovery and synthesis, producing a verification log and an annotated research synthesis.
Cognitive operations.
- Frame research question (Analyze, meta-level)
- Generate candidate search terms and source types (Create)
- Verify every source exists and says what AI claims (Evaluate)
- Stress-test synthesis against counter-evidence (Evaluate)
Base distribution and rationale. Verification (M5) is overwhelmingly dominant because the canonical research workflow (canonical-skill §Mode 5, combined with hallucination risk from canonical-skill §Mode 1) makes verification the load-bearing component. Critical Challenger (M7) is substantial because counter-evidence matters. Creative Expander (M6) supports candidate-source generation. Base: {.05, .03, .05, .10, .35, .10, .25, .07}. Path: 8 → 6 → 5 → 7 → 5.
| Mode | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | Path |
|---|---|---|---|---|---|---|---|---|---|
| Base | 5 | 3 | 5 | 10 | 35 | 10 | 25 | 7 | 8 → 6 → 5 → 7 → 5 |
| Beginner | 10 | 5 | 11 | 14 | 35 | 8 | 14 | 3 | 3 → 8 → 6 → 3 → 5 → 7 → 3 → 5 |
| Intermediate | 4 | 3 | 5 | 11 | 36 | 11 | 24 | 6 | 8 → 6 → 5 → 7 → 5 |
| Expert | 1 | 1 | 2 | 8 | 37 | 10 | 31 | 10 | 8 → 6 → 5 → 7 → 5 |
Notes. Under §3.4 the Expert path now retains both Verifications rather than collapsing them — continuous verification is represented by Mode 5 appearing multiple times in the path, not by truncation. Research task rubric emphasizes citation-existence verification explicitly in the artifact criteria.
4.11 task-decisions — Making Decisions (Book 1, Ch 11)
Task. Apply a 6-stage decision model (Frame → Options → Evaluate → Decide → Commit → Review) with AI, producing a decision matrix and a decision journal entry.
Cognitive operations.
- Frame decision (Analyze, meta-level)
- Generate options (Create)
- Evaluate options against criteria (Evaluate)
- Stress-test the favored option (Evaluate)
- Verify critical assumptions (Evaluate)
Base distribution and rationale. Decisions are balanced-mode tasks: Problem Setter (framing) + Creative (options) + Verification (assumption checking) + Critical Challenger (stress-test) all carry weight. Collaborative (M4) captures iterative reasoning on criteria. Base: {.05, .03, .08, .15, .20, .15, .20, .14}. Path: 8 → 6 → 4 → 7 → 5.
| Mode | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | Path |
|---|---|---|---|---|---|---|---|---|---|
| Base | 5 | 3 | 8 | 15 | 20 | 15 | 20 | 14 | 8 → 6 → 4 → 7 → 5 |
| Beginner | 10 | 5 | 16 | 21 | 19 | 12 | 11 | 6 | 3 → 8 → 6 → 3 → 4 → 7 → 3 → 5 |
| Intermediate | 4 | 3 | 8 | 17 | 21 | 17 | 19 | 12 | 8 → 6 → 4 → 7 → 5 |
| Expert | 1 | 1 | 3 | 12 | 22 | 15 | 25 | 20 | 8 → 6 → 5 → 4 → 7 → 5 |
Notes. Grounded in the Paul-Elder framework for critical thinking (intellectual standards) and Kahneman's (2011) dual-system work on decision biases. The Beginner version includes more Tutor because novice decision-makers benefit from explicit frameworks before attempting free-form application.
4.12 task-fitness-check — Capstone Fitness Check (Book 1, Ch 12)
Task. End-of-course free-form assessment where students demonstrate mode fluency across a task of their choosing, producing an AI Practice Statement.
Cognitive operations.
- Select a task that exercises multiple modes (meta-level)
- Demonstrate mode-switching in a single conversation (Analyze, Evaluate, Create)
- Reflect on which modes felt natural and which required effort (Analyze, meta-level)
Base distribution and rationale. This is the capstone equivalent of the baseline. Unlike baseline (which measures untrained behavior), fitness check measures trained behavior — students should now show a balanced, Agency-rich profile. The base distribution is deliberately flat across Partnership and Agency modes to reward breadth over any single concentration. Low but non-zero Passivity because Oracle can still be the right mode for factual look-ups. Base: {.08, .10, .15, .15, .15, .12, .15, .10}. Path: (none — student chooses).
| Mode | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | Path |
|---|---|---|---|---|---|---|---|---|---|
| Base | 8 | 10 | 15 | 15 | 15 | 12 | 15 | 10 | (none) |
| Beginner | 13 | 14 | 26 | 17 | 12 | 8 | 7 | 3 | (none) |
| Intermediate | 6 | 10 | 14 | 17 | 16 | 14 | 14 | 9 | (none) |
| Expert | 2 | 6 | 7 | 14 | 19 | 14 | 22 | 16 | (none) |
Notes. Because this is the capstone, the tier-adjusted targets are not the grading bar — students are graded on breadth (≥ 5 modes represented) and Agency-tier proportion (≥ 40% for pass). The tier-adjusted targets are shown as developmental references.
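The pass rule is crisp enough to state directly. A minimal sketch, assuming an 8-element array of proportions and treating "modes represented" as non-zero proportions (an assumption — the grading spec may use a message-count threshold instead):

```typescript
// Capstone pass rule from the notes above: at least 5 modes represented
// and at least 40% of engagement in the Agency tier (modes 5-8).
function fitnessCheckPass(dist: number[]): boolean {
  const breadth = dist.filter((p) => p > 0).length; // assumed definition of "represented"
  const agency = dist[4] + dist[5] + dist[6] + dist[7];
  return breadth >= 5 && agency >= 0.4;
}
```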
4.13 task-communication — Communication & Email (Book 2, Ch 13)
Task. Draft, tone-tune, and triage professional communications (emails, messages, meeting notes) using AI while maintaining authentic voice.
Cognitive operations.
- Understand communication goal (Understand)
- Generate draft (Create)
- Tune tone for audience (Apply, Evaluate)
- Verify content accuracy (Evaluate)
Base distribution and rationale. Production Assistant (M2) is dominant because AI drafting is the productive core. Tutor (M3) appears when students ask for communication-style explanations. Critical Challenger is low — you stress-test decisions, not emails. Base: {.08, .35, .08, .15, .15, .10, .05, .04}. Path: 8 → 2 → 5.
| Mode | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | Path |
|---|---|---|---|---|---|---|---|---|---|
| Base | 8 | 35 | 8 | 15 | 15 | 10 | 5 | 4 | 8 → 2 → 5 |
| Beginner | 11 | 42 | 12 | 15 | 10 | 6 | 2 | 1 | 3 → 8 → 2 → 3 → 5 |
| Intermediate | 6 | 34 | 8 | 17 | 16 | 12 | 5 | 3 | 8 → 2 → 5 |
| Expert | 3 | 23 | 4 | 16 | 23 | 14 | 9 | 8 | 8 → 2 → 5 |
Notes. The Production Assistant weight reflects that communication tasks are legitimate AI-drafting domains provided the student has authentic input and voice-check discipline.
4.14 task-data — Data & Spreadsheets (Book 2, Ch 14)
Task. Clean, analyze, and communicate findings from a dataset using AI for code, formula, and visualization assistance.
Cognitive operations.
- Understand dataset and question (Understand)
- Plan analysis approach (Analyze)
- Generate code/formulas (Create — Production)
- Debug and verify (Evaluate)
- Interpret and communicate findings (Analyze, Create)
Base distribution and rationale. Balanced across Tutor (M3, for understanding statistical concepts), Production (M2, for code/formula generation), Collaborative (M4, for iterative analysis), and Verification (M5, for checking results). Base: {.08, .20, .20, .18, .20, .05, .05, .04}. Path: 8 → 3 → 4 → 2 → 5.
| Mode | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | Path |
|---|---|---|---|---|---|---|---|---|---|
| Base | 8 | 20 | 20 | 18 | 20 | 5 | 5 | 4 | 8 → 3 → 4 → 2 → 5 |
| Beginner | 11 | 23 | 29 | 17 | 14 | 3 | 2 | 1 | 3 → 8 → 3 → 4 → 2 → 3 → 5 |
| Intermediate | 6 | 19 | 19 | 21 | 21 | 6 | 5 | 3 | 8 → 3 → 4 → 2 → 5 |
| Expert | 3 | 13 | 11 | 20 | 30 | 7 | 9 | 8 | 8 → 3 → 5 → 4 → 2 → 5 |
Notes. Unlike writing, data work legitimately involves heavy Production because code-generation is a well-defined task. Verification is critical because AI-generated data code routinely has subtle bugs (wrong join logic, off-by-one indexing).
4.15 task-presentations — Presentations & Speaking (Book 2, Ch 15)
Task. Design a presentation with outline, speaker notes, slides, and Q&A prep. Use AI for structure, visual concepts, and rehearsal.
Cognitive operations.
- Frame presentation goal and audience (Analyze, meta-level)
- Generate slide content and visual ideas (Create)
- Draft speaker notes (Create — Production)
- Stress-test likely questions (Evaluate)
Base distribution and rationale. Production (M2) and Creative (M6) are both substantial — presentations benefit from AI-generated drafts AND from diverse visual/framing options. Critical Challenger appears for Q&A prep. Base: {.05, .25, .10, .15, .15, .15, .10, .05}. Path: 8 → 6 → 4 → 2 → 7 → 5.
| Mode | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | Path |
|---|---|---|---|---|---|---|---|---|---|
| Base | 5 | 25 | 10 | 15 | 15 | 15 | 10 | 5 | 8 → 6 → 4 → 2 → 7 → 5 |
| Beginner | 8 | 33 | 16 | 16 | 11 | 10 | 4 | 2 | 3 → 8 → 6 → 3 → 4 → 2 → 3 → 7 → 5 |
| Intermediate | 4 | 24 | 9 | 17 | 16 | 17 | 9 | 4 | 8 → 6 → 4 → 2 → 7 → 5 |
| Expert | 2 | 15 | 5 | 15 | 21 | 19 | 16 | 9 | 8 → 6 → 5 → 4 → 2 → 5 → 7 → 5 |
4.16 task-career — Career Acceleration (Book 2, Ch 16)
Task. Build or revise resume, cover letter, and interview prep document with AI.
Cognitive operations.
- Frame target role and differentiators (Analyze, meta-level)
- Iterate on self-narrative with AI (Analyze, Evaluate)
- Generate drafts (Create — Production)
- Stress-test answers to likely interview questions (Evaluate)
Base distribution and rationale. Production (M2) is substantial because resume-drafting is a legitimate Production use. Collaborative (M4) carries the self-narrative work. Critical Challenger for interview prep. Verification ensures job-search facts (company info, salary data) are correct. Base: {.10, .25, .10, .15, .15, .10, .08, .07}. Path: 8 → 4 → 2 → 7 → 5.
| Mode | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | Path |
|---|---|---|---|---|---|---|---|---|---|
| Base | 10 | 25 | 10 | 15 | 15 | 10 | 8 | 7 | 8 → 4 → 2 → 7 → 5 |
| Beginner | 15 | 31 | 16 | 16 | 11 | 6 | 3 | 2 | 3 → 8 → 4 → 3 → 2 → 7 → 3 → 5 |
| Intermediate | 8 | 24 | 10 | 17 | 16 | 12 | 8 | 6 | 8 → 4 → 2 → 7 → 5 |
| Expert | 3 | 15 | 5 | 15 | 22 | 13 | 13 | 13 | 8 → 4 → 5 → 2 → 7 → 5 |
4.17 task-finance — Personal Finance & Life Optimization (Book 2, Ch 17)
Task. Build a 90-day financial plan using AI for research, calculation, and decision-journal entries.
Cognitive operations.
- Frame financial goal (Analyze, meta-level)
- Understand relevant concepts (Understand — Tutor)
- Generate options (Create)
- Calculate scenarios (Apply)
- Verify claims and numbers (Evaluate)
Base distribution and rationale. Tutor dominates because personal-finance tasks usually require learning (what a Roth IRA is, how compound interest works) before acting. Verification is essential because AI financial advice has hallucination risk on interest rates, tax rules, etc. Base: {.12, .08, .20, .15, .20, .10, .10, .05}. Path: 8 → 3 → 4 → 5 → 7.
| Mode | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | Path |
|---|---|---|---|---|---|---|---|---|---|
| Base | 12 | 8 | 20 | 15 | 20 | 10 | 10 | 5 | 8 → 3 → 4 → 5 → 7 |
| Beginner | 17 | 10 | 31 | 16 | 14 | 6 | 4 | 2 | 3 → 8 → 3 → 4 → 5 → 3 → 7 |
| Intermediate | 9 | 8 | 19 | 17 | 21 | 12 | 10 | 4 | 8 → 3 → 4 → 5 → 7 |
| Expert | 4 | 5 | 10 | 15 | 28 | 13 | 16 | 9 | 8 → 3 → 5 → 4 → 5 → 7 |
Notes. Domain-specific warning: financial information from AI must be verified against authoritative sources (IRS, SEC, institution documentation). The verification weight is conservative.
4.18 task-collaboration — Team Collaboration (Book 2, Ch 18)
Task. Build a team charter and meeting workflow document using AI for structure and meeting-facilitation support.
Cognitive operations.
- Frame team context (Analyze)
- Generate charter options (Create)
- Iterate collaboratively on meeting structure (Analyze, Create)
- Draft templates (Create — Production)
Base distribution and rationale. Collaborative (M4) is dominant because team-related tasks inherently involve bidirectional iteration. Creative Expander supports option generation. Production for templates. Base: {.05, .15, .10, .25, .15, .15, .10, .05}. Path: 8 → 4 → 6 → 2 → 5.
| Mode | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | Path |
|---|---|---|---|---|---|---|---|---|---|
| Base | 5 | 15 | 10 | 25 | 15 | 15 | 10 | 5 | 8 → 4 → 6 → 2 → 5 |
| Beginner | 8 | 20 | 17 | 28 | 12 | 10 | 4 | 2 | 3 → 8 → 4 → 3 → 6 → 2 → 3 → 5 |
| Intermediate | 4 | 14 | 9 | 28 | 15 | 17 | 9 | 4 | 8 → 4 → 6 → 2 → 5 |
| Expert | 1 | 8 | 5 | 24 | 20 | 18 | 15 | 8 | 8 → 4 → 5 → 6 → 2 → 5 |
4.19 task-social-media — Social Media & Content (Book 2, Ch 19)
Task. Build brand voice guide and content calendar using AI for ideation and draft generation.
Cognitive operations.
- Frame audience and brand voice (Analyze)
- Generate post ideas (Create)
- Draft content (Create — Production)
- Verify factual claims in content (Evaluate)
Base distribution and rationale. Production (M2) and Creative (M6) both substantial. Creative reflects ideation load; Production reflects draft-generation load. Verification is essential because public-facing content is high-stakes if wrong. Base: {.05, .30, .08, .12, .15, .20, .05, .05}. Path: 8 → 6 → 2 → 5.
| Mode | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | Path |
|---|---|---|---|---|---|---|---|---|---|
| Base | 5 | 30 | 8 | 12 | 15 | 20 | 5 | 5 | 8 → 6 → 2 → 5 |
| Beginner | 8 | 39 | 13 | 13 | 11 | 13 | 2 | 2 | 3 → 8 → 6 → 3 → 2 → 5 |
| Intermediate | 4 | 28 | 8 | 14 | 16 | 23 | 5 | 4 | 8 → 6 → 2 → 5 |
| Expert | 2 | 18 | 4 | 12 | 21 | 26 | 8 | 9 | 8 → 6 → 5 → 2 → 5 |
4.20 task-coding — Coding & Technical (Book 2, Ch 20)
Task. Build or modify code artifacts using AI for design, writing, debugging, and reasoning-log documentation.
Cognitive operations.
- Design approach (Analyze, Create)
- Understand unfamiliar libraries/patterns (Understand — Tutor)
- Generate code (Create — Production)
- Debug with AI assistance (Analyze, Evaluate)
- Verify behavior (Evaluate)
Base distribution and rationale. Production (M2) is substantial because code generation is a legitimate use. Tutor is heavy because coding tasks often require learning APIs. Verification is essential because AI-generated code has known hallucination patterns (non-existent functions, wrong API signatures). Base: {.08, .25, .18, .15, .20, .05, .05, .04}. Path: 8 → 3 → 4 → 2 → 5.
| Mode | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | Path |
|---|---|---|---|---|---|---|---|---|---|
| Base | 8 | 25 | 18 | 15 | 20 | 5 | 5 | 4 | 8 → 3 → 4 → 2 → 5 |
| Beginner | 11 | 29 | 26 | 14 | 14 | 3 | 2 | 1 | 3 → 8 → 3 → 4 → 2 → 3 → 5 |
| Intermediate | 6 | 24 | 17 | 17 | 21 | 6 | 5 | 4 | 8 → 3 → 4 → 2 → 5 |
| Expert | 3 | 16 | 10 | 16 | 31 | 7 | 9 | 8 | 8 → 3 → 5 → 4 → 2 → 5 |
Notes. Verification weight is deliberately conservative; mature coding practice with AI requires continuous testing, not periodic checks. The canonical-skill classifier correctly routes code-debugging messages to Collaborative or Verification depending on artifact presence.
4.21 task-entrepreneurship — Entrepreneurship & Side Projects (Book 2, Ch 21)
Task. Build business model canvas, validation plan, and iterate on venture concept using AI.
Cognitive operations.
- Frame opportunity (Analyze, meta-level)
- Generate business-model options (Create)
- Iterate via customer-feedback reasoning (Analyze)
- Stress-test assumptions (Evaluate)
- Verify market/competitor claims (Evaluate)
Base distribution and rationale. Balanced across Creative (options), Collaborative (iteration), Critical Challenger (assumption testing), and Problem Setter (opportunity framing). Production is modest because the canvas itself is small-format. Base: {.05, .10, .12, .15, .13, .15, .15, .15}. Path: 8 → 6 → 4 → 7 → 5.
| Mode | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | Path |
|---|---|---|---|---|---|---|---|---|---|
| Base | 5 | 10 | 12 | 15 | 13 | 15 | 15 | 15 | 8 → 6 → 4 → 7 → 5 |
| Beginner | 9 | 15 | 22 | 19 | 11 | 11 | 7 | 6 | 3 → 8 → 6 → 3 → 4 → 7 → 3 → 5 |
| Intermediate | 4 | 10 | 12 | 17 | 14 | 17 | 14 | 13 | 8 → 6 → 4 → 7 → 5 |
| Expert | 1 | 5 | 5 | 13 | 16 | 17 | 20 | 23 | 8 → 6 → 5 → 4 → 7 → 5 |
4.22 task-health — Health Research & Decisions (Book 2, Ch 22)
Task. Research a health-related decision using AI for literature synthesis and personal-context reasoning. Produce decision document + verification log.
Cognitive operations.
- Frame health question (Analyze)
- Understand relevant concepts (Understand — Tutor)
- Verify claims against authoritative sources (Evaluate)
- Stress-test advice (Evaluate)
Base distribution and rationale. Verification (M5) is dominant and disproportionately weighted because health stakes are high and AI hallucinations carry real-world harm. Tutor supports understanding. Critical Challenger supports stress-testing. Base: {.08, .05, .20, .15, .30, .08, .08, .06}. Path: 8 → 3 → 5 → 7.
| Mode | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | Path |
|---|---|---|---|---|---|---|---|---|---|
| Base | 8 | 5 | 20 | 15 | 30 | 8 | 8 | 6 | 8 → 3 → 5 → 7 |
| Beginner | 12 | 6 | 32 | 16 | 23 | 5 | 3 | 2 | 3 → 8 → 3 → 5 → 7 |
| Intermediate | 6 | 5 | 19 | 17 | 31 | 9 | 8 | 5 | 8 → 3 → 5 → 7 |
| Expert | 2 | 3 | 9 | 14 | 39 | 10 | 12 | 10 | 8 → 3 → 5 → 7 |
Notes. This task carries an explicit academic/medical caveat: AI is not a substitute for licensed medical advice, and the rubric reinforces verification against authoritative health sources (peer-reviewed literature, CDC/NIH, etc.). Artifact criteria require evidence of source-verification.
4.23 task-ethics — Ethics & AI Thinking (Book 2, Ch 23)
Task. Analyze an ethics case using AI as a thinking partner, applying an ethics framework and stress-testing conclusions.
Cognitive operations.
- Frame the ethical question (Analyze, meta-level)
- Apply an ethical framework (Apply, Analyze)
- Collaborate to weigh competing considerations (Analyze, Evaluate)
- Stress-test position (Evaluate)
Base distribution and rationale. Collaborative (M4) and Critical Challenger (M7) are co-dominant because ethical reasoning benefits from both iteration and adversarial check. Problem Setter is heavy because framing an ethics case correctly is half the work. Base: {.03, .05, .15, .20, .15, .10, .20, .12}. Path: 8 → 4 → 7 → 5.
| Mode | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | Path |
|---|---|---|---|---|---|---|---|---|---|
| Base | 3 | 5 | 15 | 20 | 15 | 10 | 20 | 12 | 8 → 4 → 7 → 5 |
| Beginner | 5 | 7 | 28 | 25 | 13 | 7 | 10 | 4 | 3 → 8 → 4 → 3 → 7 → 5 |
| Intermediate | 2 | 5 | 14 | 23 | 16 | 11 | 19 | 10 | 8 → 4 → 7 → 5 |
| Expert | 1 | 2 | 6 | 17 | 18 | 11 | 27 | 18 | 8 → 4 → 5 → 7 → 5 |
4.24 task-ai-agents — AI Agents & Automation (Book 2, Ch 24)
Task. Design and specify an AI agent or automation workflow, with documentation and monitoring plan.
Cognitive operations.
- Frame automation goal (Analyze, meta-level)
- Understand agent capabilities (Understand — Tutor)
- Iterate design collaboratively (Analyze, Create)
- Draft specification (Create — Production)
- Verify agent behavior (Evaluate)
Base distribution and rationale. Production (for specifications), Tutor (for learning agent concepts), Collaborative (for design iteration), Verification (for validating agent behavior). Problem Setter to frame the right automation opportunity. Base: {.05, .20, .15, .20, .15, .10, .08, .07}. Path: 8 → 3 → 4 → 2 → 5.
| Mode | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | Path |
|---|---|---|---|---|---|---|---|---|---|
| Base | 5 | 20 | 15 | 20 | 15 | 10 | 8 | 7 | 8 → 3 → 4 → 2 → 5 |
| Beginner | 7 | 25 | 24 | 21 | 11 | 6 | 3 | 2 | 3 → 8 → 3 → 4 → 2 → 3 → 5 |
| Intermediate | 4 | 19 | 14 | 23 | 16 | 11 | 8 | 6 | 8 → 3 → 4 → 2 → 5 |
| Expert | 2 | 12 | 8 | 20 | 21 | 13 | 13 | 12 | 8 → 3 → 5 → 4 → 2 → 5 |
4.25 task-casual — Casual & Personal (catch-all)
Task. Open-ended personal use (recipes, travel planning, hobby help, daily-life questions).
Cognitive operations. Whatever the user brings — no task-prescribed operations.
Base distribution and rationale. Like baseline, this task's "target" is a reference pattern, not a goal. Casual use is Oracle-dominant because daily questions are mostly factual ("what should I cook with these ingredients"). Low Agency-tier because stakes are low and cognitive load is low. Base: {.40, .25, .10, .10, .05, .05, .03, .02}. Path: (none).
| Mode | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | Path |
|---|---|---|---|---|---|---|---|---|---|
| Base | 40 | 25 | 10 | 10 | 5 | 5 | 3 | 2 | (none) |
| Beginner | 47 | 25 | 13 | 8 | 3 | 2 | 1 | 0 | (none) |
| Intermediate | 34 | 26 | 10 | 13 | 6 | 6 | 3 | 2 | (none) |
| Expert | 19 | 24 | 8 | 16 | 11 | 10 | 8 | 6 | (none) |
Notes. Casual and work tasks are not graded against targets — the distribution shown is descriptive. Students may see how their casual-use pattern compares to others', but the comparison is informational.
4.26 task-work — Work & Professional (catch-all)
Task. Open-ended professional use outside the 24 structured task types (meeting prep, internal comms, project scoping, etc.).
Cognitive operations. Variable across professional domains.
Base distribution and rationale. Professional use is Production-dominant (drafting, summarizing) but with more Collaborative and Verification than casual because work has higher accuracy stakes. Base: {.20, .30, .10, .15, .10, .05, .05, .05}. Path: (none).
| Mode | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | Path |
|---|---|---|---|---|---|---|---|---|---|
| Base | 20 | 30 | 10 | 15 | 10 | 5 | 5 | 5 | (none) |
| Beginner | 26 | 33 | 14 | 14 | 6 | 3 | 2 | 1 | (none) |
| Intermediate | 16 | 30 | 10 | 18 | 11 | 6 | 5 | 4 | (none) |
| Expert | 7 | 22 | 6 | 18 | 17 | 8 | 10 | 11 | (none) |
Notes. Like casual, work is descriptive rather than prescriptive. Students may still submit work tasks for pattern analysis and track their distribution over time.
4.27 task-path-design — Path Design Practice (Module 3 capstone)
Task. After learning the eight modes in Module 3, students pick a real task they're facing — academic, professional, or personal — and design a mode path for it before opening the conversation. They specify which modes they plan to use, in what order, and why. They run the conversation following their designed path and submit transcript + a short reflection on whether the path survived contact with the actual task.
Cognitive operations. Frame the task → Select modes → Sequence modes into a coherent path → Execute the path → Reflect on path-vs-reality divergence. The dominant Bloom level is Evaluate on the front and back ends and Create in the middle.
Why this task is different. Every other task in the spec has a prescribed mode_path for at least one tier — the rubric scores execution against the prescribed sequence. Path-design is the only task where mode_path is intentionally empty across all three tiers because the student is the path designer. The rubric grades whether their self-designed path was coherent and whether their conversation followed it.
Base distribution and rationale. Module 3 introduces the eight modes; the path-design exercise is meant to demonstrate deliberate, varied mode use. Base distribution: {.10, .15, .10, .10, .15, .15, .15, .10} — flat-ish but with mid-Partnership/Agency tilt (M5/6/7 emphasized over M1/3/4/8) because path design rewards reflective, synthetic, and collaborative engagement. Path: (student-designed).
| Mode | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | Path |
|---|---|---|---|---|---|---|---|---|---|
| Base | 10 | 15 | 10 | 10 | 15 | 15 | 15 | 10 | (student-designed) |
| Beginner | 16 | 21 | 18 | 12 | 12 | 11 | 7 | 4 | (student-designed) |
| Intermediate | 8 | 15 | 10 | 12 | 16 | 17 | 15 | 9 | (student-designed) |
| Expert | 3 | 8 | 5 | 9 | 19 | 18 | 22 | 16 | (student-designed) |
Artifact. Required. The artifact is the design document plus the reflection — not the transcript. Sonnet-graded. Three criteria, combined as a weighted average (sketched after this list):
- Design coherence (40%) — does the planned path fit the chosen task? Are mode choices justified? Is the sequence intentional?
- Execution fidelity (30%) — did the student follow their designed path? Deviations are not penalized but must be noticed and accounted for in reflection.
- Reflection quality (30%) — how thoughtful is the post-hoc analysis of where the path held vs. broke?
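Taken literally, the three weights reduce to a single weighted average. The sketch below is a minimal illustration, not the grading spec's actual implementation: the function name is hypothetical, and the assumption that the Sonnet grader emits each criterion on a 0–1 scale is ours.

```typescript
// Hypothetical sketch: combining the three path-design artifact criteria.
// Assumes (not confirmed by this document) that the Sonnet grader scores
// each criterion on a 0-1 scale.
function pathDesignArtifactScore(
  designCoherence: number,   // weighted 40%
  executionFidelity: number, // weighted 30%
  reflectionQuality: number, // weighted 30%
): number {
  return 0.4 * designCoherence + 0.3 * executionFidelity + 0.3 * reflectionQuality;
}
```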
Notes. Path-design is the only task whose rubric grading weight is dominated by the artifact rather than the engagement distribution; the engagement distribution is descriptive (what the student naturally produced) rather than the primary scoring axis. Beginners tend toward more M1/2/3 (information-seeking, verifying, tutoring) — they apply the modes most legible to a novice. Experts tend toward more M5/7/8 (challenging, collaborating, problem-setting) — modes that require established mode fluency. Both can score well if their design matches their task; tier patterns describe natural production, not quality.
This task corresponds to the new Module 3 ("The AI Engagement Toolkit") inserted into the Learn-with-AI architecture per decisions.md 2026-04-25 §A. When the broader chapter renumber sweep happens (Module 3 chapter slot is now occupied by the toolkit; old Module 3 → new Module 4, etc.), this section may move to §4.3 with the rest of the §4.x sequence shifting; the rubric content here is stable regardless of section-number assignment.
4.28 Cross-task observations
Reading across all 78 task-tier rubrics (26 tasks × 3 tiers), a few systematic patterns emerge:
- Beginners concentrate on 3–4 modes per task. Across tasks, Beginner distributions rarely have more than 4 modes above 10%. This is consistent with Dreyfus & Dreyfus (1980) — novices operate on explicit scaffolds, not full repertoires.
- Experts are mode-diverse. Expert distributions show more modes above 10%, often 6–7, because experts adaptively deploy whichever mode fits the moment. This is the empirical signature of the mode-fluency construct.
- Verification (M5) grows with expertise on every single task. The multiplier (1.4 for Expert) combines with the fact that Verification's base proportion is already substantial on most tasks. This is the single most robust pattern in the rubric system.
- Production Assistant (M2) is expertise-agnostic for tasks that legitimately require it (data, coding, communication, career). For tasks where AI drafting undermines the learning objective (writing, research), Production declines sharply with expertise.
- Problem Setter (M8) grows fastest with expertise. The multiplier is 1.8 — the steepest in the framework. This reflects the theoretical claim that meta-level problem reframing is the defining expert cognitive move.
These patterns are predictions, not facts. The ML pipeline (§6) is designed to test whether observed distributions from high-performing students actually match these tier-adjusted targets. Persistent mismatches are the trigger for methodology revision.
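To make the multiplier arithmetic concrete: a tier row is produced by scaling the base distribution by per-mode expertise multipliers and renormalizing. In the sketch below, only the Expert multipliers for M5 (1.4) and M8 (1.8) are stated in this document; the other six values are back-solved from the §4.23 Expert row purely for illustration and should not be read as canonical, and the function name is hypothetical.

```typescript
// Sketch of the Step-4 expertise calibration. Only the M5 (1.4) and M8 (1.8)
// Expert multipliers are stated in this document; the other six values below
// are back-solved from the section 4.23 Expert row for illustration only.
const EXPERT_MULTIPLIERS = [0.4, 0.5, 0.5, 1.0, 1.4, 1.3, 1.6, 1.8];

/** Scale a base mode distribution by per-mode multipliers, then renormalize to 1. */
function calibrate(base: number[], multipliers: number[]): number[] {
  const scaled = base.map((p, i) => p * multipliers[i]);
  const total = scaled.reduce((sum, p) => sum + p, 0);
  return scaled.map((p) => p / total);
}

// Ethics task (section 4.23), base {.03, .05, .15, .20, .15, .10, .20, .12}:
const expert = calibrate([0.03, 0.05, 0.15, 0.2, 0.15, 0.1, 0.2, 0.12], EXPERT_MULTIPLIERS);
console.log(expert.map((p) => Math.round(p * 100)));
// -> [1, 2, 6, 17, 18, 11, 27, 18], the Expert row in the section 4.23 table.
```

Rounding each renormalized proportion to a whole percent is also why a few table rows above sum to 99 or 101 rather than exactly 100.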
5. Validation protocol
The rubrics produced by the methodology above are theoretical. Validation proceeds at three levels.
5.1 Construct validity of the modes themselves
The eight modes are a taxonomy of user-AI interaction. Their construct validity rests on factor structure: do survey items claiming to measure different modes cluster into the expected factors? Do observed behavioral classifications form the hypothesized tier structure?
Wave 1 survey data (N=331, collected 2026-01 through 2026-03 from student samples only) shows:
- Eight-factor structure for the AIT self-report items holds at exploratory factor analysis (details in `3-Research/AI engagement/`).
- Tier structure validates: Oracle correlates negatively with Mode 5–8 items (r = −0.16 to −0.35); Modes 5–8 cluster (inter-correlations r = 0.58–0.70).
This establishes that the modes and tiers are empirically distinguishable constructs. It does not yet establish that any particular target distribution is optimal — that requires conversation-level data.
5.2 Classifier reliability
The classifier is an LLM (Haiku 4.5) applying the canonical mode definitions to messages. Its reliability is measured by Cohen's kappa against human coders who have calibrated to the canonical guide.
Target: κ ≥ 0.80 on a held-out calibration set of 200 messages spanning all 8 modes. As of 2026-04-22, this calibration set does not yet exist; it is the critical deliverable blocking the MyEducator integration (see Grading Spec CHANGELOG §0.2.0).
Procedure: Two independent coders code the set after reading the canonical guide. Disagreements are adjudicated by a third coder. The classifier is then run on the same set; κ against the adjudicated labels is the reliability estimate. Classifications where the classifier's confidence is in [0.50, 0.70] are re-run through a Sonnet-class model; both labels are stored.
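For reference, the reliability estimate itself is plain Cohen's κ: observed agreement corrected for chance agreement. A minimal sketch, with a hypothetical function name and labels assumed to be mode IDs:

```typescript
/** Cohen's kappa between classifier labels and human-adjudicated labels. */
function cohensKappa(classifier: number[], adjudicated: number[]): number {
  const n = classifier.length; // assumes both arrays are paired, equal-length
  const labels = [...new Set([...classifier, ...adjudicated])];

  // Observed agreement: fraction of messages where the two labels match.
  const pObserved =
    classifier.filter((label, i) => label === adjudicated[i]).length / n;

  // Chance agreement: sum over labels of the product of marginal frequencies.
  const pChance = labels.reduce((sum, label) => {
    const p1 = classifier.filter((l) => l === label).length / n;
    const p2 = adjudicated.filter((l) => l === label).length / n;
    return sum + p1 * p2;
  }, 0);

  return (pObserved - pChance) / (1 - pChance);
}
```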
5.3 Rubric validity (the hard part)
A rubric is valid if students who perform well on the task — as judged by an independent grader — also score well on the rubric, and vice versa. This is a predictive-validity claim that can only be tested with real student data.
Validation will be staged:
- Stage A (Months 1–3): Collect conversations, classifications, rubric scores, and independent artifact grades. Compute the correlation between track-adjusted rubric scores and artifact grades (a minimal code sketch follows this list). Expected: r > 0.40 at the task level, indicating the rubric captures task-relevant variance. Lower correlations indicate the rubric is miscalibrated for that task.
- Stage B (Months 3–6): For each task where the Stage A correlation falls below 0.40, examine which rubric components underperform. Revise the methodology (Section 2 Step 3 weights, or Section 3 multipliers) for those tasks. Re-derive the rubric, bump the version, and re-run Stage A.
- Stage C (Months 6–12): For each task where the revised rubric still underperforms, treat the theoretical target as provisional and move the task to the empirical calibration track described in Section 6. This is the fallback for tasks where theory is insufficient and data must dominate.
- Stage D (Ongoing): Wave 2 of the AAEV instrument (planned 2026 Q3) collects self-report plus behavioral alignment from the same students. This enables a cross-method validity check: do students who report high Agency-tier self-efficacy actually exhibit it behaviorally? Disagreements are evidence of measurement problems in one or both channels.
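The Stage A statistic is an ordinary Pearson correlation over paired scores. A minimal sketch, assuming equal-length score arrays (names hypothetical):

```typescript
/** Pearson r between track-adjusted rubric scores and independent artifact grades. */
function pearsonR(x: number[], y: number[]): number {
  const n = x.length; // assumes x and y are paired and of equal length
  const meanX = x.reduce((s, v) => s + v, 0) / n;
  const meanY = y.reduce((s, v) => s + v, 0) / n;

  let cov = 0, varX = 0, varY = 0;
  for (let i = 0; i < n; i++) {
    const dx = x[i] - meanX;
    const dy = y[i] - meanY;
    cov += dx * dy;
    varX += dx * dx;
    varY += dy * dy;
  }
  return cov / Math.sqrt(varX * varY);
}

// Stage A passes a task when pearsonR(rubricScores, artifactGrades) > 0.40.
```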
5.4 Research paper tie-in
The research program reports these validations as part of the Methods and Results sections. Each paper carries the rubric version it used; a rubric revision triggers a re-analysis for any paper whose data predates the revision.
6. Continuous revision — the ML pipeline
Theory produces the first rubric. Data produces every rubric after that. This section specifies the ML pipeline that converts the aimodes operational data stream into periodic, rigorous rubric revisions.
6.1 Data sources
Every submission writes the following to the engagement_reports and submissions tables in Supabase:
- Raw transcript
- Per-message mode classifications with confidence scores
- Mode-distribution vector (8 values, summing to 1.0)
- Tier-distribution vector (3 values)
- 6-component objective score breakdown
- Track-adjusted score
- Task type, declared expertise level, user archetype
- Artifact grade (where applicable; currently null for most assignments)
- Instructor grade (from MyEducator gradebook where it integrates; currently unavailable)
- Timestamps, user ID, assignment ID
The pipeline augments this with derived quantities: transition matrices per conversation, convergence of the user's archetype over time, and mode-path adherence per assignment.
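As an illustration of the first derived quantity, a per-conversation transition matrix can be computed directly from the per-message classifications. A minimal sketch, assuming mode IDs 1–8 (function name hypothetical):

```typescript
/** Row-normalized 8x8 transition matrix from one conversation's mode sequence. */
function transitionMatrix(modeSequence: number[]): number[][] {
  // counts[i][j] = number of transitions from mode i+1 to mode j+1
  const counts = Array.from({ length: 8 }, () => new Array<number>(8).fill(0));
  for (let i = 1; i < modeSequence.length; i++) {
    counts[modeSequence[i - 1] - 1][modeSequence[i] - 1] += 1;
  }
  // Normalize each row so it reads as P(next mode | current mode).
  return counts.map((row) => {
    const total = row.reduce((s, c) => s + c, 0);
    return total === 0 ? row : row.map((c) => c / total);
  });
}
```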
6.2 Pipeline stages
- Extraction job (daily). Pulls newly-inserted `engagement_reports` rows from the last 24 hours into a read-only analytics schema. Excludes rows flagged `artifact_needs_review = true` or where the submission is still `status = pending`.
- Per-task aggregation (weekly). For each task type × self-reported expertise level, computes:
  - Empirical mean mode distribution across all submissions
  - Variance of each mode's proportion
  - Empirical modal mode path (most common first-appearance order across the top quartile of submissions by artifact grade, where artifact grades exist)
  - Sample size `n`
- Divergence detection (weekly). For each task × tier cell with `n ≥ 50`, computes the L1 distance between the current theoretical target and the empirical distribution of top-quartile students (by artifact grade where available, by track score as fallback). Cells where L1 > 0.30 are flagged for review.
- Candidate revision (monthly). For flagged cells, a candidate revised target is proposed as `0.6 × theoretical + 0.4 × empirical_top_quartile`. The 60/40 weight mix is conservative in v0.1; as sample sizes grow and the ML pipeline gains confidence, the weighting can shift toward empirical. Candidates are never auto-applied — they are written to a review queue. (Both the L1 test and the blend are sketched in code after this list.)
- Human review (monthly). Keith, Wood, and Posey review candidate revisions. Each candidate must pass three checks:
  - Theoretical plausibility. Does the revision make sense under the Section 2 methodology? If not, the methodology is flagged for revision rather than the rubric.
  - Sample-size adequacy. Is `n ≥ 50` per cell robust enough for the variance observed?
  - Stability across cohorts. Do cohorts from different institutions produce consistent divergences, or is the divergence specific to one cohort (possibly reflecting cohort composition rather than true rubric miscalibration)?
- Version bump and rollout. Approved revisions bump the rubric's minor version. New submissions score against the new rubric; historical submissions retain their original version tag (per the canonical skill file's versioning rule — no retroactive rescoring). The new version propagates to `aimodes-grading-spec/` and, via the diff tool, to MyEducator.
- Longitudinal re-validation (per semester). Every semester, the full set of rubrics is re-run against a freshly-collected test set of ≥ 500 conversations. Cohen's kappa against human-adjudicated labels is recomputed. If kappa drops below 0.80 on any task, the classifier and/or canonical mode definitions are investigated — this signals either drift in the classifier model (new Haiku version) or drift in student behavior beyond the rubric's parameters.
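The divergence test and the candidate blend are small enough to state as code. A minimal sketch of both, using the thresholds above (function names hypothetical):

```typescript
/** L1 distance between the theoretical target and the empirical distribution. */
function l1Distance(target: number[], empirical: number[]): number {
  return target.reduce((sum, t, i) => sum + Math.abs(t - empirical[i]), 0);
}

/** Conservative 60/40 theory/data blend; proposed only, never auto-applied. */
function candidateRevision(target: number[], empirical: number[]): number[] {
  const blended = target.map((t, i) => 0.6 * t + 0.4 * empirical[i]);
  const total = blended.reduce((s, p) => s + p, 0);
  return blended.map((p) => p / total); // guard against rounding drift
}

// A task x tier cell with n >= 50 is flagged when l1Distance(...) > 0.30;
// the candidateRevision(...) output goes to the monthly human-review queue.
```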
6.3 ML models in the pipeline
The pipeline uses four ML components:
- Top-quartile identification (supervised). When artifact grades are available, a gradient-boosted regressor predicts artifact grade from the 6-component rubric score. Top quartile is defined by predicted grade (robust to grader inconsistency) rather than raw artifact grade. When artifact grades are absent, top quartile falls back to raw track score.
- Expertise inference (supervised). The `predictor.ts` module already predicts the user's expertise level from their mode distribution. This prediction is compared to self-report to identify mis-calibrated users whose self-reported tier diverges from their behavioral pattern. Such users are excluded from per-tier aggregation until the prediction stabilizes.
- Archetype evolution (unsupervised). A Markov model fits transitions between the 6 archetypes (Delegator → Partner → Verifier → Creator → Challenger → Architect) across a user's sequence of submissions. Stable archetype estimates inform the per-user tier prior. Outside the rubric-revision pipeline, this model also feeds user-facing longitudinal trajectory visualization.
- Drift detection (unsupervised). Changes in task-level distributions over time are monitored with a CUSUM control chart (a minimal sketch follows this list). Sudden shifts signal either instructor-driven changes in how the task is framed in the book, classifier drift (new model version), or cohort composition change. CUSUM alerts go to the monthly review queue.
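A minimal two-sided CUSUM sketch for one monitored proportion stream; the slack and decision-threshold values are illustrative tuning parameters, not specified anywhere in this document:

```typescript
/** Two-sided CUSUM over a weekly stream of one mode's observed proportion. */
function cusumAlerts(
  stream: number[],
  mu: number,  // reference proportion (e.g., the current rubric target)
  k = 0.01,    // slack: ignore shifts smaller than this per observation
  h = 0.05,    // decision threshold: alert once a cumulative sum exceeds it
): number[] {
  const alerts: number[] = [];
  let sHigh = 0;
  let sLow = 0;
  stream.forEach((x, week) => {
    sHigh = Math.max(0, sHigh + (x - mu - k)); // accumulates upward drift
    sLow = Math.max(0, sLow + (mu - x - k));   // accumulates downward drift
    if (sHigh > h || sLow > h) {
      alerts.push(week); // flag this week for the monthly review queue
      sHigh = 0;
      sLow = 0; // reset after signaling
    }
  });
  return alerts;
}
```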
6.4 Research integrity
The ML pipeline is not an auto-tuning system that optimizes rubrics for student grade correlation. It is a candidate-proposal system that surfaces divergences; every revision is human-approved with explicit theoretical justification. This matters because auto-tuned rubrics drift toward post-hoc fitting — rewarding whatever students actually do rather than what theory says they should do. The pipeline's role is to surface evidence that theory is miscalibrated, not to replace theory with curve-fitting.
This distinction is critical for the research program. Papers that cite aimodes rubrics must be able to say which version was in effect during data collection and what theoretical justification underwrote that version. Auto-tuned rubrics break that traceability.
7. References
Anderson, L. W., & Krathwohl, D. R. (Eds.). (2001). A taxonomy for learning, teaching, and assessing: A revision of Bloom's taxonomy of educational objectives. Longman.
Chase, W. G., & Simon, H. A. (1973). Perception in chess. Cognitive Psychology, 4(1), 55–81. https://doi.org/10.1016/0010-0285(73)90004-2
Dreyfus, S. E., & Dreyfus, H. L. (1980). A five-stage model of the mental activities involved in directed skill acquisition (Tech. Rep. ORC-80-2). University of California, Berkeley Operations Research Center.
Edwards, J. R. (1991). Person-job fit: A conceptual integration, literature review, and methodological critique. In C. L. Cooper & I. T. Robertson (Eds.), International review of industrial and organizational psychology (Vol. 6, pp. 283–357). Wiley.
Ericsson, K. A. (2006). The influence of experience and deliberate practice on the development of superior expert performance. In K. A. Ericsson, N. Charness, P. J. Feltovich, & R. R. Hoffman (Eds.), The Cambridge handbook of expertise and expert performance (pp. 683–703). Cambridge University Press.
Ericsson, K. A., Krampe, R. T., & Tesch-Römer, C. (1993). The role of deliberate practice in the acquisition of expert performance. Psychological Review, 100(3), 363–406. https://doi.org/10.1037/0033-295X.100.3.363
Evans, J. S. B. T., & Stanovich, K. E. (2013). Dual-process theories of higher cognition: Advancing the debate. Perspectives on Psychological Science, 8(3), 223–241. https://doi.org/10.1177/1745691612460685
Hammond, K. R. (1988). Judgment and decision making in dynamic tasks. Information and Decision Technologies, 14(1), 3–14.
Hammond, K. R. (1996). Human judgment and social policy: Irreducible uncertainty, inevitable error, unavoidable injustice. Oxford University Press.
Kahneman, D. (2011). Thinking, fast and slow. Farrar, Straus and Giroux.
Kalyuga, S. (2007). Expertise reversal effect and its implications for learner-tailored instruction. Educational Psychology Review, 19(4), 509–539. https://doi.org/10.1007/s10648-007-9054-3
Kalyuga, S., Ayres, P., Chandler, P., & Sweller, J. (2003). The expertise reversal effect. Educational Psychologist, 38(1), 23–31. https://doi.org/10.1207/S15326985EP3801_4
Keith, M., Wood, D., & Posey, C. (working paper). Two cognitive channels in human–AI engagement: calibration governs behavior, metacognition governs perception.
Parasuraman, R., & Manzey, D. H. (2010). Complacency and bias in human use of automation: An attentional integration. Human Factors, 52(3), 381–410. https://doi.org/10.1177/0018720810376055
Risko, E. F., & Gilbert, S. J. (2016). Cognitive offloading. Trends in Cognitive Sciences, 20(9), 676–688. https://doi.org/10.1016/j.tics.2016.07.002
Schoenfeld, A. H. (1985). Mathematical problem solving. Academic Press.
Sperber, D., Clément, F., Heintz, C., Mascaro, O., Mercier, H., Origgi, G., & Wilson, D. (2010). Epistemic vigilance. Mind & Language, 25(4), 359–393. https://doi.org/10.1111/j.1468-0017.2010.01394.x
Sweller, J. (1988). Cognitive load during problem solving: Effects on learning. Cognitive Science, 12(2), 257–285. https://doi.org/10.1207/s15516709cog1202_4
Sweller, J. (1994). Cognitive load theory, learning difficulty, and instructional design. Learning and Instruction, 4(4), 295–312. https://doi.org/10.1016/0959-4752(94)90003-5
Sweller, J., van Merriënboer, J. J. G., & Paas, F. G. W. C. (1998). Cognitive architecture and instructional design. Educational Psychology Review, 10(3), 251–296. https://doi.org/10.1023/A:1022193728205
Vygotsky, L. S. (1978). Mind in society: The development of higher psychological processes (M. Cole, V. John-Steiner, S. Scribner, & E. Souberman, Eds.). Harvard University Press.
Zimmerman, B. J. (2002). Becoming a self-regulated learner: An overview. Theory into Practice, 41(2), 64–70. https://doi.org/10.1207/s15430421tip4102_2
Version history
0.1 (2026-04-22) — initial draft
- Sections 1–3, 5, 6, 7 drafted as methodology anchor for the grading spec package.
- Section 4 (per-task derivations for 26 task types × 3 tiers = 78 rubrics) deferred pending author review of the methodology.
- ML pipeline described at stage level; implementation-level specs (job schedules, data schemas, alert thresholds) deferred to a separate engineering document once the pipeline is prioritized for build.
- Publishes at `/research/rubric-theory` on aimodes.ai; source of truth for the grading spec package at `/6-Apps/aimodes-grading-spec/`.