AI, Metacognition, and the Verification Bottleneck: A Three-Wave Longitudinal Study of Human Problem-Solving
arXiv:2601.17055v1 Announce Type: new
Abstract: This longitudinal pilot study tracked how generative AI reshaped problem-solving over six months, across three waves, in an academic setting. AI integration reached saturation by Wave 3: daily use rose from 52.4% to 95.7% and ChatGPT adoption from 85.7% to 100%. A hybrid workflow became the dominant approach, increasing 2.7-fold and reaching 39.1% of participants. A verification paradox emerged: participants relied most heavily on AI for difficult tasks (73.9%) yet reported declining verification confidence (68.1%) precisely where performance was worst (47.8% accuracy on complex tasks). Objective performance declined systematically with problem difficulty, from 95.2% to 81.0% to 66.7% to 47.8%, and belief-performance gaps widened to 34.6 percentage points. These results indicate a fundamental shift in which verification, not solution generation, has become the bottleneck in human-AI problem-solving. The ACTIVE Framework synthesizes the findings, grounded in cognitive load theory, into six components: Awareness and task-AI alignment, Critical verification protocols, Transparent human-in-the-loop integration, Iterative skill development to counter cognitive offloading, Verification confidence calibration, and Ethical evaluation. The authors provide implementation pathways for institutions and practitioners. Key limitations include sample homogeneity (a single academic cohort recruited by convenience sampling), which limits generalizability to corporate, clinical, or regulated professional contexts; self-report bias in confidence measures (a 32.2 percentage point divergence from objective performance); the lack of control conditions; restriction to mathematical and analytical problems; and an insufficient timeframe to assess long-term skill trajectories. Results generalize primarily to early-adopter, academically affiliated populations; causal validation requires randomized controlled trials.