From Human Oversight to Human-in-the-Loop: Evolving Governance of Human–AI Interaction in Healthcare and AI Development
Background: Artificial intelligence (AI) has moved from research labs into the everyday work of clinicians and software engineers. In healthcare, AI systems now shape triage, imaging interpretation, risk prediction and documentation workflows, while in computer science and AI development, models are increasingly used to generate code, tests and even other AI systems [1–4,7,10,11,76–82]. Early governance frameworks framed “human oversight” as a high-level ethical injunction but provided limited operational guidance on how humans should interact with data-intensive, adaptive AI systems in these settings [5,36–38,76–78].

Objective: To examine how human oversight and human-in-the-loop (HITL) paradigms have developed in (i) healthcare and (ii) computer science/AI development, and to identify converging design, organizational and governance patterns that support meaningful human control in big-data AI environments.

Methods: Narrative synthesis of international and national governance instruments (EU AI Act, WHO, OECD, ICMR, FDA/GMLP), of sector-specific guidance for healthcare AI (FUTURE-AI, CHAI, Joint Commission), and of emerging frameworks for AI lifecycle governance and MLOps in software engineering [7,10,11,14,15,18,19,79–84,86–89]. We extracted definitions and framings of human oversight, technical and organizational requirements for HITL interaction, and implementation challenges related to cognitive load, big-data pipelines and adaptive models in healthcare and AI development.

Results: Across healthcare and AI development, contemporary frameworks converge on multi-layered human-in-the-loop governance embedded throughout design, deployment, monitoring and decommissioning. Mandatory and consensus instruments emphasize override capability, transparency, user training, escalation pathways and post-deployment monitoring [7,10,11,14,15,18,19,79,80,86–89]; a minimal code sketch of this override-and-escalation pattern follows the abstract. In both domains, oversight is shifting from ad hoc individual review to structured arrangements involving multidisciplinary committees, model-governance boards, MLOps processes and incident-learning systems. Persistent gaps include limited formal treatment of cognitive load and alert fatigue, difficulties in overseeing continuously learning and foundation-model-based systems, and immature metrics for the effectiveness of human oversight itself [61–63,88,114,123,133–137].

Conclusions: Healthcare and computer science/AI development are emerging as mutually informative testbeds for human-in-the-loop AI governance in big-data settings. Meaningful oversight requires more than nominal human review: it depends on human-centered interface design, realistic workload management, lifecycle-oriented technical controls and organizational cultures that make it safe to question AI outputs. Lessons from clinical safety science and MLOps can be combined to design human–AI interaction that amplifies, rather than erodes, professional judgment in both domains.
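To make the converged pattern in the Results concrete, the following Python sketch illustrates one way the core mechanisms could fit together: a confidence-based escalation pathway, a human override capability, and an audit trail for post-deployment monitoring. It is our illustration, not an implementation drawn from any of the cited frameworks; the class names, the ESCALATION_THRESHOLD policy parameter and the reviewer interface are hypothetical assumptions.

```python
"""Minimal human-in-the-loop gate: escalation, override, audit logging.

Illustrative sketch only; all names and thresholds are hypothetical.
"""
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class Decision:
    """One audited human-AI decision, retained for post-deployment monitoring."""
    case_id: str
    model_label: str
    model_confidence: float
    escalated: bool
    final_label: str
    decided_by: str  # "model" or a human reviewer identifier
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )


class HitlGate:
    """Routes model outputs either straight through or to a human reviewer.

    ESCALATION_THRESHOLD is a hypothetical governance parameter: outputs
    below it must be reviewed by a human before any action is taken.
    """

    ESCALATION_THRESHOLD = 0.85

    def __init__(self) -> None:
        self.audit_log: list[Decision] = []

    def decide(self, case_id: str, model_label: str, confidence: float,
               reviewer=None) -> Decision:
        escalate = confidence < self.ESCALATION_THRESHOLD
        if escalate:
            if reviewer is None:
                # The escalation pathway must not be silently bypassed.
                raise RuntimeError(
                    f"case {case_id} requires human review but no reviewer "
                    "is available"
                )
            # The human may accept the model's label or override it.
            final_label, decided_by = reviewer(case_id, model_label, confidence)
        else:
            final_label, decided_by = model_label, "model"
        decision = Decision(case_id, model_label, confidence,
                            escalate, final_label, decided_by)
        self.audit_log.append(decision)  # feeds monitoring and incident review
        return decision


def example_reviewer(case_id, model_label, confidence):
    """Stand-in for a clinician-facing UI; here the human overrides the model."""
    return "urgent", "clinician_42"


if __name__ == "__main__":
    gate = HitlGate()
    # High-confidence output passes through automatically, but is still logged.
    print(gate.decide("A1", "routine", 0.97))
    # Low-confidence output is escalated and overridden by the human reviewer.
    print(gate.decide("A2", "routine", 0.60, reviewer=example_reviewer))
```

In a real deployment the audit log would feed the post-deployment monitoring and incident-learning systems described in the Results; for instance, the rate at which humans override escalated cases is one candidate, if still immature, metric for the effectiveness of human oversight itself.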