Safety, Hallucination, and Failure Modes in Agentic Health AI: A State-of-the-Art Review

The deployment of agentic artificial intelligence systems in clinical environments is accelerating rapidly, with autonomous agents increasingly applied across radiology, clinical decision support, intensive care monitoring, drug discovery, and patient-facing care. Unlike conventional single-turn AI tools, agentic systems autonomously plan multistep tasks, invoke external tools, retain memory across interactions, and pursue clinical goals with minimal human intervention. This introduces a qualitatively distinct and poorly characterised safety profile that the existing literature has not comprehensively addressed. This paper addresses that gap through a systematic literature review conducted in accordance with PRISMA 2020 guidelines, synthesising evidence from 113 peer-reviewed publications, published between January 2019 and December 2025, drawn from PubMed, IEEE Xplore, Scopus, ACM Digital Library, arXiv, and Web of Science. The review makes four original contributions: it develops the first structured failure mode taxonomy specific to agentic health AI, classifying seven distinct categories (reasoning failures, hallucination failures, tool misuse failures, memory failures, automation bias failures, adversarial and distributional failures, and equity and bias failures); it maps a clinical hallucination typology across factual, contextual, citation, and numerical types, with associated risk profiles; it systematically evaluates existing safety frameworks and mitigation strategies (including Retrieval-Augmented Generation, Human-in-the-Loop design, Constitutional AI, and red teaming) against the identified failure mode taxonomy; and it proposes an integrated safety evaluation framework combining Failure Mode and Effects Analysis, the Swiss Cheese Model, and Human Factors theory as a practical governance tool for clinical deployment.
The findings confirm three points. First, agentic health AI presents compounding safety risks driven by autonomy, multistep reasoning, tool access, and confidence presentation. Second, current mitigation strategies remain predominantly reactive and incomplete. Third, critical gaps persist in standardised benchmarking, longitudinal deployment evidence, and equity-focused evaluation. Together, these findings underscore the urgent need for aligned engineering, clinical governance, and regulatory frameworks.
