Agentic and LLM-Based Multimodal Anomaly Detection: Architectures, Challenges, and Prospects
Anomaly detection is crucial for maintaining the safety, reliability, and optimal performance of complex systems across diverse domains such as industrial manufacturing, cybersecurity, and autonomous systems. Conventional methods typically operate on a single data modality, limiting their effectiveness in multimodal and dynamic real-world environments. The integration of multimodal data sources, including visual, audio, and sensor data, has emerged as a key advancement, improving detection robustness and accuracy. Simultaneously, the rise of agentic artificial intelligence (AI), characterized by autonomous, goal-oriented agents capable of reasoning and tool use, presents significant opportunities for enhancing anomaly detection systems. This paper provides a comprehensive review of recent advancements at the intersection of agentic AI and multimodal anomaly detection. We propose a novel taxonomy that categorizes existing methods by agent architecture, reasoning capabilities, tool integration, and modality scope. We survey foundation model-based detectors, cross-modal fusion techniques, and large language model (LLM)-driven agents that facilitate dynamic and interpretable anomaly reasoning. Furthermore, we present recent benchmark datasets, critical challenges, mitigation strategies, and future research directions.