Evaluating the Effectiveness of Explainable AI for Adversarial Attack Detection in Traffic Sign Recognition Systems
Connected Autonomous Vehicles (CAVs) rely on deep neural network–based perception systems to operate safely in complex driving environments. However, these systems remain vulnerable to adversarial perturbations that can induce misclassification without changes perceptible to human observers. Explainable Artificial Intelligence (XAI) has been proposed as a means to improve transparency and potentially support adversarial detection by exposing inconsistencies in model attention. This study evaluates the effectiveness and limitations of an explanation-based adversarial detection approach using NoiseCAM on the German Traffic Sign Recognition Benchmark (GTSRB). Using a Gaussian noise baseline, NoiseCAM was assessed as a binary adversarial detector across multiple perturbation strengths. Results indicate limited detection performance, with adversarial inputs identified in approximately 53% of cases, reflecting substantial overlap between adversarial and non-adversarial explanation-space responses. Detection effectiveness was further constrained by low image resolution, illumination variability, and the limited signal-to-noise separation inherent to traffic sign imagery. These findings demonstrate that, while XAI methods such as NoiseCAM provide valuable insight into model behavior, explanation-space inconsistencies alone are insufficient as reliable adversarial detection signals in low-resolution, safety-critical perception pipelines. The study highlights the need for standardized evaluation frameworks and hybrid detection strategies that integrate explainability with complementary robustness and uncertainty measures. It contributes empirical evidence clarifying the practical limits of XAI-based adversarial detection in CAV perception systems and informs the responsible deployment of explainable models in safety-critical applications.
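To make the evaluated detection scheme concrete, the sketch below illustrates one way an explanation-inconsistency detector with a Gaussian noise baseline can be assembled: compute a Grad-CAM-style attribution map for an input, recompute it after adding benign Gaussian noise, and flag the input as adversarial when the explanation shift exceeds a threshold. The model, noise level `sigma`, and decision threshold are assumptions introduced here for illustration, and this is an approximation of the general idea rather than the paper's exact NoiseCAM implementation.

```python
"""Illustrative sketch (not the paper's exact NoiseCAM code): flag an input as
adversarial when its class-activation map shifts sharply after adding benign
Gaussian noise. Model choice, target layer, sigma, and threshold are assumptions."""
import torch
import torch.nn.functional as F
from torchvision import models


def grad_cam(model, x, target_layer):
    """Plain Grad-CAM heatmap for the model's top-1 class (single image x)."""
    feats, grads = {}, {}
    fwd = target_layer.register_forward_hook(lambda m, i, o: feats.update(a=o))
    bwd = target_layer.register_full_backward_hook(lambda m, gi, go: grads.update(a=go[0]))
    logits = model(x)
    score = logits[0, logits[0].argmax()]          # top-1 class score
    model.zero_grad()
    score.backward()
    fwd.remove(); bwd.remove()
    w = grads["a"].mean(dim=(2, 3), keepdim=True)            # channel weights
    cam = F.relu((w * feats["a"]).sum(dim=1, keepdim=True))  # weighted activation sum
    cam = F.interpolate(cam, size=x.shape[-2:], mode="bilinear", align_corners=False)
    return cam / (cam.max() + 1e-8)                          # normalize to [0, 1]


def explanation_shift(model, x, target_layer, sigma=0.05):
    """L1 distance between the CAM of x and the CAM of x plus Gaussian noise."""
    cam_clean = grad_cam(model, x, target_layer)
    noisy = (x + sigma * torch.randn_like(x)).clamp(0.0, 1.0)
    cam_noisy = grad_cam(model, noisy, target_layer)
    return (cam_clean - cam_noisy).abs().mean().item()


# Usage with a generic 43-class GTSRB classifier (untrained here, for illustration);
# the threshold would be calibrated on held-out benign images.
model = models.resnet18(weights=None, num_classes=43).eval()
x = torch.rand(1, 3, 32, 32)                                  # low-resolution sign image
shift = explanation_shift(model, x, model.layer4, sigma=0.05)
is_adversarial = shift > 0.15                                 # assumed, calibrated threshold
```

In practice, the threshold calibration step is exactly where the reported ~53% detection rate becomes visible: because benign and adversarial inputs produce heavily overlapping shift distributions on low-resolution traffic sign imagery, no single threshold separates the two classes reliably.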