Multimodal Industrial Scene Characterization for Pouring Process Monitoring Using a Mixture of Experts

Industrial pouring processes operate under highly dynamic conditions where small deviations can lead to defects, scrap, and production losses. Although modern foundries are equipped with multiple sensors and visual inspection systems, most monitoring approaches remain fragmented, unimodal, and difficult to interpret. Furthermore, annotated anomalous samples are scarce in industrial settings, which hinders traditional methods that depend on labeled examples. As a result, many critical pouring anomalies are detected too late or without sufficient contextual information for effective decision-making.
In this work, we propose a multimodal framework for industrial scene characterization that unifies visual information and process signals through a Mixture of Experts (MoE) strategy. First, we deploy an ensemble of specialized modules that collaborate to identify regions of interest, assess pouring quality, and contextualize events within the production process, generating an interpretable description of each pouring event. Second, we introduce a novel anomaly detection method for multimodal video data that combines a self-supervised transformer with an outlier-aware clustering algorithm. This approach identifies rare anomalies without requiring extensive manual labeling.
The resulting information is structured into a digital-twin-ready representation, enabling seamless synchronization between the physical system and its virtual counterpart. This solution provides a scalable, deployable pathway to transform heterogeneous industrial data into actionable knowledge, supporting advanced monitoring, anomaly detection, and quality control in real foundry environments.
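
As a rough illustration of the second contribution (clustering learned embeddings so that rare events fall out as outliers), the sketch below applies a minimal DBSCAN to synthetic 2-D points. Everything in it is an assumption made for illustration: the abstract does not name the clustering algorithm, so DBSCAN is only a stand-in for an "outlier-aware" method; the points stand in for transformer embeddings, which would be high-dimensional in practice; and the names `dbscan`, `ring`, `eps`, and `min_samples` are hypothetical.

```python
import math

NOISE = -1  # label used for points that belong to no cluster


def dbscan(points, eps, min_samples):
    """Minimal DBSCAN. Returns one label per point; NOISE marks outliers."""
    n = len(points)
    # eps-neighborhood of every point (a point counts as its own neighbor).
    neighbors = [
        [j for j in range(n) if math.dist(points[i], points[j]) <= eps]
        for i in range(n)
    ]
    labels = [None] * n
    cluster = 0
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neighbors[i]) < min_samples:
            labels[i] = NOISE  # may later be claimed as a border point
            continue
        labels[i] = cluster
        queue = list(neighbors[i])
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            if len(neighbors[j]) >= min_samples:
                queue.extend(neighbors[j])  # j is a core point: keep growing
        cluster += 1
    return labels


def ring(cx, cy, radius=0.5, count=12):
    """Deterministic stand-in for a group of similar 'normal pour' embeddings."""
    return [
        (cx + radius * math.cos(2 * math.pi * k / count),
         cy + radius * math.sin(2 * math.pi * k / count))
        for k in range(count)
    ]


# Two dense groups of normal events plus two isolated, hypothetical anomalies.
embeddings = ring(0.0, 0.0) + ring(5.0, 5.0) + [(10.0, -10.0), (-8.0, 9.0)]

labels = dbscan(embeddings, eps=0.6, min_samples=4)
anomalies = [i for i, label in enumerate(labels) if label == NOISE]
print("anomalous indices:", anomalies)
```

In this toy setup the two isolated points are the only ones labeled as noise, and no anomaly labels are needed to find them. In a real deployment, `eps` and `min_samples` would have to be tuned to the embedding space.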
