Real-Time Sports Action Recognition Using a CNN–Transformer Hybrid Deep Learning Framework

The rapid expansion of sports broadcasting and digital media platforms has increased the demand for intelligent systems that can automatically identify important sports events for real-time analytics and highlight generation. Manual annotation of sports videos is time-consuming and prone to human error. This paper presents a real-time sports action recognition framework based on a hybrid CNN–Transformer architecture for detecting critical events in football and cricket videos. The proposed system processes live or recorded video streams through frame extraction, normalization, and spatial feature learning with the MobileNetV2 network. Temporal relationships between consecutive frames are modeled by a Transformer encoder to capture the temporal context of each action. The framework classifies events such as "pass" and "goal" in football, and "four", "six", and "wicket" in cricket. Motion-based filtering and confidence thresholding suppress non-action frames and improve prediction reliability. Detected events are recorded with timestamps and displayed through broadcast-style overlays to support automated highlight generation. Experimental evaluation demonstrates high recognition accuracy and efficient real-time performance on low-cost hardware. The framework offers an effective solution for sports analytics, media automation, and intelligent decision support.
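The motion-based filtering and confidence-thresholding stage described above can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: the function names, the frame representation (flat lists of grayscale intensities), and the threshold values are all assumptions introduced here for clarity.

```python
def mean_abs_diff(prev_frame, curr_frame):
    """Mean absolute pixel difference between two grayscale frames,
    each represented as a flat list of intensities in [0, 255]."""
    return sum(abs(a - b) for a, b in zip(prev_frame, curr_frame)) / len(curr_frame)

def filter_events(frames, predict, motion_thresh=5.0, conf_thresh=0.8):
    """Skip low-motion frames, then keep only confident predictions.

    `predict(frame)` is a hypothetical classifier interface assumed to
    return a (label, confidence) pair; returns a list of
    (frame_index, label, confidence) event records with timestamps
    recoverable from the frame index and the stream's frame rate.
    """
    events = []
    for i in range(1, len(frames)):
        # Motion-based filtering: a near-static scene is treated
        # as a non-action frame and skipped before classification.
        if mean_abs_diff(frames[i - 1], frames[i]) < motion_thresh:
            continue
        label, conf = predict(frames[i])
        # Confidence thresholding: discard uncertain predictions.
        if conf >= conf_thresh:
            events.append((i, label, conf))
    return events
```

In a full pipeline, `predict` would wrap the MobileNetV2 feature extractor and Transformer-encoder classifier; filtering frames before invoking it is what keeps per-frame cost low enough for real-time use on modest hardware.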
