A Wavelet-Enhanced Data-Augmented Network for Robust Test Tube Detection in Clinical Workflows

Reliable test-tube detection on clinical conveyor lines remains difficult when tubes are densely packed, irregularly placed, weakly illuminated, partially blurred by robot vibration, or contaminated by glare from glass or PET surfaces. These conditions erode the short-axis boundary cues and faint graduation marks on which the detection of slender tubes depends. We therefore build WDA-TNET on YOLOv11 and target these failure modes at four points of the pipeline. First, WGSR selectively restores blurred regions based on wavelet energy, which avoids over-sharpening specular areas. Second, GSCIM suppresses glare-dominated channel responses in the backbone through direction-aware pooling and cross-channel interaction, while retaining weak structural cues such as liquid-level edges. Third, DCPAF separates height and width encoding in the neck and dynamically balances long-axis context against short-axis localization for elongated targets. Finally, ATSS label assignment and the MPDIoU box-regression loss stabilize supervision when positives are sparse and boxes overlap only weakly. We evaluate the model on the newly constructed Complex Test Tube (CTT) dataset, which contains 11,955 images and 81,044 instances. WDA-TNET achieves 94.1% precision and 79.1% mAP50:95, improving mAP50:95 by 3.6 percentage points over YOLOv11. On the transparent-container HeinSight4 dataset, the model attains 95.2% mAP50:95, indicating that it generalizes well across domains.
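To make the last component more concrete, the following is a minimal sketch of the MPDIoU similarity as it is commonly defined in the literature: IoU minus the squared distances between the two boxes' top-left corners and bottom-right corners, each normalized by the squared diagonal of the input image. The abstract does not state which variant WDA-TNET uses, so the function names, the `(x1, y1, x2, y2)` box layout, and the `1 - MPDIoU` loss form are assumptions made for illustration only.

```python
def mpdiou(box_a, box_b, img_w, img_h):
    """MPDIoU between two axis-aligned boxes given as (x1, y1, x2, y2).

    Assumed convention: x1 < x2, y1 < y2, pixel coordinates, and
    img_w / img_h are the width and height of the input image.
    """
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    # Plain IoU: intersection area over union area.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    iou = inter / union if union > 0 else 0.0

    # Squared distances between matching corners.
    d1_sq = (ax1 - bx1) ** 2 + (ay1 - by1) ** 2  # top-left corners
    d2_sq = (ax2 - bx2) ** 2 + (ay2 - by2) ** 2  # bottom-right corners

    # Normalize by the squared image diagonal so the penalty is scale-free.
    norm = img_w ** 2 + img_h ** 2
    return iou - d1_sq / norm - d2_sq / norm


def mpdiou_loss(box_a, box_b, img_w, img_h):
    """Regression loss in the assumed form 1 - MPDIoU."""
    return 1.0 - mpdiou(box_a, box_b, img_w, img_h)


if __name__ == "__main__":
    target = (10, 10, 20, 50)       # a slender, tube-like box
    near_miss = (22, 10, 32, 50)    # disjoint, 2 px to the right
    far_miss = (100, 10, 110, 50)   # disjoint, far to the right
    print(mpdiou(target, near_miss, 640, 640))
    print(mpdiou(target, far_miss, 640, 640))
```

Both misses have an IoU of zero, so a plain IoU loss cannot tell them apart. The corner-distance terms still rank the near miss above the far one, which is the property that matters when narrow boxes overlap only weakly.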
