A Collaborative Multi-Compression Acceleration Mechanism for Neural Networks in Keyword Spotting
To address the large model sizes and high computational costs of keyword spotting models, together with the limited resources of edge deployment platforms, this study proposes a collaborative multi-compression acceleration framework for lightweight deployment. Built on an end-to-end convolutional neural network for keyword spotting, the framework integrates adaptive structured pruning, hardware-friendly mixed-precision dynamic quantization, and quantization-aware multi-stage knowledge distillation into a unified compression pipeline. To eliminate the influence of inconsistent training budgets and data partitions across compression branches, the results of quantization, pruning, distillation, and joint compression are reorganized under a unified evaluation protocol with multi-seed mean ± std reporting. Under this protocol, the retrained baseline reaches 97.13% ± 0.85. Experimental results show that, in the quantization branch, MPDQ achieves 95.78% ± 1.69 at a compression ratio of 9.56×, the most favorable balance between accuracy and storage efficiency; in the pruning branch, AIASP reaches 95.63% ± 0.67 at 30% sparsity with a compression ratio of 1.43×, a balanced compromise between accuracy retention and stability; in the distillation branch, PMKD, Multi-Teacher KD, and Fixed-T KD achieve 96.81% ± 0.69, 95.99% ± 1.18, and 96.70% ± 0.74, respectively, showing that the student model maintains strong recognition performance under approximately 4× structural compression; and the final joint compression scheme reaches 96.16% ± 0.53 with a trade-off score of 4.26 at a compression ratio of 9.89×. These results indicate that the main advantage of collaborative multi-compression lies in achieving a more balanced optimization among accuracy, model size, and compression efficiency under stringent deployment constraints.
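The unified evaluation protocol above reports each branch as a multi-seed mean ± std together with a compression ratio. A minimal sketch of that reporting step is shown below; the per-seed accuracies and model sizes here are purely illustrative placeholders, not the paper's actual runs, and the helper names are assumptions.

```python
import statistics

def multi_seed_report(accuracies):
    """Summarize per-seed accuracies as (mean, sample std),
    mirroring the mean ± std convention of the protocol."""
    return statistics.mean(accuracies), statistics.stdev(accuracies)

def compression_ratio(original_bytes, compressed_bytes):
    """Storage compression ratio: original size / compressed size."""
    return original_bytes / compressed_bytes

# Hypothetical accuracies from five random seeds (illustrative only)
accs = [96.5, 95.2, 96.0, 95.9, 96.4]
mean, std = multi_seed_report(accs)
print(f"accuracy: {mean:.2f} ± {std:.2f}")   # accuracy: 96.00 ± 0.51

# Hypothetical model sizes before and after compression (illustrative only)
print(f"ratio: {compression_ratio(9_560_000, 1_000_000):.2f}x")  # ratio: 9.56x
```

Under this convention, a branch's trade-off between accuracy retention and storage savings can be compared directly across quantization, pruning, and distillation results.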