Towards Speech Technology for Garo: A Low-Resource ASR System via Multilingual Transfer
We present a fine-tuned Whisper model for automatic speech recognition (ASR) in Garo, a low-resource Tibeto-Burman language spoken in Northeast India. Fine-tuning Whisper-small on training samples from the Vaani dataset, we achieve a Word Error Rate (WER) of 9.74% and a Character Error Rate (CER) of 3.82% on the test set, a 97.5% relative improvement over the zero-shot baseline. The model produces perfect transcriptions for over 60% of test samples and runs at real-time inference speeds. We analyze error patterns, including code-switching challenges and morphological complexities specific to Garo. The model is publicly released to support future research in low-resource speech recognition for Tibeto-Burman languages.
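Since the abstract reports WER and CER against a released checkpoint, a minimal sketch of how such an evaluation might be scripted is given below, assuming a Hugging Face-style model release and the `transformers`/`evaluate` libraries. The model identifier, audio paths, and reference transcriptions are placeholders, not the authors' actual artifacts.

```python
# Sketch: score a fine-tuned Whisper-small checkpoint with WER and CER.
# The model ID and the evaluation data below are hypothetical placeholders.
import torch
from transformers import pipeline
import evaluate

asr = pipeline(
    "automatic-speech-recognition",
    model="example-org/whisper-small-garo",  # placeholder model ID
    device=0 if torch.cuda.is_available() else -1,
)

# Placeholder evaluation set: paired audio files and gold transcriptions.
audio_paths = ["garo_test_0001.wav", "garo_test_0002.wav"]
references = ["gold transcription one", "gold transcription two"]

# Transcribe each file and compute corpus-level WER and CER.
predictions = [asr(path)["text"] for path in audio_paths]
wer = evaluate.load("wer").compute(predictions=predictions, references=references)
cer = evaluate.load("cer").compute(predictions=predictions, references=references)
print(f"WER: {wer:.4f}  CER: {cer:.4f}")
```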