Arabic ASR model struggling to converge during training [D]

i’m trying to train an ASR model using the LibriSpeech recipe from SpeechBrain (without the language model) on a 100-hour dataset of dialectal Arabic speech. the model architecture uses a Conformer-small encoder and a Transformer decoder, with a total of around 13M parameters.
the recipe uses a combination of two loss functions: CTC and KL divergence, specifically: 0.3 * CTC + 0.7 * KLDiv
during training, both losses drop significantly during the first few weight updates, but then quickly plateau. the CTC loss gets stuck fluctuating around the 60-80 range, while the KL divergence loss remains around the 60s as well for the rest of training. as a result, the model does not converge properly, and the validation WER stays close to 100%.
i’ve already tried several things: adjusting the learning rate, changing the number of warmup steps, modifying the number of epochs, tuning the batch size and reducing the vocabulary size from the default 5000 to 1000.
none of these changes seem to help.
the training dataset is not publicly available and is weakly labeled. the validation and test sets come from the MGB2 dataset.
at this point, i genuinely don’t know what the root cause might be. i’ve experimented with many different approaches, but the model still refuses to converge. has anyone encountered a similar issue where their model gets stuck early in training and never improves? if so, what ended up being the cause or solution?
any feedback, suggestions, or ideas would be greatly appreciated.

submitted by /u/Sweet-Hamster-4991
[link] [comments]

Liked Liked