Arabic ASR model struggling to converge during training [D]
i’m trying to train an ASR model using the LibriSpeech recipe from SpeechBrain (without the language model) on a 100-hour dataset of dialectal Arabic speech. the model architecture uses a Conformer-small encoder and a Transformer decoder, with a total of around 13M parameters. the recipe uses a combination of two loss functions: CTC and KL divergence, specifically: 0.3 * CTC + 0.7 * KLDiv during training, both losses drop significantly during the first few weight updates, but then […]