[D] How should I fine-tune an ASR model for multilingual IPA transcription?
Hi everyone!
I’m working on a project where I want to build an ASR system that transcribes audio into IPA based on what was actually said (not a canonical dictionary pronunciation). The dataset is multilingual.
Here’s what I currently have:
– 36 audio files with clear pronunciation + IPA transcriptions
– 100 audio files from random speakers with background noise + IPA annotations
My goal is to train an ASR model that can take new audio and output IPA transcription.
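For context on what fine-tuning would involve: a CTC-based model (e.g. wav2vec 2.0) needs a fixed vocabulary over its output symbols, and for IPA that can simply be the set of characters appearing in the annotations. A minimal sketch of building that vocabulary — the helper name and normalization choice here are illustrative assumptions, not something from the dataset:

```python
# Build a character-level CTC vocabulary from IPA transcriptions.
# unicodedata.normalize("NFD", ...) splits base characters from combining
# diacritics, so annotators who encode the same sound differently still
# map to consistent symbols.
import unicodedata

def build_ipa_vocab(transcriptions):
    """Map each IPA character to an integer id, reserving 0 for the CTC blank."""
    chars = set()
    for text in transcriptions:
        text = unicodedata.normalize("NFD", text)
        chars.update(ch for ch in text if not ch.isspace())
    vocab = {"<blank>": 0}
    for i, ch in enumerate(sorted(chars), start=1):
        vocab[ch] = i
    return vocab

vocab = build_ipa_vocab(["həˈloʊ", "ˈwɜːld"])
```

The same vocabulary would then size the CTC output layer during fine-tuning.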
I’d love advice on two main things:
– What model should I start with?
– How should I fine-tune it?
Thank you.
submitted by /u/Routine-Ticket-5208