easyaligner: Forced alignment with GPU acceleration and flexible text normalization (compatible with all w2v2 models on HF Hub) [P]
https://preview.redd.it/f4d5krhkjyvg1.png?width=1020&format=png&auto=webp&s=11310f377b22abbe3dd110cc7d362ba8aae35f8d I have built easyaligner, a forced alignment library designed to be performant and easy to use. Having worked with preprocessing hundreds of thousands of hours of audio and text for training speech-to-text models, I found that the available open source forced alignment libraries often missed some convenience features. For our purposes it was, in particular, important for the tooling to be able to: Handle cases where the transcript does not cover all of the spoken content in […]