This ASR Actually Handles 52 Languages
Author(s): Gowtham Boyina

Originally published on Towards AI.

And the Forced Alignment Model Is the Interesting Part

I've tested dozens of speech recognition models over the years. Most claim multilingual support but quietly fall apart when you give them actual Chinese dialects, accented English, or anything beyond standard broadcast audio. The ones that do work well are usually proprietary APIs with pricing that scales uncomfortably.

Image: from the Qwen3-ASR GitHub

Alibaba's Qwen team has introduced Qwen3-ASR, an open-source speech recognition system supporting 52 languages and dialects. The release centers on two models: Qwen3-ASR-1.7B, which reports state-of-the-art results on multilingual recognition, and Qwen3-ForcedAligner-0.6B, a non-autoregressive model for accurate speech-text alignment. Together they handle Chinese dialects and multilingual user-generated content, and they provide the timestamp accuracy that applications like subtitle generation need for precise audio-text synchronization.
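If you just want a feel for what using the ASR model looks like, here is a minimal sketch built on Hugging Face transformers. The Hub id Qwen/Qwen3-ASR-1.7B and its compatibility with the generic automatic-speech-recognition pipeline are assumptions on my part based on the release naming; the project's GitHub README has the authoritative loading code.

```python
# Minimal transcription sketch. Assumptions: the checkpoint is published on the
# Hugging Face Hub as "Qwen/Qwen3-ASR-1.7B" and works with the generic ASR
# pipeline; consult the Qwen3-ASR repository for the exact usage.
from transformers import pipeline

asr = pipeline(
    "automatic-speech-recognition",
    model="Qwen/Qwen3-ASR-1.7B",  # assumed model id
)

# 16 kHz mono audio is the usual safe input format for ASR checkpoints.
result = asr("clip_mandarin_dialect.wav")
print(result["text"])
```

The forced aligner is the part that interests me more: given the audio plus its transcript, an aligner of this kind returns per-word (or per-character) start and end times, roughly of the shape {"word": ..., "start": 0.32, "end": 0.61}, which is what precise subtitle synchronization is built on. The exact invocation for Qwen3-ForcedAligner-0.6B lives in the project's repository; the shape above is only an illustration of alignment output, not its API.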