[P] Utterance, an open-source client-side semantic endpointing SDK for voice apps. Looking for contributors.
Hey everyone,
I’ve been really frustrated with how every voice app handles pauses. You stop to think for a second, and the AI cuts you off. You want to interrupt, and it keeps talking. The problem is that tools like Silero VAD only detect sound and silence. They don’t recognize whether you’re thinking or have really finished speaking.
Server-side solutions like OpenAI Realtime and AssemblyAI do this well, but they add latency, cost, and privacy issues. No one has created a lightweight client-side model that understands conversational intent locally on the device.
I’m building Utterance, an open-source SDK (MIT-licensed) that runs a small ML model (about 3-5 MB, ONNX) entirely in the browser or on the device. It detects four states: speaking, thinking pause, turn complete, and interrupt intent. There’s no cloud, no API keys, and no per-minute pricing.
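To give a rough idea of what client-side inference looks like, here’s a minimal sketch of running a small ONNX classifier in the browser with onnxruntime-web. The model path, tensor names, frame size, and label order are placeholders for illustration, not the actual Utterance internals.

```typescript
// Rough sketch only: model path, tensor names, frame size, and label order
// are assumptions for illustration, not the actual Utterance internals.
import * as ort from "onnxruntime-web";

const STATES = ["speaking", "thinking_pause", "turn_complete", "interrupt_intent"] as const;
type TurnState = (typeof STATES)[number];

// Load the small ONNX model once; onnxruntime-web runs it locally in the browser.
const session = await ort.InferenceSession.create("/models/utterance.onnx");

// Classify one audio frame (assumed 16 kHz mono PCM) into one of the four states.
async function classifyFrame(frame: Float32Array): Promise<TurnState> {
  const input = new ort.Tensor("float32", frame, [1, frame.length]);
  // The feed key must match the model's declared input name ("audio" is assumed here).
  const output = await session.run({ audio: input });
  const logits = output[session.outputNames[0]].data as Float32Array;
  // Pick the highest-scoring state.
  let best = 0;
  for (let i = 1; i < logits.length; i++) {
    if (logits[i] > logits[best]) best = i;
  }
  return STATES[best];
}
```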
The repo is live at github.com/nizh0/Utterance, and the website is utterance.dev.
Right now, I’m looking for contributors in these areas:
- ML / Audio — model architecture, training pipeline, feature extraction
- JavaScript / TypeScript — Web Audio API, ONNX Runtime integration (see the sketch after this list)
- Python — PyAudio integration, package distribution
- Docs & Testing — guides, tutorials, real-world conversation testing
If you’ve ever been annoyed by a voice app cutting you off mid-thought, this project is trying to fix exactly that. I’d love to have you involved.