[D] got tired of “just vibes” testing for edge ML models, so I built automated quality gates
so about 6 months ago I was messing around with a vision model on a Snapdragon device as a side project. worked great on my laptop. deployed to actual hardware and latency had silently jumped 40% after a tiny preprocessing change.
the kicker? I only caught it because I was obsessively re-running benchmarks between changes. if I hadn’t been that paranoid, it would’ve just shipped broken.
and that’s basically the state of ML deployment to edge devices right now. we’ve got CI/CD for code — linting, unit tests, staging, the whole nine yards. for models going to phones/robots/cameras? you quantize, squint at some outputs, maybe run a notebook, and pray lol.
so I started building automated gates that test on real Snapdragon hardware through Qualcomm AI Hub. not simulators, actual device runs.
ran our FP32 model on a Snapdragon 8 Gen 3 (Galaxy S24) — 0.176ms inference, 121MB memory. the INT8 version came in at 0.187ms and 124MB. both passed the latency and memory gates no problem. then threw ResNet50 at it — 1.403ms inference, 236MB memory — and it failed both gates instantly. that's exactly the kind of regression that would've slipped through manual testing.
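for the curious, the gate logic itself is dead simple — here's a minimal sketch of the idea (the thresholds and field names here are my own placeholders, not my actual config; the real numbers come from profiling runs on device):

```python
from dataclasses import dataclass

@dataclass
class GateThresholds:
    max_latency_ms: float
    max_memory_mb: float

def check_gates(profile: dict, gates: GateThresholds) -> dict:
    """Compare an on-device profiling result against hard budgets.

    `profile` holds numbers from a real device run, e.g.
    {"latency_ms": 0.176, "memory_mb": 121}.
    """
    results = {
        "latency": profile["latency_ms"] <= gates.max_latency_ms,
        "memory": profile["memory_mb"] <= gates.max_memory_mb,
    }
    results["passed"] = all(results.values())
    return results

# hypothetical budget: 1 ms latency, 200 MB peak memory
gates = GateThresholds(max_latency_ms=1.0, max_memory_mb=200.0)

print(check_gates({"latency_ms": 0.187, "memory_mb": 124}, gates))   # INT8: passes
print(check_gates({"latency_ms": 1.403, "memory_mb": 236}, gates))   # ResNet50: fails both
```

the whole point is that `profile` is populated from hardware runs, so a "tiny" preprocessing change that blows the latency budget turns the pipeline red instead of shipping.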
also added signed evidence bundles (Ed25519 + SHA-256) because “the ML team said it looked good” shouldn’t be how we ship models in 2026 lmao.
still super early but the core loop works. anyone else shipping to mobile/embedded dealing with this? what does your testing setup look like? genuinely curious because most teams I’ve talked to are basically winging it.
submitted by /u/NoAdministration6906