[D] got tired of “just vibes” testing for edge ML models, so I built automated quality gates
so about 6 months ago I was messing around with a vision model on a Snapdragon device as a side project. worked great on my laptop. deployed to actual hardware and latency had silently jumped 40% after a tiny preprocessing change.
the kicker? I only caught it because I was obsessively re-running benchmarks between changes. if I hadn’t been that paranoid, it would’ve just shipped broken.
and that’s basically the state of ML deployment to edge devices right now. we’ve got CI/CD for code — linting, unit tests, staging, the whole nine yards. for models going to phones/robots/cameras? you quantize, squint at some outputs, maybe run a notebook, and pray lol.
so I started building automated gates that test on real Snapdragon hardware through Qualcomm AI Hub. not simulators, actual device runs.
ran our FP32 model on a Snapdragon 8 Gen 3 (Galaxy S24) — 0.176ms inference, 121MB memory. the INT8 version came in at 0.187ms and 124MB. both passed the latency and memory gates no problem. then threw ResNet50 at it — 1.403ms inference, 236MB memory — and it failed both gates instantly. that's exactly the kind of regression that would've slipped through manual testing.
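for the curious, the gate logic itself is dead simple — here's a minimal sketch of the idea (the thresholds and field names here are my own placeholders, not my actual config; the real numbers come from profiling runs on device):

```python
from dataclasses import dataclass

@dataclass
class GateThresholds:
    max_latency_ms: float
    max_memory_mb: float

def check_gates(profile: dict, gates: GateThresholds) -> dict:
    """Compare an on-device profiling result against hard budgets.

    `profile` holds numbers from a real device run, e.g.
    {"latency_ms": 0.176, "memory_mb": 121}.
    """
    results = {
        "latency": profile["latency_ms"] <= gates.max_latency_ms,
        "memory": profile["memory_mb"] <= gates.max_memory_mb,
    }
    results["passed"] = all(results.values())
    return results

# hypothetical budget: 1 ms latency, 200 MB peak memory
gates = GateThresholds(max_latency_ms=1.0, max_memory_mb=200.0)

print(check_gates({"latency_ms": 0.187, "memory_mb": 124}, gates))   # INT8: passes
print(check_gates({"latency_ms": 1.403, "memory_mb": 236}, gates))   # ResNet50: fails both
```

the whole point is that `profile` is populated from hardware runs, so a "tiny" preprocessing change that blows the latency budget turns the pipeline red instead of shipping.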
also added signed evidence bundles (Ed25519 + SHA-256) because “the ML team said it looked good” shouldn’t be how we ship models in 2026 lmao.
still super early but the core loop works. anyone else shipping to mobile/embedded dealing with this? what does your testing setup look like? genuinely curious because most teams I’ve talked to are basically winging it.
submitted by /u/NoAdministration6906