[D] We tested the same INT8 model on 5 Snapdragon chipsets. Accuracy ranged from 93% to 71%. Same weights, same ONNX file.

We’ve been doing on-device accuracy testing across multiple Snapdragon SoCs and the results have been eye-opening.

Same model. Same quantization. Same ONNX export. Deployed to 5 different chipsets:

Device Accuracy
Snapdragon 8 Gen 3 91.8%
Snapdragon 8 Gen 2 89.1%
Snapdragon 7s Gen 2 84.3%
Snapdragon 6 Gen 1 79.6%
Snapdragon 4 Gen 2 71.2%

Cloud benchmark reported 94.2%.

The spread comes down to three things we’ve observed:

  1. NPU precision handling — INT8 rounding behavior differs across Hexagon generations. Not all INT8 is created equal.
  2. Operator fusion differences — the QNN runtime optimizes the graph differently per SoC, sometimes trading accuracy for throughput.
  3. Memory-constrained fallback — on lower-tier chips, certain ops fall back from NPU to CPU, changing the execution path entirely.

None of this shows up in cloud-based benchmarks. You only see it when you run on real hardware.

Curious if others are seeing similar drift across chipsets — or if anyone has a good strategy for catching this before shipping. Most CI pipelines we’ve seen only test on cloud GPUs and call it a day.

submitted by /u/NoAdministration6906
[link] [comments]

Liked Liked