[Torch, QNN] Remove FP32 piggy back and use QNN add/mul/concatenate #5061

masahi · 2020-03-13T04:12:10Z

Previously we fell back to fp32 ops for add/mul/concatenate, for two reasons: accuracy on mobilenet v2 dropped when QNN's add was used for Torch's quantized::add, and Torch itself currently implements some of its quantized ops this way internally.

But I found that the accuracy loss had a different cause. It turned out that, for mobilenet v2 only, the torchvision people trained the model with quantization aware training, and I was doing post training calibration on top of it. Now that the accuracy loss has been fixed properly, we don't need to piggy back on fp32 ops like Torch does. There is no loss of accuracy after this change.
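To make the two approaches concrete, here is a minimal pure-Python sketch on scalar values. It is an illustration only: the helper names, the uint8 range, and the "requantize both inputs to the output scale, then add" formulation are assumptions made for this example, not a copy of TVM's QNN lowering or of Torch's internals.

```python
def clamp(q, qmin=0, qmax=255):
    """Saturate to the uint8 range (assumed here for illustration)."""
    return max(qmin, min(qmax, q))


def quantize(x, scale, zero_point):
    """fp32 value -> quantized integer."""
    return clamp(round(x / scale) + zero_point)


def dequantize(q, scale, zero_point):
    """Quantized integer -> fp32 value."""
    return (q - zero_point) * scale


def fp32_piggyback_add(qa, qb, a_params, b_params, out_params):
    """The fallback path: dequantize, add in fp32, quantize the result."""
    a = dequantize(qa, *a_params)
    b = dequantize(qb, *b_params)
    return quantize(a + b, *out_params)


def requantize(q, in_params, out_params):
    """Re-express a quantized value with the output scale and zero point."""
    in_scale, in_zp = in_params
    out_scale, out_zp = out_params
    return round((q - in_zp) * in_scale / out_scale) + out_zp


def quantized_add(qa, qb, a_params, b_params, out_params):
    """A quantized-domain add: bring both inputs to the output
    parameters, add as integers, and remove the doubled zero point."""
    _, out_zp = out_params
    ra = requantize(qa, a_params, out_params)
    rb = requantize(qb, b_params, out_params)
    return clamp(ra + rb - out_zp)


# Each params tuple is (scale, zero_point); the values are made up.
a_params, b_params, out_params = (0.5, 10), (0.25, 3), (0.5, 5)
print(fp32_piggyback_add(14, 11, a_params, b_params, out_params))
print(quantized_add(14, 11, a_params, b_params, out_params))
```

With these hand-picked parameters both paths give the same quantized result. In general they can differ by a rounding step, because the quantized-domain version rounds each input separately.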

please review @anijain2305
cc @jwfromm @jjohnson-arm

anijain2305

LGTM

python/tvm/relay/frontend/qnn_torch.py

masahi · 2020-03-13T20:41:47Z

@anijain2305 @jwfromm @jjohnson-arm

Here is the current result on mobilenet v2, using QNN add and post training calibration (which is wrong).

Model name: mobilenet_v2, per channel quantization
PyTorch accuracy: Top1 = 67.87, Top5 = 88.15
TVM accuracy: Top1 = 62.47, Top5 = 84.67
PyTorch top5 label: [101 386  51 385  69]
TVM top5 label: [101 386  51 385 340]
PyTorch top5 raw output: [18.233843 16.314491 15.674707 13.115572 12.795679]
TVM top5 raw output: [27.510712 26.231144 21.752655 20.153194 17.274168]
max abs diff: 9.916653
mean abs_diff: 2.0649028
50 in 1000 raw outputs correct.

We lose about 5 points of top-1 accuracy compared to Torch.

And here is the result without post training calibration, also using QNN add. The top-1 accuracy is now much better and almost the same as Torch. Moreover, the raw output of the network (1000 floating point values) is much closer to Torch: the former run has only 50 out of 1000 outputs identical, while the latter, correct run has 376 out of 1000.

Model name: mobilenet_v2, per channel quantization
PyTorch accuracy: Top1 = 71.32, Top5 = 89.86
TVM accuracy: Top1 = 71.27, Top5 = 89.86
PyTorch top5 label: [101 386 385  51 340]
TVM top5 label: [101 386 385  51 340]
PyTorch top5 raw output: [20.168097 18.80845  17.222195 13.59647   9.290921]
TVM top5 raw output: [19.941488 18.581842 16.995586 13.823077  9.064313]
max abs diff: 0.9064312
mean abs_diff: 0.17562106
376 in 1000 raw outputs correct.
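For reference, the comparison figures in the logs above (max abs diff, mean abs diff, and the count of matching raw outputs) could be computed along these lines. This is a hedged sketch: the function name `compare_outputs` is hypothetical, and it assumes "correct" means exactly identical to the PyTorch value, as described in the comment above.

```python
def compare_outputs(reference, actual):
    """Summarize how closely two equal-length output vectors agree."""
    diffs = [abs(r - a) for r, a in zip(reference, actual)]
    return {
        "max_abs_diff": max(diffs),
        "mean_abs_diff": sum(diffs) / len(diffs),
        # Assumption: an output counts as correct only if it is identical.
        "num_identical": sum(1 for d in diffs if d == 0.0),
    }


# Toy vectors standing in for the 1000 raw outputs of each framework.
torch_out = [1.0, 2.0, 3.0, 4.0]
tvm_out = [1.0, 2.5, 3.0, 3.0]
stats = compare_outputs(torch_out, tvm_out)
print(stats["max_abs_diff"], stats["mean_abs_diff"], stats["num_identical"])
```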

masahi · 2020-03-13T22:11:50Z

Thanks @anijain2305

use qnn add/mul/concatenate

a440126

anijain2305 approved these changes Mar 13, 2020


remove logging

a1685a6

masahi merged commit 4fbc2fb into apache:master Mar 13, 2020

trevor-m pushed a commit to trevor-m/tvm that referenced this pull request Apr 16, 2020

[Torch, QNN] Remove FP32 piggy back and use QNN add/mul/concatenate (a…

c334a2a

…pache#5061) * use qnn add/mul/concatenate * remove logging

zhiics pushed a commit to neo-ai/tvm that referenced this pull request Apr 17, 2020

[Torch, QNN] Remove FP32 piggy back and use QNN add/mul/concatenate (a…

c59ee40

…pache#5061) * use qnn add/mul/concatenate * remove logging

ZihengJiang mentioned this pull request Sep 25, 2020

TVM v0.7 Release Note Candidate #6486

Closed
