[QNN] Change default rounding to UPWARD. #4131
Conversation
Thank you for the thorough evaluation @anijain2305 ! Interesting results in terms of accuracy/performance tradeoffs. Let's just check with @ZihengJiang whether this would affect the quantization nightly runs.
Impressive! Thanks for the solid data! It's very interesting that the rounding policy impacts performance so much.
Thanks. This looks great!
Thanks everyone. This is now merged.
@tmoreau89 Sorry, I didn't see your comment above. Should I revert?
This is QNN stuff, so it would not affect the automatic quantization nightly runs. This should be fine.
@zhiics no problem, no need to revert
* master: (51 commits)
  * [QNN][TFLite] Parsing QNN Add op. Adding MobilenetV2. (apache#4142)
  * [CI] Pin NNPack pthreadtools version (apache#4152)
  * Fix typo (apache#4144)
  * [Relay][Frontend][TF] Add tensor array ops (apache#3798)
  * [relay][vm] Separate VM runtime with executable (apache#4100)
  * [PATCH] Fix undefined __floatdihf in libtvmruntime.so on aarch64. (apache#4119)
  * [DOCKER] Pin torchvision==0.4.1 (apache#4140)
  * [TOPI][x86] Cascade lake support. (apache#4123)
  * [Relay] Improve build error when no lowered funcs are produced (apache#4132)
  * [RUNTIME] Refactor object python FFI to new protocol. (apache#4128)
  * Update PULL_REQUEST_TEMPLATE.md
  * Adding support for dequantizing from int32 to float32. (apache#4130)
  * [Relay][Training] Add and fix gradients (apache#4126)
  * [QNN] Change default rouning to UPWARD. (apache#4131)
  * Fix infer type of kernel in dense. (apache#4125)
  * [Relay][AlterOpLayout] NHWC to NCHWc pad operator. (apache#4103)
  * [ARITH] Fix lowering of floormod(x, y) != 0 (apache#4127)
  * [RFC][RUNTIME] Introduce new object protocol. (apache#4115)
  * [Relay][Topi] Disable conv NHWC pack int8. (apache#4038)
  * Update task_cpp_unittest.sh
  * ...
This PR changes the default rounding from TONEAREST to UPWARD. QNN ops are lowered to a sequence of Relay operators during compilation, and TONEAREST rounding leads to many more Relay operators, as shown here:
https://github.com/dmlc/tvm/blob/11a3a777ec78ad81ce12a940d07521870f546ad1/src/relay/qnn/util.cc#L113-L124
This PR makes a case for the default rounding option to be UPWARD.
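To make the operator-count argument concrete, below is a minimal Python sketch, not the actual TVM C++ lowering linked above, of the rounded right shift that requantize performs under the two modes. The only assumption is that UPWARD adds a fixed positive nudge before the shift, while TONEAREST (round half away from zero) needs a sign-dependent nudge; that sign check is what expands into extra Relay operators.

```python
# Minimal sketch (plain Python, not the TVM lowering) of a rounded
# arithmetic right shift, the step where the rounding mode matters.
def rounded_right_shift(x, shift, rounding="UPWARD"):
    if rounding == "UPWARD":
        # One constant, one add, one shift: ties round toward +infinity.
        return (x + (1 << (shift - 1))) >> shift
    if rounding == "TONEAREST":
        # Round half away from zero: the nudge depends on the sign of x,
        # so the lowered graph needs extra compare/select/constant ops.
        nudge = (1 << (shift - 1)) if x >= 0 else (1 << (shift - 1)) - 1
        return (x + nudge) >> shift
    raise ValueError(rounding)

# The two modes agree everywhere except at negative mid values:
for x in (3, 5, -3, -5):  # x / 2 -> 1.5, 2.5, -1.5, -2.5
    print(x, rounded_right_shift(x, 1, "UPWARD"), rounded_right_shift(x, 1, "TONEAREST"))
# -3 -> -1 (UPWARD) vs -2 (TONEAREST); -5 -> -2 (UPWARD) vs -3 (TONEAREST)
```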
I ran all of the TFLite-hosted quantized Inception models on a Cascade Lake machine with both rounding options. The comparison numbers below show that UPWARD rounding is the better alternative overall.
Accuracy Comparison
Accuracy is the same regardless of the rounding choice. Tested over 50k images with pre-processing.
Latency Comparison
Because TONEAREST leads to many more operators, the observed latency is higher.
Compilation Time Comparison
TONEAREST has a very high compilation time. Though the numbers are not included here, it blows up drastically for MobileNet V1.
Memory Footprint
I don't have numbers for this yet, but @shoubhik also pointed out that the memory footprint of the params increased with TONEAREST. This is mostly because TONEAREST introduces additional constants in the network, and they are in int64.
Background
Background on the QNN rounding choices can be found here:
https://github.com/dmlc/tvm/blob/11a3a777ec78ad81ce12a940d07521870f546ad1/include/tvm/relay/qnn/attrs.h#L52-L61
The point is that TONEAREST and UPWARD differ only at negative mid values (like -1.5, -2.5, and so on); for example, -2.5 rounds to -2 under UPWARD but to -3 under TONEAREST, while positive mid values such as 2.5 round to 3 under both. This difference is invisible in end-to-end accuracy.
What do other frameworks do?
Every toolchain has its own preferred rounding choice. AFAIK, there is no widely adopted in-production standard.
Note that this PR only changes the default value; one can still choose any rounding option explicitly, as sketched below. The default is for the TVM user who does not want to worry about the different roundings.
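For illustration, here is a hedged sketch of how a user could request a non-default rounding explicitly when calling requantize from Python. The requantize signature has changed across TVM versions (e.g., whether scales and zero points are plain scalars or relay constants), so the argument forms below are assumptions for illustration only.

```python
from tvm import relay

# Hedged sketch: request TONEAREST explicitly instead of the UPWARD default.
# Argument types are illustrative; in some TVM versions the quantization
# parameters are plain Python scalars rather than relay constants.
data = relay.var("data", shape=(1, 64), dtype="int32")
out = relay.qnn.op.requantize(
    data,
    input_scale=relay.const(0.5, "float32"),
    input_zero_point=relay.const(0, "int32"),
    output_scale=relay.const(0.25, "float32"),
    output_zero_point=relay.const(0, "int32"),
    rounding="TONEAREST",  # override the default rounding choice
    out_dtype="int8",
)
func = relay.Function([data], out)
```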
@tqchen @jackwish @vinx13 @FrozenGene @u99127 @zhiics @yzhliu