
[QNN] Change default rounding to UPWARD. #4131

Merged: 1 commit merged into apache:master on Oct 16, 2019

Conversation

anijain2305 (Contributor) commented on Oct 15, 2019

This PR changes the default rounding from TONEAREST to UPWARD. QNN ops are converted to a sequence of Relay operators during compilation, and TONEAREST rounding leads to many more Relay operators, as shown here:

https://github.com/dmlc/tvm/blob/11a3a777ec78ad81ce12a940d07521870f546ad1/src/relay/qnn/util.cc#L113-L124

This PR makes a case for the default rounding option to be UPWARD.
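For intuition, here is a rough Python sketch of why TONEAREST costs more. It is not TVM's actual lowering (that lives in the C++ file linked above); it only assumes requantization ends in an arithmetic right shift of the accumulator by n bits. UPWARD needs a single constant add before the shift, while TONEAREST needs a sign-dependent adjustment, which lowers to extra compare/select operators and extra constants.

```python
def requantize_upward(x: int, n: int) -> int:
    # UPWARD: ties round towards +inf -> one constant add, then the shift.
    return (x + (1 << (n - 1))) >> n

def requantize_tonearest(x: int, n: int) -> int:
    # TONEAREST: ties round away from zero -> the rounding constant depends
    # on the sign of x, which costs extra compare/select ops and constants.
    rounder = 1 << (n - 1)
    adjust = rounder if x >= 0 else rounder - 1
    return (x + adjust) >> n

# -3 with a 1-bit shift represents -1.5: the two modes disagree only there.
assert requantize_upward(-3, 1) == -1
assert requantize_tonearest(-3, 1) == -2
assert requantize_upward(3, 1) == requantize_tonearest(3, 1) == 2
```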

I ran all of the TFLite-hosted quantized Inception models on a Cascade Lake machine with the two rounding options. The comparison numbers below show that UPWARD rounding is the better alternative overall.


Accuracy Comparison

Accuracy is the same regardless of the rounding choice. Tested over 50k images with pre-processing.

| Model | Upward | Tonearest |
| --- | --- | --- |
| Inception V1 | 69.592/89.456 | 69.592/89.456 |
| Inception V2 | 73.332/91.328 | 73.332/91.328 |
| Inception V3 | 77.306/93.618 | 77.306/93.618 |
| Inception V4 | 79.55/94.196 | 79.55/94.196 |

Latency Comparison

Because TONEAREST leads to many more operators, the observed latency is higher.

| Model | Upward (ms) | Tonearest (ms) |
| --- | --- | --- |
| Inception V1 | 2.13497 | 2.97504 |
| Inception V2 | 8.96444 | 11.89356 |
| Inception V3 | 6.11635 | 9.13942 |
| Inception V4 | 12.24914 | 16.80568 |

Compilation Time Comparison

TONEAREST has a much higher compilation time. Though not included here, it blows up drastically for MobileNet V1.

| Model | Upward (seconds) | Tonearest (seconds) |
| --- | --- | --- |
| Inception V1 | 80 | 113 |
| Inception V2 | 53 | 55 |
| Inception V3 | 67 | 80 |
| Inception V4 | 64 | 75 |

Memory Footprint

I don't have numbers for this yet, but @shoubhik pointed out that the memory footprint of the params increases with TONEAREST. This is mostly because TONEAREST introduces additional constants into the network, and they are int64.


Background

Background about QNN rounding choices
https://github.com/dmlc/tvm/blob/11a3a777ec78ad81ce12a940d07521870f546ad1/include/tvm/relay/qnn/attrs.h#L52-L61

The key point is that TONEAREST and UPWARD differ only at negative midpoint values (-1.5, -2.5, and so on). This difference is invisible in end-to-end accuracy.
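A quick plain-Python illustration of that midpoint behaviour (just the rounding semantics, not TVM code):

```python
import math

def round_upward(x: float) -> int:
    # Ties round towards positive infinity.
    return math.floor(x + 0.5)

def round_tonearest(x: float) -> int:
    # Ties round away from zero (C round() semantics).
    return int(math.copysign(math.floor(abs(x) + 0.5), x))

for v in (1.5, 2.5, -1.5, -2.5):
    print(v, round_upward(v), round_tonearest(v))
# Positive midpoints agree (1.5 -> 2, 2.5 -> 3); negative midpoints differ:
# -1.5 -> -1 vs -2, and -2.5 -> -2 vs -3.
```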


What do other frameworks do?

Every toolchain has its own preferred rounding choice; AFAIK there is no widely adopted in-production standard.

  • TFLite - Uses ARM-based rounding. It works backwards from the ARM assembly instructions; the result is very close to TONEAREST, but not exactly TONEAREST.
  • MKLDNN - RoundNearestEven, per the MKLDNN documentation. This is not TONEAREST rounding.

Note that this PR only changes the default value; one can still choose any rounding option explicitly. The default matters for the TVM user who does not want to worry about different roundings.
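As a usage sketch of overriding the default: the `rounding` attribute of `qnn.requantize` stays user-settable. The exact argument types have changed across TVM versions (older releases take plain scalars for scales and zero points, newer ones take Relay constants), so treat the details below as illustrative.

```python
from tvm import relay

data = relay.var("data", shape=(1, 64), dtype="int32")
out = relay.qnn.op.requantize(
    data,
    input_scale=relay.const(0.5, "float32"),
    input_zero_point=relay.const(0, "int32"),
    output_scale=relay.const(0.25, "float32"),
    output_zero_point=relay.const(0, "int32"),
    rounding="TONEAREST",  # override the new default, "UPWARD"
    out_dtype="int8",
)
func = relay.Function([data], out)
```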

@tqchen @jackwish @vinx13 @FrozenGene @u99127 @zhiics @yzhliu

tmoreau89 (Contributor) left a comment

Thank you for the thorough evaluation @anijain2305! Interesting results in terms of accuracy/performance tradeoffs. Let's just check with @ZihengJiang whether this would affect the quantization nightly runs.

zhenhuaw-me (Contributor) left a comment

Impressive! Thanks for the solid data! It's very interesting that the rounding policy impacts performance so much.

zhiics (Member) left a comment

Thanks. This looks great!

zhiics merged commit 1c0e743 into apache:master on Oct 16, 2019
zhiics (Member) commented on Oct 16, 2019

Thanks everyone. This is now merged.

zhiics (Member) commented on Oct 16, 2019

@tmoreau89 Sorry. I didn't see your comment above. Should I revert?

anijain2305 (Contributor, Author) commented on Oct 16, 2019

This is QNN-specific, so it would not affect the automatic quantization nightly runs. This should be fine.

tmoreau89 (Contributor) commented
@zhiics no problem, no need to revert

anijain2305 added a commit to anijain2305/tvm that referenced this pull request Oct 17, 2019
wweic pushed a commit to neo-ai/tvm that referenced this pull request Oct 18, 2019
petrex added a commit to petrex/tvm that referenced this pull request Oct 29, 2019
anijain2305 deleted the rounding branch on November 13, 2019