[Torch, QNN] Support dynamic quantization flow to enable importing quantized transformer models #6782
Merged
19 commits
2b753f8 add stub and test
e13037b per channel quantize
e3bd310 calculate qparam correctly
9b27ea5 import qbert working
bada149 support batched qdense
c47e5f6 test batched input
2eda1d0 fix mkl offloading of batch matmul
789507d reduce range become True in torch 1.6
e92e02a fix for 1.6
1c40889 Revert "fix mkl offloading of batch matmul"
1fd5b42 fix merge
44a30cb fix
6de0c4a lint fix
12447d7 fix black (masahi)
7858c80 more black fix (masahi)
bc78179 fix version check for 1.5.1 (masahi)
f05fac6 disable assert on v1.4 (strange pytorch issue) (masahi)
b9f1eb4 minor fix (masahi)
246b11f use dequantize (masahi)
What's happening here? Shouldn't this be coupled with the dtype? 127 should be used for int8, while 255 is for uint8.
This comes from https://github.com/pytorch/pytorch/blob/d642992877139671466d2a96663abede9e39ad55/aten/src/ATen/native/quantized/cpu/quant_utils.h#L64-L66
Here, they intentionally reduce the possible range of quantized values by half, i.e. [qmin, qmax] to [qmin/2, qmax/2]. Since PyTorch only uses uint8, this is fine.
It's not clear to me why they do this, but the following PR has some explanation: "reduce_range option restricts the activation tensor to 7 bits instead of 8. This is necessary to enable per channel quant for RNNs and LSTMs" pytorch/pytorch#39041
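To make the effect of `reduce_range` concrete, here is a minimal sketch of how the quantization range and qparams could be computed. This mirrors the logic in PyTorch's `quant_utils.h` linked above, but the function and variable names are illustrative, not PyTorch's actual API:

```python
# Illustrative sketch of reduce_range: halving [qmin, qmax] leaves
# effectively 7 bits of range instead of 8.
def qrange(dtype: str, reduce_range: bool):
    """Return (qmin, qmax) for the given quantized dtype."""
    if dtype == "uint8":
        qmin, qmax = 0, 255
    elif dtype == "int8":
        qmin, qmax = -128, 127
    else:
        raise ValueError(f"unsupported dtype: {dtype}")
    if reduce_range:
        # Halve the representable range, as in PyTorch's quant_utils.h
        qmin, qmax = qmin // 2, qmax // 2
    return qmin, qmax


def choose_qparams(xmin: float, xmax: float,
                   dtype: str = "uint8", reduce_range: bool = False):
    """Compute (scale, zero_point) for an affine quantization scheme."""
    qmin, qmax = qrange(dtype, reduce_range)
    # The float range must include zero so that 0.0 is exactly representable.
    xmin, xmax = min(xmin, 0.0), max(xmax, 0.0)
    scale = (xmax - xmin) / (qmax - qmin)
    zero_point = int(round(qmin - xmin / scale))
    # Clamp the zero point into the quantized range.
    zero_point = max(qmin, min(qmax, zero_point))
    return scale, zero_point
```

With `reduce_range=True`, uint8 maps to [0, 127] and int8 to [-64, 63], so a dtype-dependent constant like 127 vs. 255 in the frontend would indeed need to track both the dtype and this flag.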