QDQ tool modification part2 #9720
Conversation
I'm working on unit tests and will add them later.
Unit tests added.
```diff
@@ -19,4 +19,7 @@ def quantize(self):
         nodes_to_iterate = itertools.chain(node.input, node.output)

         for tensor_name in nodes_to_iterate:
-            self.quantizer.quantize_tensor(tensor_name)
+            if self.quantizer.is_per_channel():
+                self.quantizer.quantize_tensor_per_channel(tensor_name, self.quantizer.qdq_channel_axis)
+            else:
+                self.quantizer.quantize_tensor(tensor_name)
```
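In isolation, the dispatch in this hunk can be sketched with a toy quantizer. The class below is a hypothetical stand-in that just records calls; only the method names (`is_per_channel`, `quantize_tensor`, `quantize_tensor_per_channel`, `qdq_channel_axis`) come from the diff:

```python
import itertools


class ToyQuantizer:
    """Hypothetical stand-in for the real quantizer; records each call."""

    def __init__(self, per_channel, qdq_channel_axis=0):
        self.per_channel = per_channel
        self.qdq_channel_axis = qdq_channel_axis
        self.calls = []

    def is_per_channel(self):
        return self.per_channel

    def quantize_tensor(self, tensor_name):
        self.calls.append(("per_tensor", tensor_name))

    def quantize_tensor_per_channel(self, tensor_name, axis):
        self.calls.append(("per_channel", tensor_name, axis))


def quantize_node_tensors(quantizer, inputs, outputs):
    # Same control flow as the hunk: per-channel when enabled, else per-tensor.
    for tensor_name in itertools.chain(inputs, outputs):
        if quantizer.is_per_channel():
            quantizer.quantize_tensor_per_channel(tensor_name, quantizer.qdq_channel_axis)
        else:
            quantizer.quantize_tensor(tensor_name)


q = ToyQuantizer(per_channel=True, qdq_channel_axis=0)
quantize_node_tensors(q, ["X", "W"], ["Y"])
print(q.calls)  # three per-channel calls, one per tensor
```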
What if the operator doesn't support per-channel?
The CPU EP only supports per-channel for Conv and MatMul right now.
Added a new extra option, OpTypesSupportPerChannelQuantization.
```
DedicatedQDQPair = True/False : Default is False. When inserting a QDQ pair, multiple nodes can share a single
    QDQ pair as their inputs. If True, an identical, dedicated QDQ pair is created for each node.
AddQDQToAddNodeFollowedByReduceMeanNode = True/False : Default is False. By default, QDQ pairs are added to every
    Add node if the Add op type is in op_types_to_quantize. If True, QDQ pairs are only added to Add nodes
    that are followed by a ReduceMean node.
```
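As a toy illustration of what `DedicatedQDQPair` changes (not the real implementation), consider counting inserted QDQ pairs for tensors with multiple consumers; in shared mode one pair serves every consumer, in dedicated mode each consuming node gets its own:

```python
def count_qdq_pairs(consumers_per_tensor, dedicated):
    """Toy model of the DedicatedQDQPair option.

    consumers_per_tensor maps a tensor name to its number of consuming nodes.
    Shared mode inserts one QDQ pair per tensor; dedicated mode inserts one
    per consuming node (as TRT requires).
    """
    if dedicated:
        return sum(consumers_per_tensor.values())
    return len(consumers_per_tensor)


# Tensor "t0" feeds two nodes, "t1" feeds one.
consumers = {"t0": 2, "t1": 1}
print(count_qdq_pairs(consumers, dedicated=False))  # 2 (one shared pair per tensor)
print(count_qdq_pairs(consumers, dedicated=True))   # 3 (one pair per consuming node)
```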
This is too specific.
We have a general mechanism, nodes_to_exclude. Why not use it?
Actually I'm already using nodes_to_exclude in qdq_quantizer.py#128.
Agreed, it's too specific. I'm removing this feature and moving the logic to the e2e example, which will compute nodes_to_exclude itself.
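The moved logic might look like this in an e2e example script. `add_nodes_to_exclude` is a hypothetical helper operating on a simplified node list (dicts rather than onnx protobuf nodes), not onnxruntime API:

```python
def add_nodes_to_exclude(nodes):
    """Hypothetical e2e-example helper: return the names of Add nodes to put
    in nodes_to_exclude, i.e. every Add node EXCEPT those whose output feeds
    a ReduceMean node (mirroring the feature removed from the quantizer).

    Each node is a dict with "name", "op_type", "inputs", "outputs" keys.
    """
    # Map each tensor name to the op types that consume it.
    consumers = {}
    for n in nodes:
        for tensor in n["inputs"]:
            consumers.setdefault(tensor, []).append(n["op_type"])

    excluded = []
    for n in nodes:
        if n["op_type"] != "Add":
            continue
        feeds_reduce_mean = any(
            "ReduceMean" in consumers.get(out, []) for out in n["outputs"])
        if not feeds_reduce_mean:
            excluded.append(n["name"])
    return excluded


nodes = [
    {"name": "add0", "op_type": "Add", "inputs": ["a", "b"], "outputs": ["t0"]},
    {"name": "rm0", "op_type": "ReduceMean", "inputs": ["t0"], "outputs": ["t1"]},
    {"name": "add1", "op_type": "Add", "inputs": ["t1", "c"], "outputs": ["t2"]},
]
print(add_nodes_to_exclude(nodes))  # ['add1'] -- add0 feeds a ReduceMean, add1 does not
```

The resulting list would then be passed as the `nodes_to_exclude` argument of the quantization entry point.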
```
DedicatedQDQPair = True/False : Default is False. When inserting a QDQ pair, multiple nodes can share a single
    QDQ pair as their inputs. If True, an identical, dedicated QDQ pair is created for each node.
OpTypesSupportPerChannelQuantization = list of op types : Default is []. List of op types that have
    per-channel quantization support.
QDQChannelAxis = Integer : Default is 0. Channel axis for the QDQ pair when per_channel is True.
```
Which operators in TRT support per-channel?
In TRT we pretty much only use per-channel for weights. All activations use per-tensor.
```diff
@@ -194,6 +194,10 @@ def quantize_static(model_input,
                 inserts both QuantizeLinear/DeQuantizeLinear nodes to weight.
             OpTypesToExcludeOutputQuantizatioin = list of op type : Default is []. If any op type is specified,
                 it won't quantize the output of ops with these specific op types.
```
Since TRT has so many, I'm thinking we may need a config file for different execution providers.
Talked to @yufenglee offline. We'll keep the flags OpTypesSupportPerChannelQuantization and QDQChannelAxis for TRT internal use for now, and a follow-up PR will fine-tune the related code.
```python
logging.warning(
    "{} doesn't support per channel quantization. Quantize tensor: {} with per-tensor instead.".format(
        node.op_type, tensor_name))
self.quantizer.quantize_tensor(tensor_name)
```
It should not be a warning. It is normal for an operator not to support per-channel.
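Following this review, a sketch of the fallback without the warning. The recorder class is a hypothetical stand-in; only `is_per_channel`, `quantize_tensor`, `quantize_tensor_per_channel`, and `qdq_channel_axis` are names taken from the diff:

```python
import logging


class RecordingQuantizer:
    """Hypothetical stand-in recording which quantization path was taken."""

    def __init__(self, per_channel, qdq_channel_axis=0):
        self.per_channel = per_channel
        self.qdq_channel_axis = qdq_channel_axis
        self.calls = []

    def is_per_channel(self):
        return self.per_channel

    def quantize_tensor(self, name):
        self.calls.append(("per_tensor", name))

    def quantize_tensor_per_channel(self, name, axis):
        self.calls.append(("per_channel", name, axis))


def quantize_with_fallback(quantizer, op_type, tensor_name, per_channel_ops):
    # Fall back to per-tensor quietly (info-level at most): per the review,
    # lacking per-channel support is the normal case, not a warning.
    if quantizer.is_per_channel() and op_type in per_channel_ops:
        quantizer.quantize_tensor_per_channel(tensor_name, quantizer.qdq_channel_axis)
    else:
        logging.info("%s: quantizing %s per-tensor", op_type, tensor_name)
        quantizer.quantize_tensor(tensor_name)


q = RecordingQuantizer(per_channel=True)
quantize_with_fallback(q, "Conv", "W", per_channel_ops={"Conv", "MatMul"})
quantize_with_fallback(q, "Relu", "X", per_channel_ops={"Conv", "MatMul"})
print(q.calls)  # Conv goes per-channel; Relu falls back to per-tensor
```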
* Add finetuned qdq options
* Add description
* Add unit tests
* Modify for channel axis
* Remove too specific feature. Move this implementation to e2e example
* Add OpTypesSupportPerChannelQuantization
* fix bug for unit test
* Keep flags OpTypesSupportPerChannelQuantization and QDQChannelAxis for internal use. Will have a follow-up PR to fine tune the code
* remove unnecessary warning

Co-authored-by: stevenlix <38092805+stevenlix@users.noreply.github.com>
Co-authored-by: Yufeng Li <liyufeng1987@gmail.com>
(cherry picked from commit 0baf687)
* Fix memset size (#9840) (cherry picked from commit d012d9f)
* [js/web] do not use nodejs type 'Buffer' in web (#9839) (cherry picked from commit a3ebc5e)
  * do not use nodejs type 'Buffer' in web
  * resolve comments and validate tests
  * remove 'Buffer' in test
* Fix potential data race with OrtValue usage in Python (#9841) (cherry picked from commit 18fd2cf)
* [OpenVINO-EP] V3.4 Release with OpenVINO 2021.4.2 LTS Release (#9848) (cherry picked from commit 0ae0f29)
  * Changes to ensure openvino build goes through in Windows
  * Modified Hetero plugin logic: in Hetero, for an operator to be marked true in getcapability(), it should be supported by either of the devices specified with HETERO in the device_type. Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>
  * Updated OV to 2021.4.2 version, mono download link and dotnet version
  * Copying Managed nugets to nugets artifacts directory in openvino c# docker file. Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>
  * Co-authored-by: saharfraza <sfatima.3001@gmail.com>
  * Co-authored-by: mayavijx <mayax.vijayan@intel.com>
  * Co-authored-by: Aravind Gunda <aravindx.gunda@intel.com>
* no fallback when enforcing explicit EP registration (#9863) (cherry picked from commit 1e9e57d)
  * add explicit ep registrations for python
* layernorm throw error if input has no data (#9837) (cherry picked from commit bf716e6)
* [js/node] npm audit fix (#9861) (cherry picked from commit 27e337e)
* [python manylinux package] emit warning if missing CUDA/TensorRT dependency causes ld_preload to fail and user tries to register either CUDA/TensorRT EP (#9872) (cherry picked from commit ec9b0ed)
  * add warning if ld_preload fails for CUDA or TRT when trying to register either provider
  * refactor
  * change wording from register to create
* QDQ tool modification part2 (#9720) (cherry picked from commit 0baf687), with the change list given above
* Cancel transpose optimizer for resize (#9870) (cherry picked from commit 16bfd3c)
  * add UT
  * addressing comments
  * fix build err
* Add build option to enable cuda profiling (#9875) (cherry picked from commit 9345894)

Co-authored-by: Dmitri Smirnov <yuslepukhin@users.noreply.github.com>
Co-authored-by: Yulong Wang <7679871+fs-eire@users.noreply.github.com>
Co-authored-by: Hariharan Seshadri <shariharan91@gmail.com>
Co-authored-by: Maajid khan <n.maajidkhan@gmail.com>
Co-authored-by: George Wu <jywu@microsoft.com>
Co-authored-by: Ye Wang <52801275+wangyems@users.noreply.github.com>
Co-authored-by: Chi Lo <54722500+chilo-ms@users.noreply.github.com>
Co-authored-by: RandySheriffH <48490400+RandySheriffH@users.noreply.github.com>
This PR adds three features:

1. Dedicated QDQ pairs for TRT. The default behavior inserts one QDQ pair per tensor, and multiple nodes can share that pair as their input. In TRT, a QDQ pair can't be shared between nodes, so dedicated QDQ pairs are created for each node instead. New extra option DedicatedQDQPair = True/False: default is False. If True, an identical, dedicated QDQ pair is created for each node.
2. New extra option OpTypesSupportPerChannelQuantization = list of op types: default is []. List of op types that have per-channel quantization support.
3. New extra option QDQChannelAxis = Integer: default is 0. Channel axis for the QDQ pair when per_channel is True.
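Assuming the option names above, a minimal sketch of how a caller might assemble them for the `extra_options` argument of `quantize_static`. The actual call is left commented out since it needs a real model and calibration data reader, and the op-type list is an illustrative assumption:

```python
# Sketch only: option names as documented in this PR; values are examples.
extra_options = {
    # Create a dedicated QDQ pair per consuming node instead of sharing one (TRT).
    "DedicatedQDQPair": True,
    # Assumption for illustration: op types with per-channel support.
    "OpTypesSupportPerChannelQuantization": ["Conv", "MatMul"],
    # Channel axis used when per_channel is True.
    "QDQChannelAxis": 0,
}

# With a real model and data reader you would pass it along, e.g.:
# from onnxruntime.quantization import quantize_static, QuantFormat
# quantize_static(model_input, model_output, calibration_data_reader,
#                 quant_format=QuantFormat.QDQ, per_channel=True,
#                 extra_options=extra_options)

print(sorted(extra_options))
```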