QDQ tool modification part2 #9720
Conversation
I'm working on unit tests and will add them later.
Unit tests added.
```diff
@@ -19,4 +19,7 @@ def quantize(self):
         nodes_to_iterate = itertools.chain(node.input, node.output)

         for tensor_name in nodes_to_iterate:
-            self.quantizer.quantize_tensor(tensor_name)
+            if self.quantizer.is_per_channel():
+                self.quantizer.quantize_tensor_per_channel(tensor_name, self.quantizer.qdq_channel_axis)
+            else:
+                self.quantizer.quantize_tensor(tensor_name)
```
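In isolation, the dispatch in this hunk can be sketched with a toy quantizer. The class below is a hypothetical stand-in that just records calls; only the method names (`is_per_channel`, `quantize_tensor`, `quantize_tensor_per_channel`, `qdq_channel_axis`) come from the diff:

```python
import itertools


class ToyQuantizer:
    """Hypothetical stand-in for the real quantizer; records each call."""

    def __init__(self, per_channel, qdq_channel_axis=0):
        self.per_channel = per_channel
        self.qdq_channel_axis = qdq_channel_axis
        self.calls = []

    def is_per_channel(self):
        return self.per_channel

    def quantize_tensor(self, tensor_name):
        self.calls.append(("per_tensor", tensor_name))

    def quantize_tensor_per_channel(self, tensor_name, axis):
        self.calls.append(("per_channel", tensor_name, axis))


def quantize_node_tensors(quantizer, inputs, outputs):
    # Same control flow as the hunk: per-channel when enabled, else per-tensor.
    for tensor_name in itertools.chain(inputs, outputs):
        if quantizer.is_per_channel():
            quantizer.quantize_tensor_per_channel(tensor_name, quantizer.qdq_channel_axis)
        else:
            quantizer.quantize_tensor(tensor_name)


q = ToyQuantizer(per_channel=True, qdq_channel_axis=0)
quantize_node_tensors(q, ["X", "W"], ["Y"])
print(q.calls)  # three per-channel calls, one per tensor
```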
What if the operator doesn't support per-channel?
The CPU EP only supports per-channel for Conv and MatMul right now.
Added a new extra option, OpTypesSupportPerChannelQuantization.
```
DedicatedQDQPair = True/False : Default is False. When inserting a QDQ pair, multiple nodes can share a single
    QDQ pair as their inputs. If True, an identical, dedicated QDQ pair is created for each node.
AddQDQToAddNodeFollowedByReduceMeanNode = True/False : Default is False. By default, QDQ pairs are added to every
    Add node if the Add op type is in op_types_to_quantize. If True, QDQ pairs are only added to Add nodes
    that are followed by a ReduceMean node.
```
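As a toy illustration of what `DedicatedQDQPair` changes (not the real implementation), consider counting inserted QDQ pairs for tensors with multiple consumers; in shared mode one pair serves every consumer, in dedicated mode each consuming node gets its own:

```python
def count_qdq_pairs(consumers_per_tensor, dedicated):
    """Toy model of the DedicatedQDQPair option.

    consumers_per_tensor maps a tensor name to its number of consuming nodes.
    Shared mode inserts one QDQ pair per tensor; dedicated mode inserts one
    per consuming node (as TRT requires).
    """
    if dedicated:
        return sum(consumers_per_tensor.values())
    return len(consumers_per_tensor)


# Tensor "t0" feeds two nodes, "t1" feeds one.
consumers = {"t0": 2, "t1": 1}
print(count_qdq_pairs(consumers, dedicated=False))  # 2 (one shared pair per tensor)
print(count_qdq_pairs(consumers, dedicated=True))   # 3 (one pair per consuming node)
```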
This is too specific.
We have a general mechanism, nodes_to_exclude. Why not use it?
Actually I'm already using nodes_to_exclude in qdq_quantizer.py#128.
Agreed, it's too specific. I'm removing this feature and moving the logic to the e2e example, which will compute nodes_to_exclude itself.
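The moved logic might look like this in an e2e example script. `add_nodes_to_exclude` is a hypothetical helper operating on a simplified node list (dicts rather than onnx protobuf nodes), not onnxruntime API:

```python
def add_nodes_to_exclude(nodes):
    """Hypothetical e2e-example helper: return the names of Add nodes to put
    in nodes_to_exclude, i.e. every Add node EXCEPT those whose output feeds
    a ReduceMean node (mirroring the feature removed from the quantizer).

    Each node is a dict with "name", "op_type", "inputs", "outputs" keys.
    """
    # Map each tensor name to the op types that consume it.
    consumers = {}
    for n in nodes:
        for tensor in n["inputs"]:
            consumers.setdefault(tensor, []).append(n["op_type"])

    excluded = []
    for n in nodes:
        if n["op_type"] != "Add":
            continue
        feeds_reduce_mean = any(
            "ReduceMean" in consumers.get(out, []) for out in n["outputs"])
        if not feeds_reduce_mean:
            excluded.append(n["name"])
    return excluded


nodes = [
    {"name": "add0", "op_type": "Add", "inputs": ["a", "b"], "outputs": ["t0"]},
    {"name": "rm0", "op_type": "ReduceMean", "inputs": ["t0"], "outputs": ["t1"]},
    {"name": "add1", "op_type": "Add", "inputs": ["t1", "c"], "outputs": ["t2"]},
]
print(add_nodes_to_exclude(nodes))  # ['add1'] -- add0 feeds a ReduceMean, add1 does not
```

The resulting list would then be passed as the `nodes_to_exclude` argument of the quantization entry point.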
```
DedicatedQDQPair = True/False : Default is False. When inserting a QDQ pair, multiple nodes can share a single
    QDQ pair as their inputs. If True, an identical, dedicated QDQ pair is created for each node.
OpTypesSupportPerChannelQuantization = list of op types : Default is []. List of op types that have
    per-channel quantization support.
QDQChannelAxis = Integer : Default is 0. Channel axis for the QDQ pair when per_channel is True.
```
Which operators in TRT support per-channel?
In TRT we pretty much only use per-channel for weights. All activations use per-tensor.
```diff
@@ -194,6 +194,10 @@ def quantize_static(model_input,
                 inserts both QuantizeLinear/DeQuantizeLinear nodes to weight.
             OpTypesToExcludeOutputQuantizatioin = list of op type : Default is []. If any op type is specified,
                 it won't quantize the output of ops with these specific op types.
```
Since TRT has so many, I'm thinking we may need a config file for different execution providers.
Talked to @yufenglee offline. We'll keep the flags OpTypesSupportPerChannelQuantization and QDQChannelAxis for TRT internal use for now, and a follow-up PR will fine-tune the related code.
```python
logging.warning(
    "{} doesn't support per channel quantization. Quantize tensor: {} with per-tensor instead.".format(
        node.op_type, tensor_name))
self.quantizer.quantize_tensor(tensor_name)
```
It should not be a warning. It is normal for an operator not to support per-channel.
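Following this review, a sketch of the fallback without the warning. The recorder class is a hypothetical stand-in; only `is_per_channel`, `quantize_tensor`, `quantize_tensor_per_channel`, and `qdq_channel_axis` are names taken from the diff:

```python
import logging


class RecordingQuantizer:
    """Hypothetical stand-in recording which quantization path was taken."""

    def __init__(self, per_channel, qdq_channel_axis=0):
        self.per_channel = per_channel
        self.qdq_channel_axis = qdq_channel_axis
        self.calls = []

    def is_per_channel(self):
        return self.per_channel

    def quantize_tensor(self, name):
        self.calls.append(("per_tensor", name))

    def quantize_tensor_per_channel(self, name, axis):
        self.calls.append(("per_channel", name, axis))


def quantize_with_fallback(quantizer, op_type, tensor_name, per_channel_ops):
    # Fall back to per-tensor quietly (info-level at most): per the review,
    # lacking per-channel support is the normal case, not a warning.
    if quantizer.is_per_channel() and op_type in per_channel_ops:
        quantizer.quantize_tensor_per_channel(tensor_name, quantizer.qdq_channel_axis)
    else:
        logging.info("%s: quantizing %s per-tensor", op_type, tensor_name)
        quantizer.quantize_tensor(tensor_name)


q = RecordingQuantizer(per_channel=True)
quantize_with_fallback(q, "Conv", "W", per_channel_ops={"Conv", "MatMul"})
quantize_with_fallback(q, "Relu", "X", per_channel_ops={"Conv", "MatMul"})
print(q.calls)  # Conv goes per-channel; Relu falls back to per-tensor
```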
* Add finetuned qdq options
* Add description
* Add unit tests
* Modify for channel axis
* Remove too specific feature. Move this implementation to e2e example
* Add OpTypesSupportPerChannelQuantization
* fix bug for unit test
* Keep flags OpTypesSupportPerChannelQuantization and QDQChannelAxis for internal use. Will have a follow-up PR to fine tune the code
* remove unnecessary warning

Co-authored-by: stevenlix <38092805+stevenlix@users.noreply.github.com>
Co-authored-by: Yufeng Li <liyufeng1987@gmail.com>
(cherry picked from commit 0baf687)
* Fix memset size (#9840) (cherry picked from commit d012d9f)
* [js/web] do not use nodejs type 'Buffer' in web (#9839) (cherry picked from commit a3ebc5e)
  * do not use nodejs type 'Buffer' in web
  * resolve comments and validate tests
  * remove 'Buffer' in test
* Fix potential data race with OrtValue usage in Python (#9841) (cherry picked from commit 18fd2cf)
* [OpenVINO-EP] V3.4 Release with OpenVINO 2021.4.2 LTS Release (#9848) (cherry picked from commit 0ae0f29)
  * Changes to ensure openvino build goes through in Windows
  * Modified Hetero plugin logic: in Hetero, for an operator to be marked true in getcapability(), it should be supported by either of the devices specified with HETERO in the device_type. Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>
  * Updated OV to 2021.4.2 version, mono download link and dotnet version
  * Copying Managed nugets to nugets artifacts directory in openvino c# docker file. Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>
  * Co-authored-by: saharfraza <sfatima.3001@gmail.com>
  * Co-authored-by: mayavijx <mayax.vijayan@intel.com>
  * Co-authored-by: Aravind Gunda <aravindx.gunda@intel.com>
* no fallback when enforcing explicit EP registration (#9863) (cherry picked from commit 1e9e57d)
  * add explicit ep registrations for python
* layernorm throw error if input has no data (#9837) (cherry picked from commit bf716e6)
* [js/node] npm audit fix (#9861) (cherry picked from commit 27e337e)
* [python manylinux package] emit warning if missing CUDA/TensorRT dependency causes ld_preload to fail and user tries to register either CUDA/TensorRT EP (#9872) (cherry picked from commit ec9b0ed)
  * add warning if ld_preload fails for CUDA or TRT when trying to register either provider
  * refactor
  * change wording from register to create
* QDQ tool modification part2 (#9720) (cherry picked from commit 0baf687), with the change list given above
* Cancel transpose optimizer for resize (#9870) (cherry picked from commit 16bfd3c)
  * add UT
  * addressing comments
  * fix build err
* Add build option to enable cuda profiling (#9875) (cherry picked from commit 9345894)

Co-authored-by: Dmitri Smirnov <yuslepukhin@users.noreply.github.com>
Co-authored-by: Yulong Wang <7679871+fs-eire@users.noreply.github.com>
Co-authored-by: Hariharan Seshadri <shariharan91@gmail.com>
Co-authored-by: Maajid khan <n.maajidkhan@gmail.com>
Co-authored-by: George Wu <jywu@microsoft.com>
Co-authored-by: Ye Wang <52801275+wangyems@users.noreply.github.com>
Co-authored-by: Chi Lo <54722500+chilo-ms@users.noreply.github.com>
Co-authored-by: RandySheriffH <48490400+RandySheriffH@users.noreply.github.com>
This PR adds three features:

1. Dedicated QDQ pairs for TRT. The default behavior inserts one QDQ pair per tensor, and multiple nodes can share that pair as their input. In TRT, a QDQ pair can't be shared between nodes, so dedicated QDQ pairs are created for each node instead. New extra option DedicatedQDQPair = True/False: default is False. If True, an identical, dedicated QDQ pair is created for each node.
2. New extra option OpTypesSupportPerChannelQuantization = list of op types: default is []. List of op types that have per-channel quantization support.
3. New extra option QDQChannelAxis = Integer: default is 0. Channel axis for the QDQ pair when per_channel is True.
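Assuming the option names above, a minimal sketch of how a caller might assemble them for the `extra_options` argument of `quantize_static`. The actual call is left commented out since it needs a real model and calibration data reader, and the op-type list is an illustrative assumption:

```python
# Sketch only: option names as documented in this PR; values are examples.
extra_options = {
    # Create a dedicated QDQ pair per consuming node instead of sharing one (TRT).
    "DedicatedQDQPair": True,
    # Assumption for illustration: op types with per-channel support.
    "OpTypesSupportPerChannelQuantization": ["Conv", "MatMul"],
    # Channel axis used when per_channel is True.
    "QDQChannelAxis": 0,
}

# With a real model and data reader you would pass it along, e.g.:
# from onnxruntime.quantization import quantize_static, QuantFormat
# quantize_static(model_input, model_output, calibration_data_reader,
#                 quant_format=QuantFormat.QDQ, per_channel=True,
#                 extra_options=extra_options)

print(sorted(extra_options))
```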