
QDQ tool modification part2 #9720

Merged
merged 9 commits into from
Nov 30, 2021

Conversation

@chilo-ms (Contributor) commented Nov 10, 2021

This PR adds three features:

  1. The default behavior is to insert one QDQ pair per tensor, so multiple nodes can share a QDQ pair as their input if needed.
    In TRT, a QDQ pair can't be shared between nodes, so a dedicated QDQ pair must be created for each node.
    Added a new extra option DedicatedQDQPair = True/False: Default is False. When inserting QDQ pairs, multiple nodes can share a single QDQ pair as their input. If True, an identical, dedicated QDQ pair is created for each node.

  2. Added a new extra option OpTypesSupportPerChannelQuantization = list of op types: Default is []. List of op types that have per-channel quantization support.

  3. Added a new extra option QDQChannelAxis = Integer: Default is 0. Channel axis for QDQ pairs when per_channel is True.
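The options above are passed through quantize_static's extra_options dictionary. A minimal sketch (the model paths and calibration reader are placeholders; the commented-out call assumes the standard onnxruntime.quantization API):

```python
# Sketch of passing this PR's new QDQ extra options. The option names come
# from the PR description; "model.onnx", "model.qdq.onnx" and the calibration
# reader are placeholders.
extra_options = {
    # False (default): multiple nodes may share a single QDQ pair as input.
    # True: an identical, dedicated QDQ pair per consuming node (needed for
    # TRT, where a QDQ pair can't be shared between nodes).
    "DedicatedQDQPair": True,
    # Op types that support per-channel quantization (default []).
    "OpTypesSupportPerChannelQuantization": ["Conv", "MatMul"],
    # Channel axis used for the QDQ pair when per_channel is True (default 0).
    "QDQChannelAxis": 0,
}

# from onnxruntime.quantization import quantize_static, QuantFormat
# quantize_static("model.onnx", "model.qdq.onnx", calibration_data_reader,
#                 quant_format=QuantFormat.QDQ, per_channel=True,
#                 extra_options=extra_options)
```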

@chilo-ms (Contributor, Author) commented:

I'm working on unit tests and will add them in a later commit.

Update: unit tests added.

@@ -19,4 +19,7 @@ def quantize(self):
         nodes_to_iterate = itertools.chain(node.input, node.output)

         for tensor_name in nodes_to_iterate:
-            self.quantizer.quantize_tensor(tensor_name)
+            if self.quantizer.is_per_channel():
+                self.quantizer.quantize_tensor_per_channel(tensor_name, self.quantizer.qdq_channel_axis)
+            else:
+                self.quantizer.quantize_tensor(tensor_name)
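The dispatch in this hunk can be read as the following standalone sketch. The quantizer methods are stand-ins mirroring the names in the diff, not the real onnxruntime implementation:

```python
# Minimal sketch of the per-channel/per-tensor dispatch added above.
class SketchQuantizer:
    """Stand-in with the method names used in the diff."""

    def __init__(self, per_channel, qdq_channel_axis=0):
        self._per_channel = per_channel
        self.qdq_channel_axis = qdq_channel_axis
        self.calls = []  # records which quantization path was taken

    def is_per_channel(self):
        return self._per_channel

    def quantize_tensor(self, name):
        self.calls.append(("per_tensor", name))

    def quantize_tensor_per_channel(self, name, axis):
        self.calls.append(("per_channel", name, axis))


def quantize_io(quantizer, tensor_names):
    # Mirrors the new loop: per-channel when enabled, per-tensor otherwise.
    for tensor_name in tensor_names:
        if quantizer.is_per_channel():
            quantizer.quantize_tensor_per_channel(
                tensor_name, quantizer.qdq_channel_axis)
        else:
            quantizer.quantize_tensor(tensor_name)


q = SketchQuantizer(per_channel=True, qdq_channel_axis=0)
quantize_io(q, ["w"])
print(q.calls)  # [('per_channel', 'w', 0)]
```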
Member:

What if the operator doesn't support per-channel?

Member:

The CPU EP only supports per-channel for Conv and MatMul now.

@chilo-ms (Contributor, Author) commented Nov 18, 2021:

Added a new extra option OpTypesSupportPerChannelQuantization.

DedicatedQDQPair = True/False: Default is False. When inserting QDQ pairs, multiple nodes can share a single QDQ pair as their input. If True, an identical, dedicated QDQ pair is created for each node.
AddQDQToAddNodeFollowedByReduceMeanNode = True/False: Default is False. By default, QDQ pairs are added to every Add node if the Add op type is in op_types_to_quantize. If True, QDQ pairs are added only to Add nodes followed by a ReduceMean node.
Member:

This is too specific.

Member:

We have a general mechanism, nodes_to_exclude. Why not use it?

@chilo-ms (Contributor, Author) commented:

Actually, I'm using nodes_to_exclude in qdq_quantizer.py#128. Agreed it's too specific. I'm removing this feature and moving the logic into the e2e example to compute nodes_to_exclude.

DedicatedQDQPair = True/False: Default is False. When inserting QDQ pairs, multiple nodes can share a single QDQ pair as their input. If True, an identical, dedicated QDQ pair is created for each node.
OpTypesSupportPerChannelQuantization = list of op types: Default is []. List of op types that have per-channel quantization support.
QDQChannelAxis = Integer: Default is 0. Channel axis for QDQ pairs when per_channel is True.
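The nodes_to_exclude approach mentioned in the discussion can be sketched as follows. This is a hypothetical stand-in for the e2e-example logic (nodes are modeled as plain dicts; a real version would walk an onnx.GraphProto):

```python
# Hypothetical sketch: build nodes_to_exclude so that only Add nodes followed
# by a ReduceMean node get QDQ pairs, using the generic exclusion mechanism
# instead of a dedicated flag.
def adds_to_exclude(nodes):
    # Map each tensor name to the op types that consume it.
    consumers = {}
    for n in nodes:
        for inp in n["inputs"]:
            consumers.setdefault(inp, []).append(n["op_type"])

    exclude = []
    for n in nodes:
        if n["op_type"] != "Add":
            continue
        followed = any(op == "ReduceMean"
                       for out in n["outputs"]
                       for op in consumers.get(out, []))
        if not followed:
            exclude.append(n["name"])
    return exclude


nodes = [
    {"name": "add1", "op_type": "Add", "inputs": ["x", "y"], "outputs": ["a"]},
    {"name": "rm1", "op_type": "ReduceMean", "inputs": ["a"], "outputs": ["b"]},
    {"name": "add2", "op_type": "Add", "inputs": ["b", "z"], "outputs": ["c"]},
]
print(adds_to_exclude(nodes))  # ['add2']
```

The returned names would then be passed as nodes_to_exclude to the quantizer, keeping the tool itself free of model-specific rules.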
Member:

QDQChannelAxis

I don't think all operators can share the same per-channel axis.

Member:

Which operators in TRT support per-channel?

Contributor:

In TRT, we pretty much only use per-channel for weights. All activations use per-tensor.
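The per-tensor vs per-channel distinction discussed here comes down to how many scales are computed. An illustrative sketch (not the PR's code) for symmetric int8 weight quantization, assuming axis 0 is the channel axis as in the QDQChannelAxis default:

```python
# Illustrative: per-tensor uses one scale for the whole weight tensor;
# per-channel uses one scale per slice along the channel axis, which
# preserves accuracy when channel magnitudes differ widely.
def per_tensor_scale(weights):
    # One scale for the whole tensor (symmetric int8: max |w| maps to 127).
    return max(abs(v) for row in weights for v in row) / 127.0

def per_channel_scales(weights, axis=0):
    # One scale per channel; this sketch handles axis 0 only.
    assert axis == 0, "sketch handles axis 0 only"
    return [max(abs(v) for v in row) / 127.0 for row in weights]

w = [[0.5, -1.0], [0.1, 0.2]]
print(per_tensor_scale(w))    # 1.0/127 -- dominated by the largest channel
print(per_channel_scales(w))  # [1.0/127, 0.2/127] -- per-channel resolution
```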

@@ -194,6 +194,10 @@ def quantize_static(model_input,
inserts both QuantizeLinear/DeQuantizeLinear nodes to weight.
OpTypesToExcludeOutputQuantizatioin = list of op type : Default is []. If any op type is specified, it won't quantize
the output of ops with this specific op types.
Member:

As TRT has so many, I'm thinking we may need a config file for different execution providers.

stevenlix
stevenlix previously approved these changes Nov 23, 2021
…r internal use

Will have a follow-up PR to fine tune the code
@stevenlix (Contributor) commented:

Talked to @yufenglee offline. Keep the flags OpTypesSupportPerChannelQuantization and QDQChannelAxis for TRT internal use for now. Will have a follow-up PR to fine-tune the related code.

logging.warning(
    "{} doesn't support per channel quantization. Quantize tensor: {} with per-tensor instead.".format(
        node.op_type, tensor_name))
self.quantizer.quantize_tensor(tensor_name)
@yufenglee (Member) commented Nov 30, 2021:

It should not be a warning. It is normal that an operator doesn't support per-channel.

@faxu faxu merged commit 0baf687 into master Nov 30, 2021
@faxu faxu deleted the qdq_improvement branch November 30, 2021 05:45
jingyanwangms pushed a commit that referenced this pull request Nov 30, 2021
* Add finetuned qdq options

* Add description

* Add unit tests

* Modify for channel axis

* Remove too specific feature. Move this implementation to e2e example

* Add OpTypesSupportPerChannelQuantization

* fix bug for unit test

* Keep flags OpTypesSupportPerChannelQuantization and QDQChannelAxis for internal use

Will have a follow-up PR to fine tune the code

* remove unnecessary warning

Co-authored-by: stevenlix <38092805+stevenlix@users.noreply.github.com>
Co-authored-by: Yufeng Li <liyufeng1987@gmail.com>
(cherry picked from commit 0baf687)
jingyanwangms added a commit that referenced this pull request Nov 30, 2021
* Fix memset size (#9840)

(cherry picked from commit d012d9f)

* [js/web] do not use nodejs type 'Buffer' in web (#9839)

* [js/web] do not use nodejs type 'Buffer' in web

* resolve comments and validate tests

* remove 'Buffer' in test

(cherry picked from commit a3ebc5e)

* Fix potential data race with OrtValue usage in Python (#9841)

(cherry picked from commit 18fd2cf)

* [OpenVINO-EP] V3.4 Release with OpenVINO 2021.4.2 LTS Release (#9848)

* Changes to ensure openvino build go through in Windows

* Modified Hetero plugin Logic

*Modified Hetero Feature logic. In Hetero,
if the operator to be marked true in getcapability(),
it should be supported by either of the devices
specified with HETERO in the device_type.

Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>

* OV updated to 2021.4.2 version

* OV updated to 2021.4.2 version

* Updated OV to 2021.4.2 version, mono download  link and dotnet version

* Copying Managed nugets in openvino c# docker file

*Copying Managed nuget to nugets artifacts
directory

Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>

Co-authored-by: saharfraza <sfatima.3001@gmail.com>
Co-authored-by: mayavijx <mayax.vijayan@intel.com>
Co-authored-by: Aravind Gunda <aravindx.gunda@intel.com>
(cherry picked from commit 0ae0f29)

* no fallback when enforcing explicit EP registration. (#9863)

* no fallback when enforcing explicit EP registration.

* add explicit ep registrations for python.

(cherry picked from commit 1e9e57d)

* layernorm throw error if input has no data (#9837)

(cherry picked from commit bf716e6)

* [js/node] npm audit fix (#9861)

(cherry picked from commit 27e337e)

* [python manylinux package] emit warning if missing CUDA/TensorRT dependency causes ld_preload to fail and user tries to register either CUDA/TensorRT EP (#9872)

* add warning if ld_preload fails for CUDA or TRT when trying to register either provider

* refactor

* change wording from register to create

(cherry picked from commit ec9b0ed)

* QDQ tool modification part2 (#9720)

* Add finetuned qdq options

* Add description

* Add unit tests

* Modify for channel axis

* Remove too specific feature. Move this implementation to e2e example

* Add OpTypesSupportPerChannelQuantization

* fix bug for unit test

* Keep flags OpTypesSupportPerChannelQuantization and QDQChannelAxis for internal use

Will have a follow-up PR to fine tune the code

* remove unnecessary warning

Co-authored-by: stevenlix <38092805+stevenlix@users.noreply.github.com>
Co-authored-by: Yufeng Li <liyufeng1987@gmail.com>
(cherry picked from commit 0baf687)

* Cancel transpose optimizer for resize (#9870)

* cancel transpose optimizer for resize

* add UT

* addressing comments

* fix build err

(cherry picked from commit 16bfd3c)

* Add build option to enable cuda profiling (#9875)

(cherry picked from commit 9345894)

Co-authored-by: Dmitri Smirnov <yuslepukhin@users.noreply.github.com>
Co-authored-by: Yulong Wang <7679871+fs-eire@users.noreply.github.com>
Co-authored-by: Hariharan Seshadri <shariharan91@gmail.com>
Co-authored-by: Maajid khan <n.maajidkhan@gmail.com>
Co-authored-by: George Wu <jywu@microsoft.com>
Co-authored-by: Ye Wang <52801275+wangyems@users.noreply.github.com>
Co-authored-by: Chi Lo <54722500+chilo-ms@users.noreply.github.com>
Co-authored-by: RandySheriffH <48490400+RandySheriffH@users.noreply.github.com>