[Relay][Quantization] Per-Channel FQ2I #8883
Conversation
Force-pushed from 37abf7e to 5d73612
@@ -478,7 +478,7 @@ def _impl_v1(cls, inputs, attr, params):
attr["dilations"] = [1] + list(attr["dilations"])
if "pads" in attr:
    attr["pads"] = [0, attr["pads"][0], 0, attr["pads"][1]]

attr["channels"] = kernel_shapes[0][0]
Does this change relate to the subject of the PR, or is it a side fix? I don't see where the "channels" attribute is used in the other changes of this PR.
This is a side fix; I used it for an intermediate solution for conv, but I guess I don't technically need it. It doesn't seem to be required by many of the relay passes.
Just to double-check the assumptions for this PR: we allow per-channel scales on the weight inputs only, right? Other stuff like the binary ops is only a helper for this main purpose?
Yeah, the way the math works out, we can only really do per-channel quantization on weights and still end up with an integer conv/dense op; if we have per-channel quantization on the data going into one of these ops, the lowering breaks. All the unary/binary stuff is to help with fusing ops that show up in the graph after the contraction but before the requantize, i.e., things we want to fuse into the contraction.
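(A minimal numpy sketch of the point above, not code from the PR; zero points are omitted for brevity, and all shapes, names, and values are made up for illustration.)

import numpy as np

s_x = 0.05                       # per-tensor scale for the data
s_w = np.array([0.1, 0.2, 0.3])  # per-output-channel scales for the weights
q_x = np.random.randint(-128, 128, size=(4,)).astype(np.int8)    # quantized data, shape (K,)
q_w = np.random.randint(-128, 128, size=(3, 4)).astype(np.int8)  # quantized weights, shape (J, K)

# The accumulation stays in integers because s_w is indexed by the output
# channel j, not the reduction axis k, so it factors out of the sum over k.
acc = q_w.astype(np.int32) @ q_x.astype(np.int32)  # int32 accumulator, shape (J,)
y = s_x * s_w * acc                                # scale (and requantize) afterwards

# Reference float computation: dequantize first, then contract.
y_ref = (s_w[:, None] * q_w) @ (s_x * q_x)
assert np.allclose(y, y_ref)

# If the *data* had a per-channel scale s_x[k], it could not be pulled out of
# the sum over k, so the contraction could no longer be done purely in integers.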
Force-pushed from 5871a8d to c31512d
@@ -18,13 +18,22 @@
import tvm
from tvm import relay
from tvm.ir import TensorAffineType, TupleAffineType
from tvm.tir import bijective_layout
from ..op import register_fake_quantization_to_integer


def fold_constant(expr):
Hmm, can you use the functions in python/tvm/relay/frontend/common.py, or modify them slightly there to support your need?
Unfortunately, since this is in transforms and the frontends call the transforms namespace extensively, importing that will cause an import loop.
from ..op import register_fake_quantization_to_integer


def fold_constant(expr):
    return relay.transform.FoldConstantExpr(expr, tvm.IRModule())


def get_zeros(scale):
    return fold_constant(relay.op.cast(relay.op.zeros_like(scale), "int32"))
Should this be a cast_like or do we always want things to be int32?
zero_points in qnn are always int32
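(A small illustrative use of the helpers quoted above, not a test from the PR: for a per-channel scale of length three, the cast-and-fold produces a matching int32 zero-point constant.)

import tvm
from tvm import relay

scale = relay.const([0.1, 0.2, 0.3], "float32")
zeros = relay.op.cast(relay.op.zeros_like(scale), "int32")
folded = relay.transform.FoldConstantExpr(zeros, tvm.IRModule())
print(folded)  # a relay Constant of shape (3,) with dtype int32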
Force-pushed from b43ae30 to 607ba7c
We need to change the descriptions of qnn.conv2d and qnn.dense to describe which zero points and scales are expected. python/tvm/relay/qnn/op/qnn.py is not included in this PR, so I'm adding a separate note.
For example, for conv2d we currently have:
input_zero_point : tvm.relay.Expr
    The zero point of the data distribution.
kernel_zero_point : tvm.relay.Expr
    The zero point of the quantized_kernel distribution.
input_scale : tvm.relay.Expr
    The scale for the input tensor. The scale for the input tensor is
    stored purely for convenience here. See more commentary below.
kernel_scale : tvm.relay.Expr
    The scale for the weight tensor. The scale for the weight tensor is
    stored for access to this during relay. This information is not
    needed in the pass pipeline after qnn.conv2d is lowered to the
    sequence of steps as in nn.conv2d. See also input_scale in Requantize.
while we allow only a scalar for the input zero point and input scale, and any(?) zero point and scale for the weights.
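(One possible wording for such a docstring update, based on the constraints described in this thread; this is a suggestion, not the text that was eventually merged.)

input_zero_point : tvm.relay.Expr
    The zero point of the data distribution. Expected to be a scalar.
input_scale : tvm.relay.Expr
    The scale of the input tensor. Expected to be a scalar.
kernel_zero_point : tvm.relay.Expr
    The zero point of the quantized kernel distribution. May be a scalar or a
    1-D tensor with one value per output channel.
kernel_scale : tvm.relay.Expr
    The scale of the weight tensor. May be a scalar or a 1-D tensor with one
    value per output channel (per-channel quantization).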
arg = expr.args[0]
t = type_map[arg]
scale_shape = infer_shape(t.scale)
z_p = t.zero_point
We might have four situations here:
- scale is scalar, zp is scalar
- scale is scalar, zp is not scalar
- scale is not scalar, zp is scalar
- scale is not scalar, zp is not scalar
Cases 3 and 4 are covered by the next if: we broadcast zp to the scale shape along the axis from the AffineType structure. Q: will zp be updated in place in the TensorAffineType map after the broadcast?
Case 1 is ok.
Case 2 - Q: don't we need to handle this explicitly and broadcast the scale?
On the first Q: no, the broadcast z_p will not be updated in place; it's only used in the computation.
On the second Q, that's an interesting point. I guess I haven't ever seen it, but it's feasible. QNN currently supports scalar scale and scalar zp, OR vector scale and scalar zp, OR vector scale and vector zp, which matches all of the combinations I've ever seen in the wild. What do you think, should we try to support that in QNN?
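(A hedged numpy sketch of case 3 above, a per-channel scale with a scalar zero point; the helper name, shapes, and values are illustrative, not the PR's exact code.)

import numpy as np

def broadcast_zp(zero_point, scale, axis, data_ndim):
    # Expand a scalar zero point so it lines up with a per-channel scale,
    # then reshape it so it broadcasts against the data along `axis`.
    zp = np.broadcast_to(zero_point, np.shape(scale))
    shape = [1] * data_ndim
    shape[axis] = -1
    return zp.reshape(shape)

zp = broadcast_zp(0, np.array([0.1, 0.2, 0.3]), axis=1, data_ndim=4)
print(zp.shape)  # (1, 3, 1, 1), ready to broadcast against an NCHW tensor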
* Fixed fake quantize issues. * Formatting. * Cleanup unused imports * Fix real int8 tests.
* Fixed fake quantize issues. * Formatting. * Cleanup unused imports * Fix real int8 tests. * Fix requantize shape bug.
* Fix legalization for non spatial operators. * Fix axis checks for end2end functionality.
fix lint fix lint again
* WIP support per-channel quantization * more WIP * More WIP * fix issue with per-channel bias_add * Fix fake quantize tests (#4) * Fixed fake quantize issues. * Formatting. * Cleanup unused imports * Fix real int8 tests. * Add Relu * One more little one (#5) * Fixed fake quantize issues. * Formatting. * Cleanup unused imports * Fix real int8 tests. * Fix requantize shape bug. * Non-working Per-channel Dense * Fix legalization for non spatial operators. (#6) * Fix legalization for non spatial operators. * Fix axis checks for end2end functionality. * fix axis normalization fix lint fix lint again * Fix bug in requantize dimension expansion. * Format. Co-authored-by: Josh Fromm <jwfromm@octoml.ai>
Force-pushed from 607ba7c to b2b3d1f
respond to review comments
Force-pushed from b2b3d1f to e5d80e5
* WIP support per-channel quantization * more WIP * More WIP * fix issue with per-channel bias_add * Fix fake quantize tests (apache#4) * Fixed fake quantize issues. * Formatting. * Cleanup unused imports * Fix real int8 tests. * Add Relu * One more little one (apache#5) * Fixed fake quantize issues. * Formatting. * Cleanup unused imports * Fix real int8 tests. * Fix requantize shape bug. * Non-working Per-channel Dense * Fix legalization for non spatial operators. (apache#6) * Fix legalization for non spatial operators. * Fix axis checks for end2end functionality. * fix axis normalization fix lint fix lint again * Per channel fq2i (apache#8) * WIP support per-channel quantization * more WIP * More WIP * fix issue with per-channel bias_add * Fix fake quantize tests (apache#4) * Fixed fake quantize issues. * Formatting. * Cleanup unused imports * Fix real int8 tests. * Add Relu * One more little one (apache#5) * Fixed fake quantize issues. * Formatting. * Cleanup unused imports * Fix real int8 tests. * Fix requantize shape bug. * Non-working Per-channel Dense * Fix legalization for non spatial operators. (apache#6) * Fix legalization for non spatial operators. * Fix axis checks for end2end functionality. * fix axis normalization fix lint fix lint again * Fix bug in requantize dimension expansion. * Format. Co-authored-by: Josh Fromm <jwfromm@octoml.ai> * respond to review comments respond to review comments Co-authored-by: Josh Fromm <jwfromm@octoml.ai>
This extends the FQ2I pass to support per-channel quantization, which is needed for a number of TF models I've run into recently.
Many thanks to @jwfromm for help finding and fixing issues in the QNN ops with per-channel quantization.
cc @masahi @anijain2305 @elvin-n
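For reference, a minimal sketch of invoking the pass this PR extends; the toy relu graph and per-tensor scales below are illustrative, not a test from the PR.

import tvm
from tvm import relay

x = relay.var("x", shape=(1, 3, 8, 8), dtype="int8")
scale = relay.const(0.1, "float32")
zp = relay.const(0, "int32")

# A fake-quantized region: dequantize -> float op -> quantize.
deq = relay.qnn.op.dequantize(x, scale, zp)
act = relay.nn.relu(deq)
out = relay.qnn.op.quantize(act, scale, zp, out_dtype="int8")

mod = tvm.IRModule.from_expr(relay.Function([x], out))
mod = relay.transform.InferType()(mod)
mod = relay.transform.FakeQuantizationToInteger()(mod)
print(mod)  # the fake-quantized region is rewritten to operate on integers directly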