[Torch] Experimental support for FX-quantized models #10091
Conversation
Looks good to me. Thank you @masahi
I have two minor suggestions for optional improvements/follow-ups (should not keep you from merging though).
python/tvm/relay/frontend/pytorch.py
amin = get_v(inputs[1], np.finfo(np.float32).min)
amax = get_v(inputs[2], np.finfo(np.float32).max)
if min_only:
    amin = get_v(inputs[1], np.finfo(np.float32).min)
It is not introduced in this PR, but is using float32's max prudent here? We might have all sorts of dtypes as inputs; also, `-inf` should probably not be clamped to `-3.4028235e+38`.
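For illustration only (not part of the PR or the reviewed code), a small PyTorch snippet showing the concern: emulating a one-sided clamp by substituting a float32 sentinel for the missing bound silently rewrites `-inf` for wider dtypes, while PyTorch's native one-sided clamp leaves it alone.

```python
import numpy as np
import torch

x = torch.tensor([float("-inf"), -1.0, 2.0], dtype=torch.float64)

# Native max-only clamp keeps -inf as-is:  [-inf, -1., 1.]
print(torch.clamp(x, max=1.0))

# Substituting float32's min for the missing lower bound rewrites -inf
# to roughly -3.4028235e+38:  [-3.4028e+38, -1., 1.]
print(torch.clamp(x, min=float(np.finfo(np.float32).min), max=1.0))
```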
Good catch, fixed, but not sure what to do with clamping `inf`.
I think `-inf` should stay `-inf` if we clamp with only max. PyTorch uses separate implementation kernels that only do the required ops for this (kind of the reverse of what we do here); I don't know whether that would be a good choice for TVM.
I see, that makes sense. Probably the best way for us is to allow a `None` value in the `clip` op for one-way clipping, and do only `max` or `min` inside topi. I left a TODO item for that. Probably other op conversions also have issues with `inf` handling...
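A minimal sketch of what such one-way clipping could look like on the frontend side, assuming the `_op` alias used throughout `python/tvm/relay/frontend/pytorch.py`; the helper name and the hardcoded float32 dtype are illustrative assumptions, not what this PR merges (the PR leaves a TODO instead).

```python
from tvm import relay
from tvm.relay import op as _op


def clamp_one_sided(data, amin=None, amax=None):
    """Clamp `data`, applying only the bounds that are actually given."""
    if amin is not None and amax is not None:
        return _op.clip(data, a_min=amin, a_max=amax)
    if amin is not None:
        # min-only clamp: maximum() leaves +inf untouched.
        return _op.maximum(data, relay.const(amin, dtype="float32"))
    if amax is not None:
        # max-only clamp: minimum() leaves -inf untouched.
        return _op.minimum(data, relay.const(amax, dtype="float32"))
    return data
```

In practice the constant dtype would have to follow the input dtype rather than be hardcoded to float32.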
Can somebody merge this PR? I'm not aware of anyone who is familiar with PT quantization, but I believe the change should be a no-brainer. This is a nice feature to land, which was also requested in https://discuss.tvm.apache.org/t/pytorch-qnn-cannot-import-torchscript-produced-by-fx-graph-mode-quantization/11954
* works on resnet18 and deeplabv3
* yolo5 conversion worked
* fixed sigmoid
* [Torch] Support clamp_min, clamp_max
* fixed clamp_min
* fixed quantize for 1 dim input
* cleanup
* improve inline_qparam impl
* add clamp_min/max test
* add fx quant test
* cleanup
* skip build in testing
* black
* improve clamp conversion
* leave TODO on inf handling
This is the first step toward supporting models quantized with the FX-based workflow as described in https://pytorch.org/tutorials/prototype/fx_graph_mode_ptq_static.html.
The required change was surprisingly simple: a bit of graph surgery done by `inline_input_quant_params_for_fx(...)` in `qnn_torch.py` is enough. So far, I was able to quantize imagenet models, deeplab v3, ssd-vgg, and yolov5, either fully or semi-automatically. See the attached test cases. Since my current interest is to collect real-world quantized workloads for performance benchmarking, I didn't care about calibration.
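To give an idea of the end-to-end flow this enables, here is a rough sketch, assuming the `torch.quantization.quantize_fx` API of PyTorch ~1.10 (later releases moved it to `torch.ao.quantization` and require `example_inputs`); the model, input name, and shapes are arbitrary choices for illustration.

```python
import torch
import torchvision
from torch.quantization import get_default_qconfig
from torch.quantization.quantize_fx import prepare_fx, convert_fx

from tvm import relay

model = torchvision.models.resnet18(pretrained=True).eval()
qconfig_dict = {"": get_default_qconfig("fbgemm")}

prepared = prepare_fx(model, qconfig_dict)  # insert observers
# ... feed calibration data through `prepared` here (skipped here, as above) ...
quantized = convert_fx(prepared)            # produce the quantized GraphModule

# TorchScript the result and import it into Relay as usual.
inp = torch.rand(1, 3, 224, 224)
script_module = torch.jit.trace(quantized, inp).eval()
mod, params = relay.frontend.from_pytorch(script_module, [("input", (1, 3, 224, 224))])
```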
Also added `aten::clamp_min` support for SSD-VGG.

@comaniac @lhutton1 @junrushao1994 @siju-samuel @t-vi @AndrewZhaoLuo