Skip to content
This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

[FEATURE] Restore Quantization API to MXNet #19587

Merged
merged 40 commits into from
Dec 12, 2020

Conversation

bgawrych
Copy link
Contributor

Description

This PR restores quantization API and some examples to master branch of MXNet. Change prepared together with @sfraczek and @grygielski
Quantization API now utilizes HybridBlock as symbol executor and new features like optimize_for
Moreover:

  • enabled numpy semantics support
  • improved custom calibration flow (user can now use layer names which need calibration in their layer output collectors)
  • renamed quantize_net_v2 to quantize_net
  • using DataLoader instead of DataIter in model calibration
  • retained quantize_model and quantize_model_mkldnn which works with symbol

Checklist

Essentials

  • PR's title starts with a category (e.g. [BUGFIX], [MODEL], [TUTORIAL], [FEATURE], [DOC], etc)
  • Changes are complete (i.e. I finished coding on this PR)
  • All changes have test coverage
  • Code is well-documented

@anko-intel @sfraczek @grygielski @TaoLv @pengzhao-intel @ciyongch

@mxnet-bot
Copy link

Hey @bgawrych , Thanks for submitting the PR
All tests are already queued to run once. If tests fail, you can trigger one or more tests again with the following commands:

  • To trigger all jobs: @mxnet-bot run ci [all]
  • To trigger specific jobs: @mxnet-bot run ci [job1, job2]

CI supported jobs: [sanity, website, unix-cpu, clang, unix-gpu, centos-gpu, windows-cpu, miscellaneous, centos-cpu, edge, windows-gpu]


Note:
Only following 3 categories can trigger CI :PR Author, MXNet Committer, Jenkins Admin.
All CI tests must pass before the PR can be merged.

@lanking520 lanking520 added pr-awaiting-testing PR is reviewed and waiting CI build and test pr-work-in-progress PR is still work in progress and removed pr-awaiting-testing PR is reviewed and waiting CI build and test labels Nov 25, 2020
bgawrych and others added 24 commits November 27, 2020 10:09
This reverts commit a8b737a473ca6529a1969b748ea03c40e12c0798.
Needs refactor of conv and fc common part
'if calib_data is not None' and 'if not data_shapes'
merge with bgawrych

cannot do inplace convolution and the sum and input tesnsors are shared already

remove cout

spaces after if

refactor if else
Conflicts:
	src/operator/subgraph/mkldnn/mkldnn_conv_property.h
review fixes
remove unused parameters
change rgb

small fix

add alexnet exclude

fix filename suffix

refactor first conv exclude v1

v2

v3

fix names of layers

fix bug
@lanking520 lanking520 added pr-awaiting-testing PR is reviewed and waiting CI build and test pr-work-in-progress PR is still work in progress and removed pr-awaiting-review PR is waiting for code review pr-awaiting-testing PR is reviewed and waiting CI build and test labels Dec 10, 2020
@bgawrych
Copy link
Contributor Author

@mxnet-bot run ci [unix-gpu]

@mxnet-bot
Copy link

Jenkins CI successfully triggered : [unix-gpu]

@lanking520 lanking520 added pr-awaiting-testing PR is reviewed and waiting CI build and test and removed pr-work-in-progress PR is still work in progress labels Dec 10, 2020
@bgawrych
Copy link
Contributor Author

Do we need to start a new RFC or design doc as demonstrated originally in [1] and [2]?

[1] https://cwiki.apache.org/confluence/display/MXNET/MXNet+Graph+Optimization+and+Quantization+based+on+subgraph+and+MKL-DNN
[2] #9552

We haven't changed design of quantization - all code changes in backend are related to changed node naming conventions. I don't know if in this case we must propose new RFC as previous one was accepted and all what we have done here is bringing it back

@pengzhao-intel
Copy link
Contributor

We don't need a new RFC since this is the same approach (quantization flow) migration from 1.x to 2.x.

Copy link
Contributor

@pengzhao-intel pengzhao-intel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM and I will merge soon if no other comments.

@lanking520 lanking520 added pr-work-in-progress PR is still work in progress and removed pr-awaiting-testing PR is reviewed and waiting CI build and test labels Dec 10, 2020
@bgawrych
Copy link
Contributor Author

@mxnet-bot run ci [unix-gpu]

@mxnet-bot
Copy link

Jenkins CI successfully triggered : [unix-gpu]

@lanking520 lanking520 added pr-awaiting-testing PR is reviewed and waiting CI build and test pr-work-in-progress PR is still work in progress and removed pr-work-in-progress PR is still work in progress pr-awaiting-testing PR is reviewed and waiting CI build and test labels Dec 10, 2020
@bgawrych
Copy link
Contributor Author

@szha is something wrong with unix-gpu CI? I got following error two times in a row:

  • [2020-12-10T12:17:34.769Z] Cannot contact mxnetlinux-cpu_cwcmun1huz: java.lang.InterruptedException
    ...
    [2020-12-10T12:28:54.084Z] FAILED: CMakeFiles/mxnet.dir/src/operator/numpy/np_kron.cc.o
    ...
    [2020-12-10T12:28:54.084Z] c++: fatal error: Killed signal terminated program cc1plus

but in run before in GPU: MKLDNN job it was CMakeFiles/mxnet.dir/src/operator/numpy/np_elemwise_broadcast_logic_op.cc.o

I don't think it's releated to my change

@bgawrych
Copy link
Contributor Author

@mxnet-bot run ci [unix-gpu]

@mxnet-bot
Copy link

Jenkins CI successfully triggered : [unix-gpu]

@lanking520 lanking520 added pr-awaiting-testing PR is reviewed and waiting CI build and test pr-awaiting-merge Review and CI is complete. Ready to Merge and removed pr-work-in-progress PR is still work in progress pr-awaiting-testing PR is reviewed and waiting CI build and test labels Dec 10, 2020
@leezu
Copy link
Contributor

leezu commented Dec 10, 2020

@bgawrych you can refer to #19623 for the CI issue. It looks like an effect of #18501 together with memory usage increase in recent versions of gcc. If you have time to help fix it, that would be great.

@pengzhao-intel
Copy link
Contributor

It's great the CI passed.

Thanks the great efforts from the team to re-enable quantization flow in the MXNet 2.0: )

I am going to merge this PR. @anko-intel @bgawrych

Copy link
Contributor

@pengzhao-intel pengzhao-intel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Merging now.

@pengzhao-intel pengzhao-intel merged commit 3746bab into apache:master Dec 12, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
pr-awaiting-merge Review and CI is complete. Ready to Merge
Projects
None yet
Development

Successfully merging this pull request may close these issues.

10 participants