[Relay][Quantize] Integrate data-aware calibration into quantization #4295
Conversation
possible to have a test case?
It's difficult to write unit tests for this. The refactor is covered by nightly tests; I plan to add more nightly tests for the new calibration mode later because there is some more work to do.
@yzhliu comments addressed
LGTM, added a couple of nits that can be addressed optionally.
@@ -54,6 +54,8 @@ def kl_divergence_scale(arr, quantized_dtype='int8', num_bins=8001, num_quantize
    http://on-demand.gputechconf.com/gtc/2017/presentation/s7310-8-bit-inference-with-tensorrt.pdf
    """
    assert isinstance(arr, np.ndarray)
    assert stats is not None, "scipy need to be installed for \
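For context on what `kl_divergence_scale` computes, here is a minimal, self-contained sketch of TensorRT-style KL calibration: histogram the absolute activations, try progressively larger clipping thresholds, and keep the one whose quantized histogram has the lowest KL divergence from the original. This is a simplified illustration, not the exact TVM implementation (which uses scipy and different bin counts).

```python
import numpy as np

def kl_scale(arr, num_bins=2048, num_quantized_bins=255):
    """Pick a clipping threshold for |arr| minimizing the KL divergence
    between the original and the quantized distribution (sketch only)."""
    hist, edges = np.histogram(np.abs(arr), bins=num_bins)
    best_kl, best_threshold = np.inf, edges[-1]
    # try progressively larger clipping points
    for i in range(num_quantized_bins, num_bins + 1, 64):
        p = hist[:i].astype(np.float64).copy()
        p[-1] += hist[i:].sum()            # fold the clipped tail into the last bin
        # merge the i reference bins down to num_quantized_bins quantized bins
        factor = i / num_quantized_bins
        q = np.zeros(i)
        for j in range(num_quantized_bins):
            lo, hi = int(j * factor), int((j + 1) * factor)
            chunk = p[lo:hi]
            nonzero = int((chunk > 0).sum())
            if nonzero:
                # spread each merged bin's mass over its nonzero members
                q[lo:hi] = np.where(chunk > 0, chunk.sum() / nonzero, 0.0)
        p_n = p / p.sum()
        q_n = q / max(q.sum(), 1e-12)
        mask = p_n > 0
        kl = np.sum(p_n[mask] * np.log(p_n[mask] / np.maximum(q_n[mask], 1e-12)))
        if kl < best_kl:
            best_kl, best_threshold = kl, edges[i]
    return best_threshold
```

The returned threshold then determines the quantization scale, e.g. `scale = threshold / 127` for int8.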
needs
@@ -143,9 +141,20 @@ def qconfig(**kwargs):
    nbit_dict: dict of QAnnotateKind -> int
        Number of bit for every kind of annotate field.

    calibrate_mode: str
        The calibration mode. 'global_scale' or 'kl'.
can we spell it out so it's more self-explanatory (e.g. kullback_leibler)?
@vinx13 I realize that this PR might have broken the E2E VTA tutorial flow (that includes quantization for VTA hardware pipeline). I'll need to investigate, but what puzzles me more is why that wasn't caught by the CI, which should be building the sphinx galleries successfully, which should include the e2e VTA tutorial.
The script in question is
…pache#4295) * [Relay][Quantize] Integrate data-aware calibration into quantization * Update _calibrate.py * trigger ci * Address comments * address comments
Fix for the VTA bug introduced is in #4433.
This PR did some refactoring work for the calibration part: the evaluation script for KL has been integrated, and the stats-collection logic has been moved into the internal `collect_stats`. New config options `calibrate_mode` and `weight_scale` have been added. Removed `opt_level=3` for `prerequisite_optimize`, as it caused some accuracy issues when `FoldScaleAxis` is invoked before calibration. Part of this PR is based on #3828.
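To illustrate what a `weight_scale` option like this can choose between, here is a hedged sketch (the option name comes from this PR, but the function below and its exact semantics are illustrative assumptions, not TVM's implementation): `'max'` uses the largest absolute weight directly as the scale, while `'power2'` rounds that maximum up to a power of two so dequantization can be done with a bit shift.

```python
import numpy as np

def weight_scale(w, mode="power2"):
    """Illustrative sketch of two weight-scale policies (not TVM's code).
    'max'    -> scale = max(|w|)
    'power2' -> scale = max(|w|) rounded up to the nearest power of two
    """
    amax = float(np.abs(w).max())
    if mode == "max":
        return amax
    # round up to the nearest power of two for shift-friendly scaling
    return float(2.0 ** np.ceil(np.log2(amax)))
```

The power-of-two variant trades a little precision (the scale can be up to 2x larger than needed) for cheaper integer arithmetic on hardware backends.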
@ZihengJiang @anijain2305 @tmoreau89