
[WIP][DRAFT] Is this the right track? Generating Dense Quantized op #4

Draft · wants to merge 67 commits into main from quantization-dev-main-add-rewrite-utils

Conversation


@AndrewZhaoLuo AndrewZhaoLuo commented Apr 9, 2021

  • Add the qparam manager class Lily alluded to here

  • Generate the dense layer according to the math (TM), using existing Relay nodes. I have not checked the implementation for correctness, but hopefully the idea is clear.

  • These relay generation functions return the new relay expression and the associated qparams for the output.

  • The hot take is that simulated-quantize and static-quantize graphs are always going to be very similar; the only difference is the actual internal casting between them. We try to capture this with:

    internal_accumulation_dtype: str = "float32", which represents the type actually being accumulated, and

    simulated_accumulation_dtype: str = "int32", which represents the accumulated type being simulated.

Things we still need to do / think about:
- per channel quantization
- what are the actual differences between simulated and static quantized graphs?
- organizing files in general
- a lot of other stuff...
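The simulated/static split described above can be sketched in plain Python (all names here are invented for illustration; the real implementation uses Relay expressions and cast nodes): the same quantized-dense arithmetic is run once with a real integer accumulator and once with a float accumulator carrying the same integral values, mimicking internal_accumulation_dtype versus simulated_accumulation_dtype.

```python
# Hypothetical sketch: the same quantized-dense math accumulated either
# in a real integer type (static graph) or in a float that carries the
# same integral values (simulated graph).

def quantize(x, scale, zero_point):
    """Affine-quantize a float onto the integer grid (kept as Python int)."""
    return round(x / scale) + zero_point

def dense_accumulate(qdata, qweight, zp_data, zp_weight, accumulate):
    """Row-vector x matrix quantized dense: sum_k (qd_k - z_d) * (qw_kj - z_w)."""
    out = []
    for col in range(len(qweight[0])):
        acc = accumulate(0)
        for k in range(len(qdata)):
            acc += accumulate((qdata[k] - zp_data) * (qweight[k][col] - zp_weight))
        out.append(acc)
    return out

data = [0.5, -1.25, 2.0]
weight = [[1.0, -0.5], [0.25, 0.75], [-1.5, 0.5]]
s_d, z_d = 0.05, 3   # example qparams for the data
s_w, z_w = 0.01, 0   # example qparams for the weight

qd = [quantize(x, s_d, z_d) for x in data]
qw = [[quantize(x, s_w, z_w) for x in row] for row in weight]

# static graph: accumulation happens in a genuine integer type
static_out = dense_accumulate(qd, qw, z_d, z_w, int)
# simulated graph: float32-style accumulation of the same integral values
simulated_out = dense_accumulate(qd, qw, z_d, z_w, float)

assert static_out == [int(v) for v in simulated_out]
```

The two paths agree exactly as long as the float accumulator never exceeds the range in which integers are exactly representable, which is the property the simulated graph relies on.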

@electriclilies (Owner) left a comment:

Overall, looks pretty good to me, just a few nitpicky things :)

self.ref_count = 0
self.qparams = []

def get_qparams(self, name_hint: str, dtype: str = "int8") -> QParams:
@electriclilies (Owner):

Passing in a name to this function is good; it will help avoid confusion.

Right now, your implementation saves every QParam ever created and adds it to a list. I'm not sure this is the exact behavior we want, since the purpose of the class saving the QParams is to let the C++ rewriter see the QParams that were created in the rewrite of the last pattern, not all the QParams. (We can get all the scales and zero point variables by doing relay.analysis.free_vars, so we don't have to store them all in a list).

However, I'm not 100% sure of the implementation details of that because it will have to interact with the FFI. For now, maybe we should just not save the QParams we've previously created, and we can add the functionality when we need it.
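The direction this comment suggests could look roughly like the following sketch (class and attribute names are invented here; in real code the scale/zero-point strings would be relay.Var nodes): the manager keeps only a ref count for unique naming and does not retain previously created QParams.

```python
# Hypothetical sketch of a QParams manager per the suggestion above:
# hand out fresh scale/zero-point variables using a ref count, without
# storing every QParams ever created in a list.
from typing import NamedTuple

class QParams(NamedTuple):
    scale_name: str        # stands in for a relay.Var in real code
    zero_point_name: str   # stands in for a relay.Var in real code
    dtype: str

class QParamManager:
    def __init__(self):
        self.ref_count = 0   # only the counter survives across calls

    def get_qparams(self, name_hint: str, dtype: str = "int8") -> QParams:
        self.ref_count += 1
        prefix = f"{name_hint}_{self.ref_count}"
        # Previously created QParams are not retained; a free_vars-style
        # analysis can recover them from the rewritten expression later.
        return QParams(f"{prefix}_scale", f"{prefix}_zero_point", dtype)

manager = QParamManager()
qp = manager.get_qparams("dense_weight")
assert qp.scale_name == "dense_weight_1_scale"
```

Storing per-rewrite QParams (for the C++ rewriter to inspect) could then be added later as a small, clearable buffer rather than an ever-growing list.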

) -> Tuple[tvm.relay.Expr, QParams]:
"""TODO"""

# TODO: figure out whether we need this or we can always have the
@electriclilies (Owner):
units is a dense attribute, but it is not always set, so the caller will sometimes be able to pass in the units and sometimes not.

@electriclilies (Owner):
I believe that the units in dense is in_units, so the caller won't have access to out_units unless they look at the shape.
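If the caller does need to look at the shape, recovering both unit counts is straightforward, assuming TVM's nn.dense weight layout of (out_units, in_units) (the helper name here is invented for illustration):

```python
# Hedged sketch: recover out_units / in_units from the weight tensor's
# shape, assuming nn.dense weights are laid out as (out_units, in_units).
def infer_dense_units(weight_shape):
    out_units, in_units = weight_shape
    return out_units, in_units

out_units, in_units = infer_dense_units((128, 64))
assert (out_units, in_units) == (128, 64)
```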

@electriclilies

Could you change the file name to utils.py? I'm not a huge fan of common :) Also, I'm not sure tvm.relay.transform.quantization is the best namespace; I personally prefer tvm.relay.transform.quantize, but it's worth a discussion.

@electriclilies commented Apr 9, 2021

One question to investigate is whether the constant folder will get rid of all the extra terms created by the non-zero zero point in the quantized inference version of the graph. If not, we may need to check the actual value of the zero point while generating the quantized relay for inference.
Not something we need to worry about right now, but it should be on our radar.
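The "extra terms" come from expanding the quantized product. For one output element, sum_k (qd_k - z_d) * (qw_kj - z_w) expands into four terms, and the sketch below (pure Python, names invented) marks which are constant-foldable when the zero points are compile-time constants:

```python
# Term expansion behind the comment above. With constant zero points,
# weight_term and const_term are fully foldable (weights are constants),
# while data_term depends on runtime data and only disappears when the
# weight zero point z_w is actually zero.
def expanded_dense_element(qd, qw_col, z_d, z_w):
    n = len(qd)
    main = sum(d * w for d, w in zip(qd, qw_col))   # qd . qw (runtime)
    weight_term = z_d * sum(qw_col)                 # foldable: constant weights
    data_term = z_w * sum(qd)                       # runtime; zero iff z_w == 0
    const_term = n * z_d * z_w                      # fully constant
    return main - weight_term - data_term + const_term

def direct_dense_element(qd, qw_col, z_d, z_w):
    return sum((d - z_d) * (w - z_w) for d, w in zip(qd, qw_col))

qd, qw_col = [13, -22, 43], [100, 25, -150]
assert expanded_dense_element(qd, qw_col, 3, 0) == direct_dense_element(qd, qw_col, 3, 0)
```

This is why checking the actual zero-point value could matter: when z_w is zero, the runtime data_term vanishes entirely, independent of what the constant folder does.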

@AndrewZhaoLuo AndrewZhaoLuo force-pushed the quantization-dev-main-add-rewrite-utils branch from 96e14d9 to 24ebe4d Compare April 30, 2021 19:55
@AndrewZhaoLuo AndrewZhaoLuo force-pushed the quantization-dev-main-add-rewrite-utils branch from 24ebe4d to 467de8b Compare May 5, 2021 21:30
electriclilies pushed a commit that referenced this pull request Sep 11, 2021
* WIP support per-channel quantization

* more WIP

* More WIP

* fix issue with per-channel bias_add

* Fix fake quantize tests (#4)

* Fixed fake quantize issues.

* Formatting.

* Cleanup unused imports

* Fix real int8 tests.

* Add Relu

* One more little one (#5)

* Fixed fake quantize issues.

* Formatting.

* Cleanup unused imports

* Fix real int8 tests.

* Fix requantize shape bug.

* Non-working Per-channel Dense

* Fix legalization for non spatial operators. (apache#6)

* Fix legalization for non spatial operators.

* Fix axis checks for end2end functionality.

* fix axis normalization

fix lint

fix lint again

* Per channel fq2i (apache#8)

* WIP support per-channel quantization

* more WIP

* More WIP

* fix issue with per-channel bias_add

* Fix fake quantize tests (#4)

* Fixed fake quantize issues.

* Formatting.

* Cleanup unused imports

* Fix real int8 tests.

* Add Relu

* One more little one (#5)

* Fixed fake quantize issues.

* Formatting.

* Cleanup unused imports

* Fix real int8 tests.

* Fix requantize shape bug.

* Non-working Per-channel Dense

* Fix legalization for non spatial operators. (apache#6)

* Fix legalization for non spatial operators.

* Fix axis checks for end2end functionality.

* fix axis normalization

fix lint

fix lint again

* Fix bug in requantize dimension expansion.

* Format.

Co-authored-by: Josh Fromm <jwfromm@octoml.ai>

* respond to review comments

respond to review comments

Co-authored-by: Josh Fromm <jwfromm@octoml.ai>
electriclilies pushed a commit that referenced this pull request Sep 11, 2021
(Commit message identical to the one above, plus:)

* start dtos

* wip depth_to_space

* dtos ident

Co-authored-by: Matthew <mbrookhart@octoml.ai>
Co-authored-by: Josh Fromm <jwfromm@octoml.ai>