[WIP][DRAFT] Is this the right track? Generating Dense Quantized op #4
Conversation
Overall, looks pretty good to me, just a few nitpicky things :)
python/tvm/relay/transform/quantization/quantized_operators/dense.py
```python
    self.ref_count = 0
    self.qparams = []

def get_qparams(self, name_hint: str, dtype: str = "int8") -> QParams:
```
Passing in a name to this function is good, it will help avoid confusion.
Right now, your implementation saves every QParam ever created and adds it to a list. I'm not sure this is the exact behavior we want, since the purpose of the class saving the QParams is to let the C++ rewriter see the QParams that were created in the rewrite of the last pattern, not all the QParams. (We can get all the scales and zero point variables by doing relay.analysis.free_vars, so we don't have to store them all in a list).
However, I'm not 100% sure of the implementation details of that because it will have to interact with the FFI. For now, maybe we should just not save the QParams we've previously created, and we can add the functionality when we need it.
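A minimal sketch of the suggested direction: the manager hands out uniquely named QParams but keeps only a counter, not a history (class and field names here are hypothetical; in the real pass the scales and zero points would be relay variables recoverable via `relay.analysis.free_vars`):

```python
from dataclasses import dataclass


@dataclass
class QParams:
    """Scale and zero-point placeholders for one tensor (hypothetical shape)."""
    scale_name: str
    zero_point_name: str
    dtype: str


class QParamManager:
    """Hands out uniquely named QParams without retaining them in a list.

    Per the review suggestion, previously created QParams are not stored;
    only a reference counter survives to keep generated names unique.
    """

    def __init__(self):
        self.ref_count = 0  # a counter, not a list of every QParam ever made

    def get_qparams(self, name_hint: str, dtype: str = "int8") -> QParams:
        self.ref_count += 1
        prefix = f"{name_hint}_{self.ref_count}"
        return QParams(f"{prefix}_scale", f"{prefix}_zero_point", dtype)
```

With this shape, the C++ rewriter would only ever need the QParams from the most recent pattern rewrite, which the caller still holds.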
```python
) -> Tuple[tvm.relay.Expr, QParams]:
    """TODO"""

# TODO: figure out whether we need this or we can always have the
```
Units is a dense attribute, but it is not always set, so sometimes the caller will be able to pass in the units and sometimes not.
I believe that the units in dense is in_units, so the caller won't have access to out_units unless they look at the shape.
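A hedged sketch of the fallback the comment describes: when the caller cannot pass output units directly, they can be read off the weight tensor's shape (helper name is hypothetical; assumes Relay's dense weight layout of `(out_units, in_units)`):

```python
def infer_out_units(weight_shape):
    """Recover out_units from a dense weight shape of (out_units, in_units).

    This is the "look at the shape" path for callers that only have the
    in_units attribute available on the dense op.
    """
    out_units, _in_units = weight_shape
    return out_units
```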
Could you change the file name to utils.py? I'm not a huge fan of common :) Also, not sure if
One question to investigate is whether the constant folder will get rid of all the extra terms created by the non-zero zero point in the quantized inference version of the graph. If not, we may need to check the actual value of the zero point while generating the quantized relay for inference.
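To make the "extra terms" concrete, here is a sketch of the zero-point expansion for a single dense dot product (function name hypothetical, plain Python rather than relay): the three correction terms are exactly what a constant folder would need to eliminate when both zero points are zero.

```python
def dense_int(q_a, q_b, z_a, z_b):
    """Integer dot product via the zero-point expansion:

    sum((q_a - z_a) * (q_b - z_b))
        = sum(q_a * q_b) - z_b * sum(q_a) - z_a * sum(q_b) + n * z_a * z_b

    When z_a == z_b == 0, the last three terms are identically zero; whether
    the constant folder removes them from the generated relay is the open
    question above.
    """
    n = len(q_a)
    t0 = sum(x * y for x, y in zip(q_a, q_b))
    return t0 - z_b * sum(q_a) - z_a * sum(q_b) + n * z_a * z_b
```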
…write-utils-alt-interface: alt interface branch goes to the quantization dev main branch
Squashed commit history (deduplicated across the two merge commits):

* WIP support per-channel quantization
* Fix issue with per-channel bias_add
* Fix fake quantize tests (#4): fixed fake quantize issues, formatting, cleaned up unused imports, fixed real int8 tests, added Relu
* One more little one (#5): fixed requantize shape bug
* Non-working per-channel Dense
* Fix legalization for non spatial operators (apache#6): fixed axis checks for end2end functionality, fixed axis normalization, fixed lint
* Per channel fq2i (apache#8): fixed bug in requantize dimension expansion, formatting
* Respond to review comments
* Start depth_to_space: wip depth_to_space, dtos ident

Co-authored-by: Matthew <mbrookhart@octoml.ai>
Co-authored-by: Josh Fromm <jwfromm@octoml.ai>
Adds the QParams manager class Lily alluded to here.
Generates the dense layer according to the math (TM), using existing relay nodes. I have not checked the implementation for correctness, but hopefully the idea is clear.
These relay generation functions return the new relay expression and the associated QParams for the output.
The hot take is that simulated-quantize and static-quantize graphs are always going to be very similar; the only difference is the actual internal casting between them. We try to capture this with:
- `internal_accumulation_dtype: str = "float32"`, which represents the type actually being accumulated, and
- `simulated_accumulation_dtype: str = "int32"`, which represents the accumulated type being simulated.
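A minimal sketch of how the two dtype parameters could steer one shared codepath (helper name and branching are hypothetical; only the float32/int32 combinations described above are modeled):

```python
def cast_for_accumulation(x, internal_dtype, simulated_dtype):
    """Share one codepath between simulated and static quantized graphs.

    internal_dtype is what the graph actually computes in;
    simulated_dtype is what that accumulation stands in for.
    In the simulated graph (float32 internal) no real cast happens and int32
    is only recorded; in the static graph the two dtypes agree and a real
    integer cast is performed.
    """
    if internal_dtype == "float32":
        return float(x)  # simulated: keep float, int32 is metadata only
    if internal_dtype == simulated_dtype == "int32":
        return int(x)    # static: real integer accumulation
    raise ValueError("unsupported dtype combination")
```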
Things to do / think about:
- per channel quantization
- what are the actual differences between simulated and static quantized graphs?
- organizing files in general
- a lot of other stuff...