[QNN] Add operator #3736
Conversation
This needs a rebase and squash and some description to go with the pull request.
Is this the right place to finesse the (lhs_scale == rhs_scale and lhs_z_p == rhs_z_p) case, or is this caught elsewhere? I suspect one could end up with one less requantize step, as this is just:
{code}
output = relay.add(lhs, rhs)
return requantize(output, ...)
{code}
python/tvm/relay/qnn/op/qnn.py (outdated)
{code}
 def requantize(data,
                input_scale,
                input_zero_point,
                output_scale,
                output_zero_point,
                rounding="TONEAREST",
-               out_dtype="int8"):
+               out_dtype='int8'):
{code}
Is this really an unrelated change?
Yes, this was an unintended change. I will be consistent in using single or double quotes.
Aah, very nice observation. This should be caught here inside the …
@u99127 While working on your comment, I realized that I am not certain the lowering is correct. Give me a couple of days to dig deeper into the TFLite codebase to see what they do. I am not sure if I am handling zero points correctly.
No worries, I had another review comment that I missed publishing, around testing more cases than just with zero zero_points, which is what the tests seem to be doing.
Several concerns, though not necessarily blocking the merge.
python/tvm/relay/qnn/op/qnn.py (outdated)
{code}
if lhs_scale == rhs_scale and lhs_zero_point == rhs_zero_point:
    out = relay.add(lhs, rhs)
    out = relay.subtract(out, relay.const(lhs_zero_point, dtype=in_dtype))
{code}
similar to add
Subtracting one zero point and letting requantize handle the other one is a bit tricky? I know it was meant to avoid the requantize sometimes, though...
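For context, a sketch of the arithmetic behind this trick (my own illustration, assuming standard affine quantization; not code from the PR):
{code}
# Affine quantization: real = scale * (q - zero_point).
# When lhs and rhs share the same (s, z):
#   real_lhs + real_rhs = s * (q_lhs - z) + s * (q_rhs - z)
#                       = s * ((q_lhs + q_rhs - z) - z)
# So (q_lhs + q_rhs - z) is the sum, still quantized with params (s, z),
# and a single requantize to the output params absorbs the remaining z.
{code}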
python/tvm/relay/qnn/op/qnn.py (outdated)
{code}
                         out_dtype=in_dtype)

out = relay.add(requantized_lhs, requantized_rhs)
out = relay.subtract(out, relay.const(output_zero_point, dtype=in_dtype))
{code}
Output casting concern.
@jackwish Addressed your comments by going to int32 for the addition and then casting back when necessary. Also added test cases.
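A minimal sketch of what that int32 lowering looks like (my reconstruction from this thread, not the exact PR code; requantize stands for the QNN requantize op quoted earlier, and the clip bounds assume a uint8 output):
{code}
# Requantize both inputs to the output qnn params, in int32.
requantized_lhs = requantize(lhs, lhs_scale, lhs_zero_point,
                             output_scale, output_zero_point,
                             out_dtype='int32')
requantized_rhs = requantize(rhs, rhs_scale, rhs_zero_point,
                             output_scale, output_zero_point,
                             out_dtype='int32')
out = relay.add(requantized_lhs, requantized_rhs)
# Each requantized input carries output_zero_point, so subtract one copy.
out = relay.subtract(out, relay.const(output_zero_point, dtype='int32'))
# Clamp to the output range before casting back (0..255 assuming uint8).
out = relay.clip(out, a_min=0, a_max=255)
out = relay.cast(out, dtype='uint8')
{code}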
LGTM.
As I am not a community reviewer, you may need someone else to approve; my comments are only comments. :)
python/tvm/relay/qnn/op/qnn.py (outdated)
{code}
# output qnn params. The add op is done in int32 precision.

if lhs_scale == rhs_scale and lhs_zero_point == rhs_zero_point:
    lhs = relay.cast(lhs, dtype='int32')
{code}
add/sub in int16 should be enough, but not a big deal :)
Tried that :) Currently it fails, because requantize input can only be (uint8, int8, int32). I think for now it should be ok. If we see more demand for int16, we can add support across all the QNN ops.
Yes, that is far from a blocking issue :) Thank you.
{code}
y_datas = [np.array((204, 178, 165, 140)).reshape((1, 4)),
           np.array((204, 178, 191, 25)).reshape((1, 4)),
           np.array((204, 178, 25, 191)).reshape((1, 4))]
golden_outputs = [np.array((217, 204, 203, 191)).reshape((1, 4)),
{code}
A bit curious: are these data coming from TFLite's computed results, or manually computed?
TFLite adds quantize and dequantize. I am trapping the numbers after they have been quantized and before they get dequantized. So these are from TFLite, not from the actual GTest that you see, but from its internals.
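As an illustration, golden values like these can be cross-checked against a float reference (a sketch with made-up scales, zero points, and lhs data; the actual test parameters are not shown in this thread):
{code}
import numpy as np

# Hypothetical qnn params, purely for illustration.
s_lhs, z_lhs = 0.00784, 127
s_rhs, z_rhs = 0.00784, 127
s_out, z_out = 0.00784, 127

x = np.array((10, 50, 100, 200), dtype=np.int32)    # hypothetical lhs data
y = np.array((204, 178, 165, 140), dtype=np.int32)  # one of the y_datas rows

# Dequantize, add in float, then quantize back with the output params.
real = s_lhs * (x - z_lhs) + s_rhs * (y - z_rhs)
golden = np.clip(np.round(real / s_out) + z_out, 0, 255).astype(np.uint8)
{code}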
Yes, thank you for the explanation.
Force-pushed from 64311a1 to c4d3199.
It seems that the normalisation of the quantisation parameters would be the same for a number of different operators. If that is the case, does it make sense to factor this out and maybe put this in a pass? That would avoid having to implement this for each operator.
@Leo-arm this is a very good point. For background, there are 2 parallel efforts in the TVM community right now: automatic quantization within Relay, and support for pre-quantized models through the QNN ops. We can share the HW schedules between these two options. What you suggest is almost what happens today in the automatic quantization project. It is also somewhat easier there, because automatic quantization only works with symmetric quantization. Doing this for pre-quantized models is somewhat tricky, because it happens on an op-by-op basis.
Assume below that ip0, ip1 and op are all 8-bit tensors with identical zero points, and similarly for all the other cases. {code} now gets lowered into: {code} Am I right in assuming that the tflite parser directly lowers to this level? Is there any reason why the alternate option of having 8-bit tensor operations in Relay has been ignored?
Do not submit yet. Will move the codebase to C++ to avoid calling InferType.
Force-pushed from d85a4aa to b6a679d.
Moved to C++. Removed the WIP tag.
Sorry for the delayed re-checking; I have been working on some other directions. It seems a rebase is needed, as #3819 has been merged :)
include/tvm/relay/qnn/attrs.h (outdated)
{code}
@@ -97,6 +97,64 @@ struct DequantizeAttrs : public tvm::AttrsNode<DequantizeAttrs> {
  }
};

/*! \brief Attributes used in QNN concatenate operators */
struct QnnConcatenateAttrs : public tvm::AttrsNode<QnnConcatenateAttrs> {
{code}
It seems to me that this part of the code has already been merged in #3819.
Force-pushed from 2a60ea6 to 65c2eec.
@jackwish Can you please review? I have rebased to master.
Force-pushed from 2dcdb05 to a1e0fc2.
LGTM generally, minor comments which won't block merging :)
src/relay/qnn/op/add.cc (outdated)
{code}
// FIXME (anijain2305) - The lowering can be further optimized. Instead of inserting requantize in
// the start, we can insert requantize at the end if and only if all the input tensors have same
// qnn params. This can be done in future.
{code}
I guess that same scale can lead to requantize after ADD; the zero point can be safely subtracted :)
Let me change the comment.
src/relay/qnn/op/add.cc (outdated)
{code}
}

// Upcast to maintain precision.
requantized_lhs = Cast(requantized_lhs, Int(32));
{code}
It seems that the result of subtracting two int8 values can be held in int16? But not a big deal :)
Yes, currently Requantize does not support Int16, so we can skip it for now. If we see a need for int16 later on, we can start supporting it across all ops.
LGTM
Thanks @anijain2305 @jackwish @u99127, this is now merged.
Adding QNN Add operator.
The inputs to the QNN Add operator can have different scales and zero points. This PR adds a QNN Add operator that first requantizes the inputs to the output scale and zero point and then calls relay.add. This approach is also used by TF.
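For reference, a minimal usage sketch of the resulting op (assuming the Python API registered by this PR is relay.qnn.op.add and takes the qnn params as scalar arguments, matching the requantize signature quoted above; names and values are illustrative):
{code}
from tvm import relay

# Two quantized uint8 inputs with (hypothetical) differing qnn params.
lhs = relay.var('lhs', shape=(1, 4), dtype='uint8')
rhs = relay.var('rhs', shape=(1, 4), dtype='uint8')

out = relay.qnn.op.add(lhs, rhs,
                       lhs_scale=0.00784, lhs_zero_point=127,
                       rhs_scale=0.01500, rhs_zero_point=120,
                       output_scale=0.00784, output_zero_point=127)
{code}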