[QNN] Add operator #3736

Merged: 1 commit merged into apache:master on Sep 5, 2019

Conversation

anijain2305 (Contributor) commented Aug 8, 2019

Adding the QNN Add operator.

The inputs to the QNN Add operator can have different scales and zero points. This PR adds a QNN Add operator that first requantizes the inputs to the output scale and zero point, and then calls relay.add. This approach is also used by TF.
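A minimal sketch of that lowering idea in the Relay Python API follows. The function name qnn_add_sketch, the in_dtype argument, and the exact import path are assumptions made for this illustration; they are not the PR's final code (which later moves to C++).

```python
from tvm import relay
# requantize wrapper as defined in python/tvm/relay/qnn/op/qnn.py in this PR
from tvm.relay.qnn.op.qnn import requantize

def qnn_add_sketch(lhs, rhs,
                   lhs_scale, lhs_zero_point,
                   rhs_scale, rhs_zero_point,
                   output_scale, output_zero_point,
                   in_dtype="uint8"):
    # Bring both inputs onto the output quantization parameters first.
    requantized_lhs = requantize(lhs, lhs_scale, lhs_zero_point,
                                 output_scale, output_zero_point,
                                 out_dtype=in_dtype)
    requantized_rhs = requantize(rhs, rhs_scale, rhs_zero_point,
                                 output_scale, output_zero_point,
                                 out_dtype=in_dtype)
    # The addition itself is a plain relay.add; the output zero point is now
    # counted twice, so one copy is subtracted back out.
    out = relay.add(requantized_lhs, requantized_rhs)
    return relay.subtract(out, relay.const(output_zero_point, dtype=in_dtype))
```

As the review below points out, doing the add directly in an 8-bit dtype risks overflow; the version that was eventually merged upcasts to int32 before adding and casts back afterwards.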

u99127 (Contributor) commented Aug 8, 2019

This needs a rebase and squash and some description to go with the pull request.

u99127 (Contributor) left a review comment:

Is this the right place to finesse the (lhs_scale == rhs_scale and lhs_z_p == rhs_z_p) or is this caught elsewhere ?

I suspect one could end up with one less requantize step as this is just

{code}
output = relay.add (lhs, rhs);
return requantize (output , ....)
{code}


 def requantize(data,
                input_scale,
                input_zero_point,
                output_scale,
                output_zero_point,
                rounding="TONEAREST",
-               out_dtype="int8"):
+               out_dtype='int8'):
Contributor:

Is this really an unrelated change?

anijain2305 (Contributor, Author):

Yes, this was an unintended change. I will be consistent in using single or double quotes.

anijain2305 (Contributor, Author):

Is this the right place to finesse the (lhs_scale == rhs_scale and lhs_z_p == rhs_z_p) or is this caught elsewhere ?

I suspect one could end up with one less requantize step as this is just

{code}
output = relay.add (lhs, rhs);
return requantize (output , ....)
{code}

Aah, very nice observation. This should be caught here inside the qnn.add operator. Let me make the changes.

anijain2305 (Contributor, Author):

@u99127 While working on your comment, I realized that I am not certain the lowering is correct. Give me a couple of days to dig deeper into the TFLite codebase to see what they do. I am not sure if I am handling zero points correctly.

u99127 (Contributor) commented Aug 8, 2019

@u99127 While working on your comment, I realized that I am not certain the lowering is correct. Give me a couple of days to dig deeper into the TFLite codebase to see what they do. I am not sure if I am handling zero points correctly.

No worries. I had another review comment that I missed publishing, about testing more cases than just zero zero_points, which is what the tests currently seem to do.

anijain2305 (Contributor, Author):

@u99127 @jackwish @tqchen This is ready for review.

zhenhuaw-me (Contributor) left a review:

Several concerns, not necessarily blocking the merge.

python/tvm/relay/qnn/op/qnn.py (outdated review thread, resolved)

if lhs_scale == rhs_scale and lhs_zero_point == rhs_zero_point:
    out = relay.add(lhs, rhs)
    out = relay.subtract(out, relay.const(lhs_zero_point, dtype=in_dtype))
Contributor:

similar to add

Contributor:

Subtracting one zero point and letting requantize handle the other one is a bit tricky? I know it was to avoid the extra requantize sometimes, though...
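For readers following the zero-point bookkeeping, here is a tiny numeric check (illustrative values, not taken from the PR's tests) of why exactly one copy of the shared zero point is subtracted after adding two tensors that already share scale s and zero point z:

```python
s, z = 0.5, 10                                  # shared scale and zero point
q_lhs, q_rhs = 14, 22                           # quantized inputs
real_sum = s * (q_lhs - z) + s * (q_rhs - z)    # 2.0 + 6.0 = 8.0
q_out = q_lhs + q_rhs - z                       # z was counted twice; drop one copy
assert s * (q_out - z) == real_sum              # 0.5 * (26 - 10) == 8.0
```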

python/tvm/relay/qnn/op/qnn.py (two more outdated review threads, resolved)
out_dtype=in_dtype)

out = relay.add(requantized_lhs, requantized_rhs)
out = relay.subtract(out, relay.const(output_zero_point, dtype=in_dtype))
Contributor:
Output casting concern.

anijain2305 (Contributor, Author):

@jackwish Addressed your comments by going to int32 for the addition and then casting back when necessary. Also added test cases.
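A rough sketch of what "going to int32 for the addition and then casting back" can look like on the equal-qnn-params path; the helper name add_same_qnn_params and the uint8 clip bounds are assumptions for this illustration, not the PR's exact code:

```python
from tvm import relay

def add_same_qnn_params(lhs, rhs, zero_point, in_dtype="uint8"):
    # Upcast so the intermediate sum cannot overflow the 8-bit range.
    lhs32 = relay.cast(lhs, dtype="int32")
    rhs32 = relay.cast(rhs, dtype="int32")
    out = relay.add(lhs32, rhs32)
    # The shared zero point was counted twice; subtract one copy.
    out = relay.subtract(out, relay.const(zero_point, dtype="int32"))
    # Clamp into the output range before casting back down (0..255 for uint8).
    out = relay.clip(out, a_min=0, a_max=255)
    return relay.cast(out, dtype=in_dtype)
```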

zhenhuaw-me (Contributor) left a review:

LGTM.

As I am not a community reviewer, you may need someone else to approve; my comments are only comments. :)

# output qnn params. The add op is done in int32 precision.

if lhs_scale == rhs_scale and lhs_zero_point == rhs_zero_point:
    lhs = relay.cast(lhs, dtype='int32')
Contributor:

add/sub in int16 should be enough, but not a big deal :)

anijain2305 (Contributor, Author):

Tried that :) Currently it fails because requantize input can only be (uint8, int8, int32). I think for now it should be ok. If we see more demand for int16, we can add support across all the QNN ops.

Contributor:

Yes, that is far from a blocking issue :) Thank you.

y_datas = [np.array((204, 178, 165, 140)).reshape((1, 4)),
           np.array((204, 178, 191, 25)).reshape((1, 4)),
           np.array((204, 178, 25, 191)).reshape((1, 4))]
golden_outputs = [np.array((217, 204, 203, 191)).reshape((1, 4)),
Contributor:

A bit curious: are these data coming from TFLite computed results, or manually computed?

anijain2305 (Contributor, Author):

TFLite adds quantize and dequantize ops. I am trapping the numbers after they have been quantized and before they get dequantized. So these are from TFLite, but from the internals rather than the actual GTest that you see.

Contributor:

Yes, thank you for the explanation.

anijain2305 force-pushed the qnn_add branch 2 times, most recently from 64311a1 to c4d3199, on August 12, 2019 03:08
leo-blonk (Contributor):

It seems that the normalisation of the quantisation parameters would be the same for a number of different operators. If that is the case, does it make sense to factor this out and maybe put this in a pass? That would avoid having to implement this for each operator.
Do we know what the quantization parameters are at each point in the Relay graph?

anijain2305 (Contributor, Author) commented Aug 12, 2019

It seems that the normalisation of the quantisation parameters would be the same for a number of different operators. If that is the case, does it make sense to factor this out and maybe put this in a pass? That would avoid having to implement this for each operator.
Do we know what the quantization parameters are at each point in the Relay graph?

@Leo-arm this is a very good point. For background, there are two parallel efforts in the TVM community right now:

  • Automatic Quantization - quantization in Relay. An FP32 model comes in and a Relay pass quantizes it.
  • Supporting pre-quantized models - the quantization happens outside of Relay and TVM (this PR belongs to this category). Here, the main goal is to create new QNN ops that will be converted to existing Relay ops.

We can share the HW schedules between these two options.

What you suggested is almost what happens today in the Automatic Quantization project. It is also somewhat easier there because Automatic Quantization only works with symmetric quantization.

For pre-quantized models it is somewhat trickier, because it happens on an op-by-op basis (requantize happens earlier in concatenate but later in conv2d, and the zero point handling is different for add vs multiply). But there are certainly some patterns that can be exploited to make it easier. For example, we can have a helper function that requantizes the inputs and then calls the main op (see the sketch below), and all the QNN ops that follow this pattern can rely on that helper. For now, this seems like a good intermediate point. As we get more ops and patterns, we can refactor them using helper functions.
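A possible shape for such a helper, sketched in Python. The name _requantize_inputs_then and its argument layout are hypothetical and only illustrate the "requantize the inputs, then call the plain Relay op" pattern; they are not part of this PR.

```python
from tvm import relay
from tvm.relay.qnn.op import qnn  # module containing requantize in this PR

def _requantize_inputs_then(relay_op, inputs, input_qnn_params,
                            output_scale, output_zero_point, out_dtype="int32"):
    # Bring every input tensor onto the output quantization parameters,
    # then hand the now-comparable tensors to an ordinary Relay op.
    requantized = [
        qnn.requantize(tensor, scale, zero_point,
                       output_scale, output_zero_point,
                       out_dtype=out_dtype)
        for tensor, (scale, zero_point) in zip(inputs, input_qnn_params)
    ]
    return relay_op(*requantized)
```

A qnn.add or qnn.concatenate built this way would differ only in which Relay op it calls and which zero-point correction it applies afterwards.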

u99127 (Contributor) commented Aug 13, 2019

Assume below that ip0, ip1 and op are all 8-bit tensors with identical zero points, and similarly for all the other cases.

{code}
op = add (ip0, ip1);
{code}

now gets lowered into :

{code}
tmp1 = recast (ip0, 32 bits)
tmp2 = recast (ip1, 32 bits)
tmp3 = add (tmp1, tmp2)
op = requantize (tmp3)
{code}

Am I right in assuming that the tflite parser directly lowers to this level? Is there any reason why the alternate option of having 8-bit tensor operations in relay has been ignored?

anijain2305 (Contributor, Author):

Am I right in assuming that the tflite parser directly lowers to this level ? Is there any reason why the alternate option of having 8 bit tensor operations in relay has been ignored?

  • If the original op did not have any quantization params, i.e. it was just a plain int8 op, we will directly call relay.add instead of qnn.add.
  • In the case of an original op with quantization params, I would still like to cast. Since in quantization the integers are just a proxy for FP32 numbers, overflow (with just int8) can lead to a bad proxy FP32 number; see the small NumPy illustration below. So I have opted for saturating behavior (manifested by requantize).
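A small NumPy illustration of the overflow point mentioned above (the numbers are made up for this example, not taken from the PR's tests):

```python
import numpy as np

lhs = np.array([200], dtype=np.uint8)
rhs = np.array([100], dtype=np.uint8)
print(lhs + rhs)                    # [44]  -- the 8-bit add wraps around
print(lhs.astype(np.int32) + rhs)   # [300] -- upcasting keeps the true sum
```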

anijain2305 (Contributor, Author):

@u99127 Please review again to ensure that your questions are answered :)

@vinx13 Ping for review.

anijain2305 changed the title from "[QNN] Add operator" to "[QNN] WIP - Add operator" on Aug 15, 2019
anijain2305 (Contributor, Author):

Do not submit yet. Will move the codebase to C++ to avoid calling InferType.

anijain2305 force-pushed the qnn_add branch 3 times, most recently from d85a4aa to b6a679d, on August 22, 2019 19:25
anijain2305 changed the title from "[QNN] WIP - Add operator" to "[QNN] Add operator" on Aug 22, 2019
anijain2305 (Contributor, Author):

Moved to C++. Removed the WIP tag.

zhenhuaw-me (Contributor) left a review:

Sorry for the delayed re-checking; I have been working in some other directions. It seems a rebase is needed, as #3819 has been merged :)

@@ -97,6 +97,64 @@ struct DequantizeAttrs : public tvm::AttrsNode<DequantizeAttrs> {
}
};

/*! \brief Attributes used in QNN concatenate operators */
struct QnnConcatenateAttrs : public tvm::AttrsNode<QnnConcatenateAttrs> {
Contributor:

It seems to me that this part of the code has already been merged in #3819.

anijain2305 force-pushed the qnn_add branch 2 times, most recently from 2a60ea6 to 65c2eec, on September 4, 2019 06:44
anijain2305 (Contributor, Author):

@jackwish Can you please review? I have rebased to master.

anijain2305 force-pushed the qnn_add branch 2 times, most recently from 2dcdb05 to a1e0fc2, on September 4, 2019 17:13
anijain2305 (Contributor, Author):

@zhiics @vinx13 Can you please review when you get time.

zhiics (Member) left a review:

It looks good to me. @jackwish @u99127 could you take another look when you are available?

zhenhuaw-me (Contributor) left a review:

LGTM generally, minor comments which won't block merging :)


// FIXME (anijain2305) - The lowering can be further optimized. Instead of inserting requantize in
// the start, we can insert requantize at the end if and only if all the input tensors have same
// qnn params. This can be done in future.
Contributor:

I guess that the same scale can lead to requantize after ADD; the zero point can be safely subtracted :)

anijain2305 (Contributor, Author):

Let me change the comment.

}

// Upcast to maintain precision.
requantized_lhs = Cast(requantized_lhs, Int(32));
Contributor:

It seems that the result of subtracting two int8 values can be held in int16? But not a big deal :)

anijain2305 (Contributor, Author):

Yes, currently Requantize does not support int16, so we can skip it for now. If we see a need for int16 later on, we can start supporting it across all ops.

zhiics (Member) left a review:

LGTM

zhiics merged commit a6bb84a into apache:master on Sep 5, 2019
zhiics (Member) commented Sep 5, 2019

Thanks @anijain2305 @jackwish @u99127, this is now merged.

MarisaKirisame pushed a commit to MarisaKirisame/tvm that referenced this pull request Sep 7, 2019
wweic pushed a commit to wweic/tvm that referenced this pull request Sep 16, 2019
wweic pushed a commit to neo-ai/tvm that referenced this pull request Sep 16, 2019
anijain2305 deleted the qnn_add branch on November 13, 2019 00:31
Labels: status: need review, status: need update (need update based on feedbacks)

6 participants