[QNN] Add operator #3736
Conversation
This needs a rebase and squash and some description to go with the pull request.
Is this the right place to finesse the (lhs_scale == rhs_scale and lhs_z_p == rhs_z_p) case, or is this caught elsewhere? I suspect one could end up with one less requantize step, as this is just:
{code}
output = relay.add(lhs, rhs)
return requantize(output, ...)
{code}
python/tvm/relay/qnn/op/qnn.py (outdated)
{code}
 def requantize(data,
                input_scale,
                input_zero_point,
                output_scale,
                output_zero_point,
                rounding="TONEAREST",
-               out_dtype="int8"):
+               out_dtype='int8'):
{code}
Is this really an unrelated change?
Yes, this was an unintended change. I will be consistent in using single or double quotes.
Aah, very nice observation. This should be caught here inside the …
@u99127 While working on your comment, I realized that I am not certain the lowering is correct. Give me a couple of days to dig deeper into the TFLite codebase to see what they do. I am not sure if I am handling zero points correctly.
No worries, I had another review comment that I missed publishing, around testing more cases than just with zero zero_points, which is what the tests seem to be doing.
Several concerns, though not necessarily blocking the merge.
python/tvm/relay/qnn/op/qnn.py (outdated)
{code}
if lhs_scale == rhs_scale and lhs_zero_point == rhs_zero_point:
    out = relay.add(lhs, rhs)
    out = relay.subtract(out, relay.const(lhs_zero_point, dtype=in_dtype))
{code}
similar to add
Subtracting one zero point and letting requantize handle the other one is a bit tricky? I know it was meant to avoid the requantize sometimes, though...
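For context, a sketch of the arithmetic behind this trick (my own illustration, assuming standard affine quantization; not code from the PR):
{code}
# Affine quantization: real = scale * (q - zero_point).
# When lhs and rhs share the same (s, z):
#   real_lhs + real_rhs = s * (q_lhs - z) + s * (q_rhs - z)
#                       = s * ((q_lhs + q_rhs - z) - z)
# So (q_lhs + q_rhs - z) is the sum, still quantized with params (s, z),
# and a single requantize to the output params absorbs the remaining z.
{code}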
python/tvm/relay/qnn/op/qnn.py (outdated)
{code}
                         out_dtype=in_dtype)

out = relay.add(requantized_lhs, requantized_rhs)
out = relay.subtract(out, relay.const(output_zero_point, dtype=in_dtype))
{code}
Output casting concern.
@jackwish Addressed your comments by going to int32 for the addition and then casting back when necessary. Also added test cases.
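A minimal sketch of what that int32 lowering looks like (my reconstruction from this thread, not the exact PR code; requantize stands for the QNN requantize op quoted earlier, and the clip bounds assume a uint8 output):
{code}
# Requantize both inputs to the output qnn params, in int32.
requantized_lhs = requantize(lhs, lhs_scale, lhs_zero_point,
                             output_scale, output_zero_point,
                             out_dtype='int32')
requantized_rhs = requantize(rhs, rhs_scale, rhs_zero_point,
                             output_scale, output_zero_point,
                             out_dtype='int32')
out = relay.add(requantized_lhs, requantized_rhs)
# Each requantized input carries output_zero_point, so subtract one copy.
out = relay.subtract(out, relay.const(output_zero_point, dtype='int32'))
# Clamp to the output range before casting back (0..255 assuming uint8).
out = relay.clip(out, a_min=0, a_max=255)
out = relay.cast(out, dtype='uint8')
{code}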
LGTM.
As I am not a community reviewer, you may need someone else to approve; my comments are only comments. :)
python/tvm/relay/qnn/op/qnn.py (outdated)
{code}
# output qnn params. The add op is done in int32 precision.

if lhs_scale == rhs_scale and lhs_zero_point == rhs_zero_point:
    lhs = relay.cast(lhs, dtype='int32')
{code}
add/sub in int16 should be enough, but not a big deal :)
Tried that :) Currently it fails, because requantize input can only be (uint8, int8, int32). I think for now it should be ok. If we see more demand for int16, we can add support across all the QNN ops.
Yes, that is far from a blocking issue :) Thank you.
{code}
y_datas = [np.array((204, 178, 165, 140)).reshape((1, 4)),
           np.array((204, 178, 191, 25)).reshape((1, 4)),
           np.array((204, 178, 25, 191)).reshape((1, 4))]
golden_outputs = [np.array((217, 204, 203, 191)).reshape((1, 4)),
{code}
A bit curious: are these data coming from TFLite's computed results, or manually computed?
TFLite adds quantize and dequantize. I am trapping the numbers after they have been quantized and before they get dequantized. So these are from TFLite, not from the actual GTest that you see, but from its internals.
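As an illustration, golden values like these can be cross-checked against a float reference (a sketch with made-up scales, zero points, and lhs data; the actual test parameters are not shown in this thread):
{code}
import numpy as np

# Hypothetical qnn params, purely for illustration.
s_lhs, z_lhs = 0.00784, 127
s_rhs, z_rhs = 0.00784, 127
s_out, z_out = 0.00784, 127

x = np.array((10, 50, 100, 200), dtype=np.int32)    # hypothetical lhs data
y = np.array((204, 178, 165, 140), dtype=np.int32)  # one of the y_datas rows

# Dequantize, add in float, then quantize back with the output params.
real = s_lhs * (x - z_lhs) + s_rhs * (y - z_rhs)
golden = np.clip(np.round(real / s_out) + z_out, 0, 255).astype(np.uint8)
{code}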
Yes, thank you for the explanation.
Force-pushed from 64311a1 to c4d3199.
It seems that the normalisation of the quantisation parameters would be the same for a number of different operators. If that is the case, does it make sense to factor this out and maybe put this in a pass? That would avoid having to implement this for each operator.
@Leo-arm this is a very good point. For background, there are 2 parallel efforts in the TVM community right now: automatic quantization within Relay, and support for pre-quantized models through the QNN ops. We can share the HW schedules between these two options. What you suggest is almost what happens today in the automatic quantization project. It is also somewhat easier there, because automatic quantization only works with symmetric quantization. Doing this for pre-quantized models is somewhat tricky, because it happens on an op-by-op basis.
Assume below that ip0, ip1 and op are all 8-bit tensors with identical zero points, and similarly for all the other cases. {code} now gets lowered into: {code} Am I right in assuming that the tflite parser directly lowers to this level? Is there any reason why the alternate option of having 8-bit tensor operations in Relay has been ignored?
Do not submit yet. Will move the codebase to C++ to avoid calling InferType.
Force-pushed from d85a4aa to b6a679d.
Moved to C++. Removed the WIP tag.
Sorry for the delayed re-checking; I have been working on some other directions. It seems a rebase is needed, as #3819 has been merged :)
include/tvm/relay/qnn/attrs.h (outdated)
{code}
@@ -97,6 +97,64 @@ struct DequantizeAttrs : public tvm::AttrsNode<DequantizeAttrs> {
  }
};

/*! \brief Attributes used in QNN concatenate operators */
struct QnnConcatenateAttrs : public tvm::AttrsNode<QnnConcatenateAttrs> {
{code}
It seems to me that this part of the code has already been merged in #3819.
Force-pushed from 2a60ea6 to 65c2eec.
@jackwish Can you please review? I have rebased to master.
Force-pushed from 2dcdb05 to a1e0fc2.
LGTM generally, minor comments which won't block merging :)
src/relay/qnn/op/add.cc (outdated)
{code}
// FIXME (anijain2305) - The lowering can be further optimized. Instead of inserting requantize in
// the start, we can insert requantize at the end if and only if all the input tensors have same
// qnn params. This can be done in future.
{code}
I guess that same scale can lead to requantize after ADD; the zero point can be safely subtracted :)
Let me change the comment.
src/relay/qnn/op/add.cc (outdated)
{code}
}

// Upcast to maintain precision.
requantized_lhs = Cast(requantized_lhs, Int(32));
{code}
It seems that the result of subtracting two int8 values can be held in int16? But not a big deal :)
Yes, currently Requantize does not support Int16, so we can skip it for now. If we see a need for int16 later on, we can start supporting it across all ops.
LGTM
Thanks @anijain2305 @jackwish @u99127, this is now merged.
Adding QNN Add operator.
The inputs to the QNN Add operator can have different scales and zero points. This PR adds a QNN Add operator that first requantizes the inputs to the output scale and zero point and then calls relay.add. This approach is also used by TF.
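For reference, a minimal usage sketch of the resulting op (assuming the Python API registered by this PR is relay.qnn.op.add and takes the qnn params as scalar arguments, matching the requantize signature quoted above; names and values are illustrative):
{code}
from tvm import relay

# Two quantized uint8 inputs with (hypothetical) differing qnn params.
lhs = relay.var('lhs', shape=(1, 4), dtype='uint8')
rhs = relay.var('rhs', shape=(1, 4), dtype='uint8')

out = relay.qnn.op.add(lhs, rhs,
                       lhs_scale=0.00784, lhs_zero_point=127,
                       rhs_scale=0.01500, rhs_zero_point=120,
                       output_scale=0.00784, output_zero_point=127)
{code}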