
Checking the correct dtypes for choosing the Intel int8 instructions. #3516

Merged (9 commits) on Jul 23, 2019

Conversation

@anijain2305 (Contributor)

Intel VNNI and Skylake HW-supported int8 instructions require one tensor to be
unsigned int8 and the other to be signed int8. This PR adds more checks in the x86
topi to satisfy that requirement.


@anijain2305 (Contributor Author) commented Jul 9, 2019

@tqchen (Member) commented Jul 9, 2019

Please also try to tag reviewers with whom you do not interact in person :) as per https://docs.tvm.ai/contribute/committer_guide.html?highlight=physically#broad-collaboration

@anijain2305 (Contributor Author)

Good point @tqchen :) Will keep that in mind next time.

@anijain2305 (Contributor Author) commented Jul 9, 2019

@llyfacebook and @jianyuh, you might be interested in this. Please also tag other reviewers if you think they would be interested.

@FrozenGene (Member)

Just curious: why does VNNI require this? In TFLite, for example, both data and kernel are u8. If VNNI requires this, TFLite's quantized models cannot use the HW-supported instructions.

@anijain2305 (Contributor Author)

Just curious: why does VNNI require this? In TFLite, for example, both data and kernel are u8. If VNNI requires this, TFLite's quantized models cannot use the HW-supported instructions.

Good question. I am not entirely sure yet which Relay operations to perform, but the key idea is to shift the zero point by subtracting/adding 128 to the quantized tensor values. There are a lot of details here - https://software.intel.com/en-us/articles/lower-numerical-precision-deep-learning-inference-and-training?_ga=2.113337095.1887331677.1562772854-237924077.1562647522

I will be using the above link as a reference for shifting the zero points.
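For illustration, here is a minimal numpy sketch of that zero-point shift (not the actual Relay code; the names and values are made up). Because both the quantized values and the zero point are shifted by the same 128, the dequantized real values are unchanged:

    import numpy as np

    # Example quantization parameters for a uint8 tensor (illustrative only).
    scale, zero_point = 0.05, 137
    q_u8 = np.array([0, 128, 255], dtype=np.uint8)

    # Reinterpret the uint8 tensor as int8 by subtracting 128, and shift the
    # zero point by the same amount so the represented real values stay equal.
    q_s8 = (q_u8.astype(np.int32) - 128).astype(np.int8)
    zero_point_s8 = zero_point - 128

    # real value = scale * (q - zero_point); identical before and after the shift
    before = scale * (q_u8.astype(np.int32) - zero_point)
    after = scale * (q_s8.astype(np.int32) - zero_point_s8)
    assert np.allclose(before, after)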

@tqchen (Member) commented Jul 10, 2019

Different hw backends have their own native instructions, and we might need a specific quantized model for each hw backend to maximize perf. That is why the native relay quantization flow is super important and we should push for some of that.

@anijain2305 (Contributor Author)

@FrozenGene @tqchen @jianyuh @llyfacebook Ping for review

@FrozenGene (Member) commented Jul 12, 2019

One concern: do we need this restriction? For example, if we pass u8 data / kernel to x86, what will happen? Currently we will hit an assert error. My preferred way would be to do it in _schedule_conv_nhwc_pack_int8:

    if check_skylake(target):
        int32_lanes = 16
    else:
        return s

We would check the data type here (for example in the function check_skylake). Then u8 data / kernel could still work, for example when the x86 CPU doesn't have the VNNI instruction.
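A rough sketch of the kind of check being suggested (function and target names below are illustrative, not the actual TVM API): decide once whether the HW int8 tensorize path is usable, and otherwise fall back to the default schedule.

    # Hypothetical helper: is the Skylake int8 tensorize path usable?
    def _int8_hw_supported(data_dtype, kernel_dtype, mcpu):
        # The HW int8 instructions expect uint8 data and int8 kernel.
        dtype_ok = data_dtype == "uint8" and kernel_dtype == "int8"
        target_ok = mcpu == "skylake-avx512"
        return dtype_ok and target_ok

    def _schedule_conv_nhwc_pack_int8(s, data, kernel, mcpu):
        if _int8_hw_supported(data.dtype, kernel.dtype, mcpu):
            int32_lanes = 16   # tensorize with the HW int8 intrinsic
            # ... tensorized schedule using int32_lanes ...
        else:
            return s           # fall back to the default schedule
        return s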

@anijain2305 (Contributor Author)

@FrozenGene Looked more into this. There are two reasons why I am hesitant to do it:

  • With your suggestion, we would directly return the default schedule, i.e. the tvm compute function. If we put the check outside as done in this PR, we can at least use the schedule_conv_NCHWc schedule, which gives better performance than the default schedule.
  • VNNI requires the kernel to be 7D (as opposed to 6D in the NCHWc case). This has to be done in alter_op_layout for conv, so we need checks at multiple places.

Please let me know what you think.

Regarding the transformation of the kernel from 4D to 7D (see the sketch below), we are already working on it. We plan to update this PR with that transformation as well this week.
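For reference, a small numpy sketch of the 4D to 7D kernel layout transformation being described (block sizes are illustrative; the actual transform lives in alter_op_layout and the compute declaration):

    import numpy as np

    # OIHW kernel split into the blocked layout OIHWioe, where the innermost
    # axis holds n_elems = 4 int8 values that one HW int8 instruction reduces
    # into a single int32 lane.
    out_channel, in_channel, kh, kw = 64, 64, 3, 3
    oc_bn, ic_bn, n_elems = 16, 16, 4

    kernel_oihw = np.zeros((out_channel, in_channel, kh, kw), dtype="int8")
    kernel_7d = (kernel_oihw
                 .reshape(out_channel // oc_bn, oc_bn,
                          in_channel // ic_bn, ic_bn // n_elems, n_elems, kh, kw)
                 .transpose(0, 2, 5, 6, 3, 1, 4))
    print(kernel_7d.shape)  # (4, 4, 3, 3, 4, 16, 4): O, I, H, W, i//4, o, 4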

@FrozenGene (Member) commented Jul 17, 2019

@FrozenGene Looked more into this. There are two reasons why I am hesitant to do it:

  • With your suggestion, we would directly return the default schedule, i.e. the tvm compute function. If we put the check outside as done in this PR, we can at least use the schedule_conv_NCHWc schedule, which gives better performance than the default schedule.

My previous context for the assert error is here: https://github.com/dmlc/tvm/blob/4f10a90e3e2ec18071612bed657108b219bc6796/topi/python/topi/x86/conv2d.py#L278-L288. Currently your PR restricts the kernel dtype to int8 for the NHWC int8 computation.

On restricting NCHWc, I looked more and I think it is ok; we could use the FP32 schedule for it, which should give better performance than the current schedule. Personally, I suggest we write an int8 schedule for non-Skylake hardware in the future, i.e.:

    if check_skylake(target):
        int32_lanes = 16
    else:
        # new schedule for non-skylake int8 computation
        #return s

I find that we only write the int8 schedule for Skylake hardware, which I think doesn't really make sense.

@anijain2305 (Contributor Author)

@FrozenGene I agree that int8 just for Skylake sounds weird, but it arises from the HW support. From Skylake onwards, Intel is adding int8 HW support. We need to run tensorize to utilize that HW support, as LLVM does not pick it up. If we don't use tensorize, the performance gets bad, so we have to fall back to the NCHWc FP32 schedule.

But I agree with the general direction you are proposing. I think in the future, as LLVM gets more mature for Intel int8 instructions, we can refine this schedule.
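As a rough illustration of what the tensorized intrinsic computes (this is only a numpy model of one output lane, not the TVM intrinsic definition): each int32 accumulator lane receives a dot product of 4 uint8 data values with 4 int8 kernel values.

    import numpy as np

    # One output lane of the u8 x s8 -> s32 dot-product instruction, modeled
    # in numpy: multiply 4 uint8 data values by 4 int8 kernel values and
    # accumulate the sum into an int32.
    data_u8 = np.array([10, 200, 30, 255], dtype=np.uint8)
    kernel_s8 = np.array([-1, 2, -3, 4], dtype=np.int8)
    acc = np.int32(0)
    acc += np.dot(data_u8.astype(np.int32), kernel_s8.astype(np.int32))
    print(acc)  # -10 + 400 - 90 + 1020 = 1320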

@FrozenGene (Member) commented Jul 17, 2019

@FrozenGene I agree that int8 just for Skylake sounds weird, but it arises from the HW support. From Skylake onwards, Intel is adding int8 HW support. We need to run tensorize to utilize that HW support, as LLVM does not pick it up. If we don't use tensorize, the performance gets bad, so we have to fall back to the NCHWc FP32 schedule.

But I agree with the general direction you are proposing. I think in the future, as LLVM gets more mature for Intel int8 instructions, we can refine this schedule.

I think it is not only about CPUs with HW-supported int8 instructions but also those without, for example Haswell. I think it is ok to use tensorize for Skylake. My point is that for CPU architectures without HW-supported int8 instructions, we should also have an int8 schedule, not just return s or use the FP32 schedule.

- Ensures that we fall back to NCHWc when int8 support is absent.
- Added a test to check both cases: when HW support is present and when it is absent.
- Verified the performance of int8 conv.
    # Reshape the kernel into 7D OHWoIie: the innermost axis holds n_elems
    # int8 values that the int8 instruction reduces into one int32 lane.
    kernel_OHWoIie = F.reshape(kernel_OHWoIi, (out_channel//oc_bn, kh, kw, oc_bn,
                                               in_channel//ic_bn, ic_bn//n_elems, n_elems))
    # Reorder to OIHWioe, the layout expected by the int8 NCHWc compute.
    kernel_OIHWioe = F.transpose(kernel_OHWoIie, axes=(0, 4, 1, 2, 5, 3, 6))
    copy_inputs = [data_func, kernel_OIHWioe]
Member (inline review comment on the diff above):

also need to do dispatch_ctx.update?

Contributor Author (reply):

Ah, thanks. Added that now.

@anijain2305 (Contributor Author)

@FrozenGene Updated the patch, adapting it somewhat to what you said.

Int8 is a little hacky to work with. New Intel machines that have int8 support perform a reduction within a vector, requiring a new compute, a new alter conv2d layout, and a new schedule. So we need to check int8 support at multiple places in the codebase. If int8 is not supported, there is not much we can do, and falling back to the FP32 schedule gives decent enough performance.

As of now, I have tried my best to guard things properly so that we never take the int8 schedule when int8 HW support is absent.
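A hypothetical sketch of that guard pattern (names are illustrative, not the actual TVM code): a single decision helper reused at the places that must agree, e.g. alter_op_layout and the schedule, so the int8 path with its 7D kernel is only ever chosen when both the dtypes and the target support it.

    # Hypothetical shared helper, consulted at alter_op_layout and schedule time.
    def _use_int8_nchwc(data_dtype, kernel_dtype, mcpu):
        return (data_dtype == "uint8"
                and kernel_dtype == "int8"
                and mcpu == "skylake-avx512")

    def _alter_conv2d_layout_sketch(data_dtype, kernel_dtype, mcpu):
        if _use_int8_nchwc(data_dtype, kernel_dtype, mcpu):
            # 7D kernel (OIHWioe) + tensorized int8 schedule
            return "conv2d_NCHWc_int8"
        # 6D kernel (OIHWio) + regular NCHWc FP32 schedule
        return "conv2d_NCHWc"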

@yzhliu (Member) commented Jul 21, 2019

@anijain2305 Please address the pylint error.
@FrozenGene Could you review again?

@FrozenGene (Member) left a comment:

LGTM.

@anijain2305 (Contributor Author)

For the Int8 NHWC pack, I am returning an unoptimized schedule because of this bug: #3598
