Checking the correct dtypes for choosing the Intel int8 instructions. #3516
Conversation
Please also try to tag reviewers with whom you do not interact physically :) as per https://docs.tvm.ai/contribute/committer_guide.html?highlight=physically#broad-collaboration
Good point @tqchen :) Will keep that in mind next time.
@llyfacebook and @jianyuh you might be interested in this. Please also tag other reviewers if you think they will be interested.
Just curious: why does VNNI require this? In TFLite, data and kernel are both u8. If VNNI requires this, TFLite's quantized models cannot use the HW-supported instructions.
Good question. I am not entirely sure yet which Relay operations to perform, but the key idea is to shift the zero point by subtracting/adding 128 to the quantized tensor values. There are a lot of details here: https://software.intel.com/en-us/articles/lower-numerical-precision-deep-learning-inference-and-training?_ga=2.113337095.1887331677.1562772854-237924077.1562647522 Will be using the above link to shift the zero points.
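For illustration, a minimal numpy sketch (not the actual Relay lowering) of the zero-point-shift idea, assuming the usual affine quantization `real = scale * (q - zero_point)` with made-up parameters:

```python
import numpy as np

# Sketch only: convert a uint8 quantized tensor to int8 by shifting both the
# values and the zero point by 128; the represented real values are unchanged
# because real = scale * (q - zero_point) and both terms shift together.
scale, zp_u8 = 0.05, 130                                  # hypothetical params
q_u8 = np.array([0, 64, 130, 255], dtype=np.uint8)

q_s8 = (q_u8.astype(np.int16) - 128).astype(np.int8)      # shift the values
zp_s8 = zp_u8 - 128                                       # shift the zero point

real_u8 = scale * (q_u8.astype(np.int32) - zp_u8)
real_s8 = scale * (q_s8.astype(np.int32) - zp_s8)
assert np.allclose(real_u8, real_s8)                      # same real values
```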
Different HW backends have their own native instructions, and we might need a specific quantized model for each HW backend to maximize the perf. That is why the native Relay quantization flow is super important and we should push for some of that.
@FrozenGene @tqchen @jianyuh @llyfacebook Ping for review.
One concern is: do we need this restriction? For example, we pass

```python
if check_skylake(target):
    int32_lanes = 16
else:
    return s
```

and we check the data type here (for example in the function …)
Intel VNNI and Skylake HW-supported int8 instructions require one tensor to be unsigned int8 and the other to be signed int8. Adding more checks in the x86 topi to satisfy that requirement.
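A rough sketch of the kind of dtype guard described above (function and parameter names here are illustrative, not the exact TVM code):

```python
def _is_int8_hw_supported(data_dtype, kernel_dtype, target_is_skylake):
    # Intel int8 instructions (e.g. vpmaddubsw, AVX-512 VNNI vpdpbusd) multiply
    # an unsigned-int8 operand with a signed-int8 operand, so only this dtype
    # combination should be routed to the tensorized int8 schedule.
    return data_dtype == "uint8" and kernel_dtype == "int8" and target_is_skylake

# Example: uint8 data with int8 kernel on an int8-capable target qualifies.
print(_is_int8_hw_supported("uint8", "int8", True))    # True
print(_is_int8_hw_supported("uint8", "uint8", True))   # False: kernel must be signed
```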
@FrozenGene Looked more into this. There are 2 reasons why I am hesitant to do it
Please let me know what you think. Regarding the transformation of the kernel from 4D to 7D, we are already working on it. We plan to update this PR with that transformation as well this week.
My previous context of the assert error is here: https://github.com/dmlc/tvm/blob/4f10a90e3e2ec18071612bed657108b219bc6796/topi/python/topi/x86/conv2d.py#L278-L288 Currently your PR restricts what the kernel dtype must be. If we keep this restriction,

```python
if check_skylake(target):
    int32_lanes = 16
else:
    # new schedule for non-skylake int8 computation
    # return s
```

I find we only write the int8 schedule for Skylake hardware, which I think doesn't make sense in fact.
@FrozenGene I agree that int8 only for Skylake sounds weird, but it arises from the HW support. From Skylake onwards, Intel is adding int8 HW support. We need to run tensorize to utilize that HW support, as LLVM does not pick it up; if we don't use tensorize, the performance gets bad. Therefore, we have to fall back to the NCHWc FP32 schedule. But I agree with the general direction you are proposing. I think in the future, as LLVM gets more mature for the Intel int8 instructions, we can refine this schedule.
I think it is not only about HW-supported int8 instructions but also about CPUs without HW-supported int8 instructions, for example Haswell. I think it is OK to use tensorize for Skylake. My point is that for CPU architectures without HW-supported int8 instructions, we should also have an int8 schedule, not just `return s`.
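A sketch of the dispatch structure being suggested here; the schedule names are placeholders, not actual TVM APIs:

```python
def select_conv2d_schedule(data_dtype, kernel_dtype, target_is_skylake):
    # Placeholder logic: int8 on non-Skylake CPUs should still get an int8
    # schedule rather than silently falling back to the default.
    is_int8 = data_dtype == "uint8" and kernel_dtype == "int8"
    if is_int8 and target_is_skylake:
        return "int8_tensorized_schedule"   # HW int8 dot-product via tensorize
    if is_int8:
        return "int8_generic_schedule"      # plain vectorized int8, no tensorize
    return "nchwc_fp32_schedule"            # FP32 fallback
```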
- Ensures that we fall back to NCHWc when int8 support is absent.
- Added a test to check both cases: when HW support is present and when it is absent.
- Verified the performance of int8 conv.
topi/python/topi/x86/conv2d.py
Outdated
```python
kernel_OHWoIie = F.reshape(kernel_OHWoIi, (out_channel//oc_bn, kh, kw, oc_bn,
                                           in_channel//ic_bn, ic_bn//n_elems, n_elems))
kernel_OIHWioe = F.transpose(kernel_OHWoIie, axes=(0, 4, 1, 2, 5, 3, 6))
copy_inputs = [data_func, kernel_OIHWioe]
```
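For reference, a numpy sketch (with made-up shapes, not taken from the PR) of what the quoted reshape and transpose produce; the resulting 7D layout keeps `n_elems` int8 values contiguous on the innermost axis so they can feed the int8 dot-product lanes:

```python
import numpy as np

# Illustrative shapes only, to show the kernel layout transform above.
out_channel, in_channel, kh, kw = 64, 32, 3, 3
oc_bn, ic_bn, n_elems = 16, 16, 4

kernel_OHWoIi = np.zeros(
    (out_channel // oc_bn, kh, kw, oc_bn, in_channel // ic_bn, ic_bn), dtype="int8")

kernel_OHWoIie = kernel_OHWoIi.reshape(
    out_channel // oc_bn, kh, kw, oc_bn,
    in_channel // ic_bn, ic_bn // n_elems, n_elems)
kernel_OIHWioe = kernel_OHWoIie.transpose(0, 4, 1, 2, 5, 3, 6)

# axes: (O_outer, I_outer, kh, kw, ic_bn // n_elems, oc_bn, n_elems)
print(kernel_OIHWioe.shape)   # (4, 2, 3, 3, 4, 16, 4)
```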
Also need to do `dispatch_ctx.update`?
Ah, thanks. Added that now
@FrozenGene Updated the patch, adapting somewhat to what you said. Int8 is a little hacky to work with. New Intel machines that have int8 support perform the reduction within a vector, requiring a new compute, a new alter conv2d layout, and a new schedule. So, we need to check int8 support at multiple places in the codebase. If int8 is not supported, there is not much we can do, and falling back to the FP32 schedule gives decent enough performance. As of now, I have tried my best to guard things properly so that we never go to the int8 schedule if int8 support is absent.
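As a simplified picture of the "reduction within a vector" mentioned above, here is a numpy emulation of the u8 x s8 dot-product pattern the Intel int8 instructions provide; the lane count and data are made up, and saturation corner cases are ignored:

```python
import numpy as np

# Each 32-bit lane accumulates the dot product of 4 uint8 data values with
# 4 int8 kernel values (the pattern behind vpmaddubsw + vpmaddwd and the
# AVX-512 VNNI vpdpbusd instruction).
def u8s8_dot_accumulate(data_u8, kernel_s8, acc_i32):
    prod = data_u8.astype(np.int32) * kernel_s8.astype(np.int32)   # (lanes, 4)
    return acc_i32 + prod.sum(axis=1)                              # (lanes,)

lanes = 16                                  # 16 int32 lanes in a 512-bit register
rng = np.random.default_rng(0)
data = rng.integers(0, 256, size=(lanes, 4), dtype=np.uint8)
kern = rng.integers(-128, 128, size=(lanes, 4), dtype=np.int8)
acc = np.zeros(lanes, dtype=np.int32)
print(u8s8_dot_accumulate(data, kern, acc))
```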
@anijain2305 Please address the pylint error.
LGTM.
For the int8 NHWC pack, I am returning an unoptimized schedule because of this bug: #3598