Add sse4/avx2 support for fast x86 int8 (vpmaddubsw/vpmaddwd/vpaddd) #8897

Merged: 4 commits, Sep 9, 2021

elvin-n commented Sep 1, 2021

  • Extend the list of supported targets for x86 TOPI
  • Extend the conv2d x86 int8 tests to cover fast-int8 x86 platforms

In theory, this change can give up to a 2x speedup for int8 models over fp32 models; the current speedup is slightly less.
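For readers unfamiliar with the instruction sequence named in the PR title, the following is a minimal pure-Python emulation of one accumulation step: `vpmaddubsw` multiplies unsigned by signed bytes and adds adjacent pairs with 16-bit saturation, `vpmaddwd` against a vector of ones widens adjacent pairs to 32 bits, and `vpaddd` adds the result into the accumulator. All function names here are hypothetical and chosen for illustration; this is not the TVM code from this PR, only a sketch of the lane arithmetic.

```python
def saturate_i16(x):
    """Clamp to the signed 16-bit range, as vpmaddubsw does."""
    return max(-32768, min(32767, x))


def wrap_i32(x):
    """Wrap to the signed 32-bit range (two's complement)."""
    x &= 0xFFFFFFFF
    return x - 0x100000000 if x >= 0x80000000 else x


def pmaddubsw(data_u8, kernel_s8):
    """u8 * s8 products, adjacent pairs summed with i16 saturation."""
    assert len(data_u8) == len(kernel_s8) and len(data_u8) % 2 == 0
    return [
        saturate_i16(data_u8[i] * kernel_s8[i] + data_u8[i + 1] * kernel_s8[i + 1])
        for i in range(0, len(data_u8), 2)
    ]


def pmaddwd(a_i16, b_i16):
    """i16 * i16 products, adjacent pairs summed into i32 lanes."""
    assert len(a_i16) == len(b_i16) and len(a_i16) % 2 == 0
    return [
        wrap_i32(a_i16[i] * b_i16[i] + a_i16[i + 1] * b_i16[i + 1])
        for i in range(0, len(a_i16), 2)
    ]


def paddd(a_i32, b_i32):
    """Lane-wise 32-bit addition with wraparound."""
    return [wrap_i32(x + y) for x, y in zip(a_i32, b_i32)]


def int8_dot_step(acc_i32, data_u8, kernel_s8):
    """One accumulation step: each i32 lane gains the sum of 4 u8*s8 products."""
    pairs = pmaddubsw(data_u8, kernel_s8)
    quads = pmaddwd(pairs, [1] * len(pairs))
    return paddd(acc_i32, quads)


if __name__ == "__main__":
    # 16 bytes correspond to one 128-bit (SSE-width) register.
    data = list(range(1, 17))
    kernel = [1, -1] * 8
    print(int8_dot_step([0, 0, 0, 0], data, kernel))
```

Note that the intermediate 16-bit saturation means the result can differ from an exact 32-bit dot product when adjacent products are large, for example two `255 * 127` products in one pair.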

ResNet50 performance (FPS):

| | Core i7-1185G7 sse4 | Core i7-1185G7 avx2 | Core i7-1185G7 avx512 | Core i7-1185G7 VNNI | Core i7-8700B | Core i5-9400T |
|---|---|---|---|---|---|---|
| TVM FP32 | 53 | 53 | 53 | 54 | 48 | |
| TVM int32 | | 12 | 16 | | | |
| TVM int8 default | 34 | 61 | 92 | 142 | 78 | 62 |
| TVM int8 atvm | | 70 | | 134 | 95 | 79 |

jcf94 left a comment


Thanks for your answer! @elvin-n

elvin-n commented Sep 3, 2021

The change in get_fp32_len affected the ARM flow: it now blocks by 4 instead of the previous default of 8. This should not affect performance, since the NEON SIMD vector size is 64 or 128 bits, but it will affect the knowledge database of tuned kernels.

I will verify the performance aspect on ARM. Backward compatibility is still an open question; so far my impression is that we do not care much about it.
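The relationship between vector width and the blocking factor discussed above can be sketched as follows. This is an illustrative helper only: `fp32_lanes` and the `ASSUMED_VECTOR_BITS` table are hypothetical names, not the actual `get_fp32_len` implementation in TVM. It shows why a block of 4 fp32 values fills a 128-bit vector exactly.

```python
# Hypothetical table of SIMD register widths in bits (not taken from TVM).
ASSUMED_VECTOR_BITS = {
    "sse4": 128,
    "avx2": 256,
    "avx512": 512,
    "neon": 128,
}


def fp32_lanes(vector_bits):
    """Number of 32-bit float lanes that fit in a vector of the given width."""
    if vector_bits <= 0 or vector_bits % 32 != 0:
        raise ValueError("vector width must be a positive multiple of 32 bits")
    return vector_bits // 32


if __name__ == "__main__":
    for name, bits in ASSUMED_VECTOR_BITS.items():
        print(name, fp32_lanes(bits))
```

Under these assumptions, a 128-bit NEON register holds 4 fp32 values, so blocking by 4 maps onto one full vector, while blocking by 8 would span two.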

elvin-n commented Sep 3, 2021

I verified the ARM flow and confirm that it now uses 4 channel values instead of 8 for blocking, and that this did not affect performance (as I expected).

masahi merged commit 1bebd0a into apache:main Sep 9, 2021
ylc pushed a commit to ylc/tvm that referenced this pull request Sep 29, 2021
…pache#8897)

* Add sse4/avx2 support for vpmaddubsw/vpmaddwd/vpaddd

- Extend the list of different target for x86 topi
- Extend tests for conv2d x86 int8 for fast i8 x86 platforms

* fix code style

* Change x86-64-v2 to nehalem in test to support llvm11

* Change test target to get NCHW8c
ylc pushed a commit to ylc/tvm that referenced this pull request Jan 13, 2022
…pache#8897)
