Add sse4/avx2 support for fast x86 int8 (vpmaddubsw/vpmaddwd/vpaddd) #8897
- Extend the list of different targets for x86 topi
- Extend tests for conv2d x86 int8 on fast int8 x86 platforms
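To illustrate what the instruction triple in the title computes, here is a minimal NumPy behavioral model. It is a sketch under stated assumptions, not TVM's implementation: the function names `vpmaddubsw`, `vpmaddwd`, `vpaddd`, and `dot4_accumulate` are hypothetical Python stand-ins, and arrays of 32 `uint8`, 16 `int16`, and 8 `int32` values stand in for one 256-bit AVX2 register. Note that `vpmaddubsw` saturates its pairwise sums to int16, which is why the demo keeps the weights in a narrow range.

```python
import numpy as np

# Hypothetical lane-level emulation of the AVX2 int8 dot-product sequence.
# Assumption: each array models one 256-bit register (32 x u8, 16 x s16, 8 x s32).

def vpmaddubsw(a_u8: np.ndarray, b_s8: np.ndarray) -> np.ndarray:
    """Multiply unsigned bytes by signed bytes, sum adjacent pairs, saturate to int16."""
    prod = a_u8.astype(np.int32) * b_s8.astype(np.int32)
    pair_sums = prod[0::2] + prod[1::2]
    return np.clip(pair_sums, -32768, 32767).astype(np.int16)

def vpmaddwd(a_s16: np.ndarray, b_s16: np.ndarray) -> np.ndarray:
    """Multiply signed 16-bit lanes, sum adjacent pairs into int32."""
    prod = a_s16.astype(np.int64) * b_s16.astype(np.int64)
    return (prod[0::2] + prod[1::2]).astype(np.int32)

def vpaddd(a_s32: np.ndarray, b_s32: np.ndarray) -> np.ndarray:
    """Lane-wise int32 addition (NumPy int32 arrays wrap on overflow)."""
    return a_s32 + b_s32

def dot4_accumulate(data_u8, weight_s8, acc_s32):
    """Add a 4-element u8 x s8 dot product into each int32 accumulator lane."""
    s16 = vpmaddubsw(data_u8, weight_s8)           # 32 products -> 16 int16 pair sums
    ones = np.ones(s16.shape, dtype=np.int16)
    s32 = vpmaddwd(s16, ones)                      # 16 int16 -> 8 int32 quad sums
    return vpaddd(acc_s32, s32)                    # accumulate into 8 int32 lanes

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    data = rng.integers(0, 256, size=32).astype(np.uint8)
    # Weights in [-64, 63] so the int16 pair sums cannot saturate.
    weight = rng.integers(-64, 64, size=32).astype(np.int8)
    acc = np.zeros(8, dtype=np.int32)
    out = dot4_accumulate(data, weight, acc)
    ref = (data.astype(np.int64).reshape(8, 4) * weight.astype(np.int64).reshape(8, 4)).sum(axis=1)
    print(np.array_equal(out, ref))
```

Multiplying by a vector of ones in `vpmaddwd` only widens and pair-sums the int16 results, so the three-instruction chain yields one 4-element dot product per 32-bit lane; the 128-bit SSE variants do the same over 16 bytes.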
Thanks for your answer! @elvin-n
The change in `get_fp32_len` affected the ARM flow: it now blocks by 4 instead of the previous default of 8. This should not affect performance, since the NEON SIMD vector size is 64 or 128 bits, but it will affect the existing database of tuned kernels. I will verify the performance aspect on ARM. Backward compatibility is still an open question; so far my impression is that we do not care about it much.
I verified the ARM flow and confirm that it now uses 4 channel values instead of 8 for blocking, and that this did not affect performance at all (as I expected).
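The reasoning about blocking by 4 on NEON follows from register width: a 128-bit register holds exactly 4 float32 values, while a 256-bit AVX2 register holds 8. The sketch below makes that arithmetic concrete and shows what "blocking by 4 vs 8 channels" means for an NCHW tensor repacked into NCHW[x]c layout. Both `fp32_lanes` and `nchw_to_nchwc` are hypothetical helpers written for illustration; they are not TVM's `get_fp32_len` or its layout-transform code.

```python
import numpy as np

def fp32_lanes(vector_bits: int) -> int:
    """Hypothetical helper: how many float32 values fit in one SIMD register."""
    return vector_bits // 32

def nchw_to_nchwc(x: np.ndarray, block: int) -> np.ndarray:
    """Hypothetical helper: repack NCHW into NCHW[block]c (channel-blocked) layout.

    Output index [n, co, h, w, ci] holds input element [n, co * block + ci, h, w].
    """
    n, c, h, w = x.shape
    if c % block != 0:
        raise ValueError(f"channels ({c}) must be divisible by block ({block})")
    return x.reshape(n, c // block, block, h, w).transpose(0, 1, 3, 4, 2)

if __name__ == "__main__":
    for name, bits in [("NEON D reg", 64), ("NEON Q reg", 128), ("AVX2", 256), ("AVX-512", 512)]:
        print(name, fp32_lanes(bits))
    x = np.arange(1 * 8 * 2 * 3, dtype=np.float32).reshape(1, 8, 2, 3)
    print(nchw_to_nchwc(x, fp32_lanes(128)).shape)  # NCHW4c
    print(nchw_to_nchwc(x, fp32_lanes(256)).shape)  # NCHW8c
```

With a block size equal to the lane count, the innermost `c` axis maps onto exactly one vector register, so a 4-wide block on a 128-bit NEON target leaves no lanes idle, consistent with the observation that performance did not change.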
…pache#8897)
* Add sse4/avx2 support for vpmaddubsw/vpmaddwd/vpaddd
  - Extend the list of different targets for x86 topi
  - Extend tests for conv2d x86 int8 on fast int8 x86 platforms
* Fix code style
* Change x86-64-v2 to nehalem in test to support LLVM 11
* Change test target to get NCHW8c
In theory, this change can give up to a 2x speedup on int8 models versus fp32 models; the measured gain is currently slightly less.
Resnet50 performance: