Layer Norm x86 SIMD Optimizations #4065
Conversation
…something strange in packing layout;
Codecov Report
@@            Coverage Diff             @@
##           master    #4065      +/-   ##
==========================================
+ Coverage   94.41%   94.43%   +0.02%
==========================================
  Files         745      748       +3
  Lines      178496   179052     +556
==========================================
+ Hits       168533   169094     +561
+ Misses       9963     9958       -5
Continue to review full report at Codecov.
- for SIMD register horizontal sum, there is a utility function in x86_usability.h (sketched below)
- for AVX/FMA multiply-add intrinsics, there is a wrapper comp_fmadd function in x86_usability.h (also sketched below)
- use size * elempack as the loop count when applicable, so you can merge multiple for-loop code blocks into one
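A minimal, self-contained sketch of what those two helpers look like. ncnn's src/layer/x86/x86_usability.h provides the real versions; the names and exact signatures below are illustrative, not copied from that header.

```cpp
#include <immintrin.h>

// Horizontal sum of the 8 float lanes of an AVX register
// (the kind of utility the first point refers to).
static inline float reduce_add_ps(__m256 x)
{
    __m128 lo = _mm256_castps256_ps128(x);
    __m128 hi = _mm256_extractf128_ps(x, 1);
    __m128 s = _mm_add_ps(lo, hi);              // 8 lanes -> 4
    s = _mm_add_ps(s, _mm_movehl_ps(s, s));     // 4 lanes -> 2
    s = _mm_add_ss(s, _mm_shuffle_ps(s, s, 1)); // 2 lanes -> 1
    return _mm_cvtss_f32(s);
}

// Multiply-add wrapper in the spirit of comp_fmadd: the same call site
// compiles to a fused multiply-add on FMA targets and to mul+add elsewhere.
static inline __m256 comp_fmadd_ps(__m256 a, __m256 b, __m256 c)
{
#if __FMA__
    return _mm256_fmadd_ps(a, b, c);
#else
    return _mm256_add_ps(_mm256_mul_ps(a, b), c);
#endif
}
```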
I think I do not need SIMD register horizontal summation, because the length of the tensor varies. The AVX/FMA fmadd wrappers in x86_usability.h are now used. But I'm not sure how to merge multiple loop blocks into one by using size * elempack.
suppose v is data from the tensor, and a is the weight (such as alpha, beta, gamma, etc.); written out for pack1, pack4, pack8, and pack16, the layouts all store the elements as consecutive floats, so the four cases reduce to one unified pack loop (sketched below)
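A hedged sketch of that unified loop, assuming v and a share the same packed layout (the "when applicable" condition above); the function and variable names are illustrative, not taken from the PR itself.

```cpp
#include <immintrin.h>

// Elementwise scale of a packed tensor: a Mat of `size` elements with
// `elempack` lanes is just size * elempack consecutive floats, so
// pack1/pack4/pack8/pack16 all take this one code path.
static void scale_inplace(float* v, const float* a, int size, int elempack)
{
    const int count = size * elempack; // one loop count for every layout
    int i = 0;
#if __AVX__
    for (; i + 8 <= count; i += 8)
    {
        __m256 _v = _mm256_loadu_ps(v + i);
        __m256 _a = _mm256_loadu_ps(a + i);
        _mm256_storeu_ps(v + i, _mm256_mul_ps(_v, _a));
    }
#endif
    for (; i < count; i++) // scalar tail, also covers non-AVX builds
        v[i] *= a[i];
}
```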
Thanks. Now I managed to merge many cases into one.
add copyright header for new source
diff coverage is not good enough; see https://app.codecov.io/gh/Tencent/ncnn/pull/4065
I've added some test cases for 16-packed tensors. But I'm confused about the diff coverage: most files shown at https://app.codecov.io/gh/Tencent/ncnn/pull/4065 are not modified or even influenced by this PR, and I have no idea how the diff coverage is computed.
It often fails in that way.
Thanks for your contribution!
This PR provides some SIMD optimizations for LayerNorm, for both packed and unpacked tensors.
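For reference, this is the scalar computation the SIMD code vectorizes: normalize each vector to zero mean and unit variance, then apply the affine gamma/beta. A minimal sketch; parameter names follow common LayerNorm convention rather than ncnn's internal code.

```cpp
#include <cmath>

// y = (x - mean) / sqrt(var + eps) * gamma + beta, over one vector of `size` floats
static void layernorm_ref(float* x, const float* gamma, const float* beta,
                          int size, float eps)
{
    float mean = 0.f;
    for (int i = 0; i < size; i++)
        mean += x[i];
    mean /= size;

    float var = 0.f;
    for (int i = 0; i < size; i++)
        var += (x[i] - mean) * (x[i] - mean);
    var /= size;

    const float rstd = 1.f / std::sqrt(var + eps);
    for (int i = 0; i < size; i++)
        x[i] = (x[i] - mean) * rstd * gamma[i] + beta[i];
}
```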