Refactor avx512f #1597
82fb013 to 51da0ac
@sayantn Do you mind if I open this? There is a bit of an overlap between this PR and yours, but some things in this PR are not present in your work.
So the only overlap is the fma intrinsics and masked loads? I have no problem with you implementing the fma (honestly I didn't know about the
1e0a0e0 to c8fc6f2
@sayantn I've removed the masked load changes on my end. Our PRs should now be orthogonal to each other.
Yes, I will modify my PR in a while. I will also implement the missing reduce-max etc. intrinsics and fix the _mm_cvtt intrinsics (they currently generate vcvt instructions, not cvtt).
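For readers following the cvt/cvtt point: the two conversion families differ only in rounding. The `cvtt` forms always truncate toward zero, while the `cvt` forms round according to the current MXCSR rounding mode (round-to-nearest-even by default). Below is a minimal sketch using the scalar SSE intrinsics rather than the AVX-512 ones discussed in this PR. The wrapper names `convert_truncating` and `convert_current_rounding` are hypothetical, and the non-x86_64 fallback is an assumption added only so the example builds everywhere.

```rust
// Sketch: truncating (cvtt) vs. current-rounding-mode (cvt) float-to-int conversion.
// Wrapper names are illustrative and not part of stdarch.

#[cfg(target_arch = "x86_64")]
mod imp {
    use std::arch::x86_64::{_mm_cvt_ss2si, _mm_cvtt_ss2si, _mm_set_ss};

    /// Converts with truncation toward zero (cvttss2si).
    #[allow(unused_unsafe)]
    pub fn convert_truncating(x: f32) -> i32 {
        // SSE is part of the x86_64 baseline, so no runtime feature check is needed.
        unsafe { _mm_cvtt_ss2si(_mm_set_ss(x)) }
    }

    /// Converts using the current MXCSR rounding mode (cvtss2si),
    /// which is round-to-nearest-even unless the program changed it.
    #[allow(unused_unsafe)]
    pub fn convert_current_rounding(x: f32) -> i32 {
        unsafe { _mm_cvt_ss2si(_mm_set_ss(x)) }
    }
}

#[cfg(not(target_arch = "x86_64"))]
mod imp {
    /// Portable stand-in: `as` casts truncate toward zero.
    pub fn convert_truncating(x: f32) -> i32 {
        x as i32
    }

    /// Portable stand-in for the default round-to-nearest-even mode.
    pub fn convert_current_rounding(x: f32) -> i32 {
        let floor = x.floor();
        let diff = x - floor;
        let rounded = if diff > 0.5 {
            floor + 1.0
        } else if diff < 0.5 {
            floor
        } else if (floor as i32) % 2 == 0 {
            floor // tie: keep the even neighbour
        } else {
            floor + 1.0 // tie: move up to the even neighbour
        };
        rounded as i32
    }
}

use imp::{convert_current_rounding, convert_truncating};

fn main() {
    for x in [2.7_f32, -2.7, 2.5, 3.5] {
        println!(
            "{x}: cvtt -> {}, cvt -> {}",
            convert_truncating(x),
            convert_current_rounding(x)
        );
    }
}
```

An intrinsic documented as `cvtt` that lowers to a `vcvt` instruction would therefore return different results for inputs such as 2.7, which is the kind of mismatch described above.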
Can you also please do the floating-point abs using
@sayantn done.
Thanks.
dd20b4f to 9aae346
0007890 to 507cef8
I have already done the remaining gather-scatter in avx512f. Can you complete avx512bw (the reduce intrinsics and some mask operations)? Then I will start on the remaining IFMA and BF16, and after that start implementing the new VEX variants.
_MM_FROUND_CUR_DIRECTION
.