Upgrading Vector256/512 Shuffle() with VBMI support #87083
Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
bd850a2 to 5885bac
4bdbd2a to 8eecf8f
There's currently an infrastructure issue causing a large number of CI machine timeouts. It's likely not related to your PR.
Please let me know if there is anything else that needs to be done for this to be merged.
@@ -23482,6 +23509,15 @@ GenTree* Compiler::gtNewSimdShuffleNode(
// swap the operands to match the encoding requirements
retNode = gtNewSimdHWIntrinsicNode(type, op2, op1, NI_AVX512BW_PermuteVar32x16, simdBaseJitType, simdSize);
}
else if (elementSize == 1)
There's potentially a missing "todo" here to handle this using `pshufb` if we don't cross lanes and `VBMI` is not available.
I don't think it needs to be handled as part of this PR (or even for .NET 8 necessarily), since that scenario should be limited to pre-Cannon Lake (and therefore first-generation AVX-512) hardware. But since it should just be a matter of refactoring the logic under the `if (simdSize == 32)` path to be shared, it's probably still worth having.
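To make the "don't cross lanes" condition concrete, here is a minimal scalar sketch of the check such a fallback would need. `pshufb` on wider vectors shuffles bytes independently within each 128-bit lane, so it only applies when every index selects a source byte from the lane it is written to. The helper name `ShuffleCrossesLane` and its signature are hypothetical and are not taken from the JIT sources.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not part of the JIT): returns true when a byte shuffle
// reads any element from a different 128-bit lane than the one it writes to.
// indices[i] is the source byte position for destination byte i.
bool ShuffleCrossesLane(const std::vector<uint8_t>& indices)
{
    constexpr size_t kLaneBytes = 16; // one 128-bit lane holds 16 bytes

    // Assumption: the caller passes a whole number of lanes (16, 32, or 64 bytes).
    assert(indices.size() % kLaneBytes == 0);

    for (size_t i = 0; i < indices.size(); i++)
    {
        // Compare the lane of the source byte with the lane of the destination byte.
        if ((indices[i] / kLaneBytes) != (i / kLaneBytes))
        {
            return true;
        }
    }
    return false;
}
```

When the check returns false, an in-lane byte shuffle would be sufficient; when it returns true, a full-width permute such as the VBMI path in this PR is needed.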
CC. @dotnet/jit-contrib, @dotnet/avx512-contrib for secondary sign-off.
3a2e180 to 17fb7dc
This change does the following:
1. Accelerates Vector512.Shuffle() using AVX512VBMI (vpermb)
2. Accelerates Vector256.Shuffle() using AVX512VBMI_VL
3. Accelerates Vector256.Shuffle() using AVX512BW_VL
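For readers unfamiliar with the instruction behind item 1, the following is a rough scalar model of a 64-byte full-width byte permute, in which each output byte is selected by the low 6 bits of the corresponding index byte. It is only an illustration: `PermuteBytes64` is a hypothetical name, and the sketch does not model how the managed Shuffle() API treats out-of-range indices or how the JIT emits the instruction.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical scalar model (not JIT code) of a 512-bit byte permute:
// dst[i] = src[idx[i] mod 64]. Any source byte may be selected, regardless
// of which 128-bit lane it lives in.
std::array<uint8_t, 64> PermuteBytes64(const std::array<uint8_t, 64>& src,
                                       const std::array<uint8_t, 64>& idx)
{
    std::array<uint8_t, 64> dst{};
    for (size_t i = 0; i < dst.size(); i++)
    {
        // Only the low 6 bits of each index byte take part in the selection.
        dst[i] = src[idx[i] & 0x3F];
    }
    return dst;
}
```

Because the selection is not restricted to a lane, a single permute of this kind can replace the multi-instruction sequences otherwise needed for cross-lane byte shuffles.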
@dotnet/avx512-contrib