-
Notifications
You must be signed in to change notification settings - Fork 4.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
JIT: Optimize const ShiftRightLogical for byte values on XArch #86841
Conversation
Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch Issue DetailsContributes to #82564 (First time touching the JIT so I don't really know what I'm doing) Vector128<byte> v0;
// Expands patterns like
v0 >>> 4;
// to
(v0.AsInt32() >>> 4).AsByte() & Vector128.Create((byte)15);
|
src/coreclr/jit/hwintrinsicxarch.cpp
Outdated
if (varTypeIsByte(simdBaseType) && !impStackTop(0).val->IsCnsIntOrI()) | ||
{ | ||
// byte and sbyte would require more work to support | ||
// non-constant byte and sbyte would require more work to support | ||
break; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We shouldn't need this limitation. For shift in particular, working with constant
or non-constant
data is the same since we have overloads that take byte shiftAmount
and overloads that take Vector128<T> shiftAmount
with the latter simply being Vector128.CreateScalarUnsafe(shiftAmount)
of the non-constant byte
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That handling of constant
vs non-constant
is general and simply part of the existing gtNewSimdBinOp
handling, so you shouldn't need to do anything additional really.
We'll already have appropriately masked off the shift amount to be 0-7
and converted it to a Vector128
if necessary; so you should just need to generate the shift
followed by the and
to mask off the upper n-bits
of each element.
It basically just becomes that if its a constant, you can generate a new icon directly and otherwise the mask is gtNewSimdBroadcastNode
of 0xFF >> shiftAmount
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I updated it now to handle non-const shift amounts as well, but it feels a bit hacky with the temporary nonConstantByteShiftCountOp
- please let me know if I'm on the right track.
Right now the non-const path is something like
static Vector128<byte> Shift(Vector128<byte> input, int shiftAmount)
{
int maskedShiftAmount = shiftAmount & 7;
Vector128<int> shiftVector = Sse2.ConvertScalarToVector128Int32(maskedShiftAmount);
Vector128<byte> shiftedInput = Sse2.ShiftRightLogical(input.AsInt32(), shiftVector).AsByte();
return shiftedInput & Vector128.Create((byte)(255 >> maskedShiftAmount));
}
(don't mind the referenced issue spam - I'm just testing stuff and this PR is convenient) |
CC. @dotnet/jit-contrib, this needs a secondary review before merging. |
Test failures were #88582 |
Contributes to #82564
(First time touching the JIT so I don't really know what I'm doing)
Codegen for
Before
After
Codegen for
Before
After