-
Notifications
You must be signed in to change notification settings - Fork 4.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fold const WithElement to CNS_VEC #86212
Conversation
Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch Issue DetailsFold const WithElement to CNS_VEC #84543. @EgorBo Could you please take a look at this? Currently, it only handles TYP_FLOAT, but if the approach looks generally ok, I would try to generalize it to support other types as well. Codegen now looks like this: [MethodImpl(MethodImplOptions.NoInlining)]
internal static Vector4 Vector4Fields() => new Vector4 { X = 1, Y = 2, Z = 3, W = 4}; Emitting data sections: 16 total bytes
RWD00 dq 400000003F800000h, 4080000040400000h
section 0, size 16, RWD 0: 00 00 80 3f 00 00 00 40 00 00 40 40 00 00 80 40
Allocated method code size = 19 , actual size = 19, unused size = 0
*************** After end code gen, before unwindEmit()
G_M24095_IG01: ; func=00, offs=000000H, size=0003H, bbWeight=1, PerfScore 1.00, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref, nogc <-- Prolog IG
IN0004: 000000 vzeroupper
G_M24095_IG02: ; offs=000003H, size=000FH, bbWeight=1, PerfScore 5.25, gcrefRegs=0000 {}, byrefRegs=0002 {rcx}, BB01 [0000], byref
IN0001: 000003 vmovups xmm0, xmmword ptr [reloc @RWD00]
IN0002: 00000B vmovups xmmword ptr [rcx], xmm0
IN0003: 00000F mov rax, rcx
G_M24095_IG03: ; offs=000012H, size=0001H, bbWeight=1, PerfScore 1.00, epilog, nogc, extend
IN0005: 000012 ret
|
Failures are related, will look into them tomorrow. Whoopsie:
|
I think you need to generalize it for any type, not just float/double. overall the approach looks good. Ideally we'd want to do this in VN but it needs extra efforts (we don't number intrinsics with more than 2 args). And the failing test yes, you need to check source vector's type |
Actually, no, this should be handled in VN phase (easier and can handle more cases), here is a quick prototype you neeed to extend with type checks, etc: --- a/src/coreclr/jit/valuenum.cpp
+++ b/src/coreclr/jit/valuenum.cpp
@@ -11356,8 +11356,37 @@ void Compiler::fgValueNumberHWIntrinsic(GenTreeHWIntrinsic* tree)
ValueNumPair excSetPair = ValueNumStore::VNPForEmptyExcSet();
ValueNumPair normalPair = ValueNumPair();
- if ((tree->GetOperandCount() > 2) || ((JitConfig.JitDisableSimdVN() & 2) == 2))
+ const bool disableSimdVN = (JitConfig.JitDisableSimdVN() & 2) == 2;
+ if ((tree->GetOperandCount() > 2) || disableSimdVN)
{
+ if (!disableSimdVN && intrinsicId == NI_Vector128_WithElement)
+ {
+ assert(tree->GetOperandCount() == 3);
+ GenTree* op1 = tree->Op(1);
+ GenTree* op2 = tree->Op(2);
+ GenTree* op3 = tree->Op(3);
+ if (op1->gtVNPair.BothEqual() && vnStore->IsVNConstant(op1->gtVNPair.GetLiberal()))
+ {
+ if (op2->gtVNPair.BothEqual() && vnStore->IsVNConstant(op2->gtVNPair.GetLiberal()))
+ {
+ if (op3->gtVNPair.BothEqual() && vnStore->IsVNConstant(op3->gtVNPair.GetLiberal()))
+ {
+ ValueNum constVecVN = op1->gtVNPair.GetLiberal();
+ ValueNum elemIndexVN = op2->gtVNPair.GetLiberal();
+ ValueNum elemValueVN = op3->gtVNPair.GetLiberal();
+
+ simd16_t constVec = vnStore->GetConstantSimd16(constVecVN);
+ constVec.f32[vnStore->GetConstantInt32(elemIndexVN)] = vnStore->GetConstantSingle(elemValueVN);
+
+ ValueNum newSimdVN = vnStore->VNForSimd16Con(constVec);
+ tree->gtVNPair = vnStore->VNPWithExc(ValueNumPair(newSimdVN, newSimdVN), excSetPair);
+ return;
+ }
+ }
+ }
+ }
+
+
// TODO-CQ: allow intrinsics with > 2 operands to be properly VN'ed.
normalPair = vnStore->VNPairForExpr(compCurBB, tree->TypeGet()); |
Thank you! To clarify, it would be ok to special case WithElement like you did and not resolve runtime/src/coreclr/jit/valuenum.cpp Line 11361 in f107b63
Because that looks rather complicated. |
Yes, special casing is fine in this case, to enable proper numbering for args>2 we need to check what kind of intrinsics we'll work with. Bur, presumably, it should be beneficial and not complicated |
@jasper-d hint: if diffs won't find a lot of diffs and all of them about Vector2/3/4 with float fields - I think it will be fine to simplify the impl to only handle those since your current impl could be a bit an overkill if we only deal with floats |
Looks like https://github.com/dotnet/runtime/blob/main/src/tests/JIT/HardwareIntrinsics/X86/Regression/GitHub_17957/GitHub_17957.cs is the only diff for non-float. Will change. |
@EgorBo PTAL |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks!
Co-authored-by: Egor Bogatov <egorbo@gmail.com>
Thank you for your help! |
Closes #84543
VN ternary SIMD ops and fold const WithElement to CNS_VEC for base-type float (#84543).
I believe regressions are a result of different register allocation.
OSX x64 failure is a known issue (#86612). I assume SIGSEGV on Linux arm32 is unrelated.