Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fold const WithElement to CNS_VEC #86212

Merged
merged 15 commits into from
Jun 13, 2023
Merged

Fold const WithElement to CNS_VEC #86212

merged 15 commits into from
Jun 13, 2023

Conversation

jasper-d
Copy link
Contributor

@jasper-d jasper-d commented May 13, 2023

Closes #84543

VN ternary SIMD ops and fold const WithElement to CNS_VEC for base-type float (#84543).

I believe regressions are a result of different register allocation.

OSX x64 failure is a known issue (#86612). I assume SIGSEGV on Linux arm32 is unrelated.

@dotnet-issue-labeler dotnet-issue-labeler bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label May 13, 2023
@ghost ghost added the community-contribution Indicates that the PR has been added by a community member label May 13, 2023
@ghost
Copy link

ghost commented May 13, 2023

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

Issue Details

Fold const WithElement to CNS_VEC #84543.

@EgorBo Could you please take a look at this? Currently, it only handles TYP_FLOAT, but if the approach looks generally ok, I would try to generalize it to support other types as well.

Codegen now looks like this:

[MethodImpl(MethodImplOptions.NoInlining)]
internal static Vector4 Vector4Fields() => new Vector4 { X = 1, Y = 2, Z = 3, W = 4};
Emitting data sections: 16 total bytes

RWD00  	dq	400000003F800000h, 4080000040400000h
  section   0, size 16, RWD 0:	00 00 80 3f 00 00 00 40 00 00 40 40 00 00 80 40 

Allocated method code size =   19 , actual size =   19, unused size =    0

*************** After end code gen, before unwindEmit()
G_M24095_IG01:        ; func=00, offs=000000H, size=0003H, bbWeight=1, PerfScore 1.00, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref, nogc <-- Prolog IG

IN0004: 000000 vzeroupper 

G_M24095_IG02:        ; offs=000003H, size=000FH, bbWeight=1, PerfScore 5.25, gcrefRegs=0000 {}, byrefRegs=0002 {rcx}, BB01 [0000], byref

IN0001: 000003 vmovups  xmm0, xmmword ptr [reloc @RWD00]
IN0002: 00000B vmovups  xmmword ptr [rcx], xmm0
IN0003: 00000F mov      rax, rcx

G_M24095_IG03:        ; offs=000012H, size=0001H, bbWeight=1, PerfScore 1.00, epilog, nogc, extend

IN0005: 000012 ret      
Author: jasper-d
Assignees: -
Labels:

area-CodeGen-coreclr

Milestone: -

@jasper-d
Copy link
Contributor Author

jasper-d commented May 13, 2023

Failures are related, will look into them tomorrow.

Whoopsie:

[000018] ----------- morphing withelement    *  HWINTRINSIC simd16 float WithElement
[000007] -----+-----                         +--*  CNS_VEC   simd8 <0x3f86872b, 0x40033333>
[000017] -----+-----                         +--*  CNS_INT   int    2
[000010] -----+-----                         \--*  CNS_DBL   float  3.4779999256134033
[000007] -----+----- morphed into this       *  CNS_VEC   simd8 <0x3f86872b, 0x40033333>

@EgorBo
Copy link
Member

EgorBo commented May 13, 2023

but if the approach looks generally ok,

I think you need to generalize it for any type, not just float/double. overall the approach looks good. Ideally we'd want to do this in VN but it needs extra efforts (we don't number intrinsics with more than 2 args).

And the failing test yes, you need to check source vector's type

@EgorBo
Copy link
Member

EgorBo commented May 14, 2023

Actually, no, this should be handled in VN phase (easier and can handle more cases), here is a quick prototype you neeed to extend with type checks, etc:

--- a/src/coreclr/jit/valuenum.cpp
+++ b/src/coreclr/jit/valuenum.cpp
@@ -11356,8 +11356,37 @@ void Compiler::fgValueNumberHWIntrinsic(GenTreeHWIntrinsic* tree)
     ValueNumPair excSetPair = ValueNumStore::VNPForEmptyExcSet();
     ValueNumPair normalPair = ValueNumPair();
 
-    if ((tree->GetOperandCount() > 2) || ((JitConfig.JitDisableSimdVN() & 2) == 2))
+    const bool disableSimdVN = (JitConfig.JitDisableSimdVN() & 2) == 2;
+    if ((tree->GetOperandCount() > 2) || disableSimdVN)
     {
+        if (!disableSimdVN && intrinsicId == NI_Vector128_WithElement)
+        {
+            assert(tree->GetOperandCount() == 3);
+            GenTree* op1 = tree->Op(1);
+            GenTree* op2 = tree->Op(2);
+            GenTree* op3 = tree->Op(3);
+            if (op1->gtVNPair.BothEqual() && vnStore->IsVNConstant(op1->gtVNPair.GetLiberal()))
+            {
+                if (op2->gtVNPair.BothEqual() && vnStore->IsVNConstant(op2->gtVNPair.GetLiberal()))
+                {
+                    if (op3->gtVNPair.BothEqual() && vnStore->IsVNConstant(op3->gtVNPair.GetLiberal()))
+                    {
+                        ValueNum constVecVN  = op1->gtVNPair.GetLiberal();
+                        ValueNum elemIndexVN = op2->gtVNPair.GetLiberal();
+                        ValueNum elemValueVN = op3->gtVNPair.GetLiberal();
+
+                        simd16_t constVec = vnStore->GetConstantSimd16(constVecVN);
+                        constVec.f32[vnStore->GetConstantInt32(elemIndexVN)] = vnStore->GetConstantSingle(elemValueVN);
+
+                        ValueNum newSimdVN = vnStore->VNForSimd16Con(constVec);
+                        tree->gtVNPair = vnStore->VNPWithExc(ValueNumPair(newSimdVN, newSimdVN), excSetPair);
+                        return;
+                    }
+                }
+            }
+        }
+
+
         // TODO-CQ: allow intrinsics with > 2 operands to be properly VN'ed.
         normalPair = vnStore->VNPairForExpr(compCurBB, tree->TypeGet());

@jasper-d
Copy link
Contributor Author

here is a quick prototype you neeed to extend with type checks, etc

Thank you! To clarify, it would be ok to special case WithElement like you did and not resolve

// TODO-CQ: allow intrinsics with > 2 operands to be properly VN'ed.
?

Because that looks rather complicated.

@EgorBo
Copy link
Member

EgorBo commented May 14, 2023

here is a quick prototype you neeed to extend with type checks, etc

Thank you! To clarify, it would be ok to special case WithElement like you did and not resolve

// TODO-CQ: allow intrinsics with > 2 operands to be properly VN'ed.

?
Because that looks rather complicated.

Yes, special casing is fine in this case, to enable proper numbering for args>2 we need to check what kind of intrinsics we'll work with. Bur, presumably, it should be beneficial and not complicated

@EgorBo
Copy link
Member

EgorBo commented May 20, 2023

@jasper-d hint: if diffs won't find a lot of diffs and all of them about Vector2/3/4 with float fields - I think it will be fine to simplify the impl to only handle those since your current impl could be a bit an overkill if we only deal with floats

@jasper-d
Copy link
Contributor Author

@jasper-d jasper-d marked this pull request as ready for review May 23, 2023 12:14
@jasper-d jasper-d changed the title [WIP] Fold const WithElement to CNS_VEC Fold const WithElement to CNS_VEC May 23, 2023
@jasper-d
Copy link
Contributor Author

jasper-d commented Jun 8, 2023

@EgorBo PTAL

Copy link
Member

@EgorBo EgorBo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks!

Co-authored-by: Egor Bogatov <egorbo@gmail.com>
@jasper-d
Copy link
Contributor Author

LGTM, thanks!

Thank you for your help!

@EgorBo EgorBo merged commit a052348 into dotnet:main Jun 13, 2023
@ghost ghost locked as resolved and limited conversation to collaborators Jul 13, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI community-contribution Indicates that the PR has been added by a community member
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Optimize new Vector3 { X=,Y=,Z=} pattern
2 participants