-
Notifications
You must be signed in to change notification settings - Fork 4.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
JIT: Added Sve.CreateBreakPropagateMask #104704
Conversation
Note regarding the
|
1 similar comment
Note regarding the
|
Tagging subscribers to this area: @dotnet/area-system-runtime-intrinsics |
@dotnet/arm64-contrib @kunalspathak @tannergooding this is ready.
|
src/libraries/System.Runtime.Intrinsics/ref/System.Runtime.Intrinsics.cs
Outdated
Show resolved
Hide resolved
Overall this looks good/correct to me. There's a couple small nits, some callouts around pre-existing code that doesn't look to be quite correct, and then one real case that I'm not sure is correctly handled (as |
@tannergooding Did feedback. Stress results:
|
I might have asked this earlier somewhere, but BRKN operates on
Currently with other overloads, the instruction will calculate the result in |
I elaborated on this the other place you asked (see #104502 (comment)). Having one for each type is correct and still fits into how predicates actually work. For x64 we have a
So for x64, a mask of While for Arm64 we have a
So there is no conversion required between these to get something "usable". There may, however, be some normalization consideration in some edge cases. For an instruction like this the mask returned despite being |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added some questions.
@@ -58,6 +58,7 @@ HARDWARE_INTRINSIC(Sve, CreateBreakAfterMask, | |||
HARDWARE_INTRINSIC(Sve, CreateBreakAfterPropagateMask, -1, 3, true, {INS_sve_brkpa, INS_sve_brkpa, INS_sve_brkpa, INS_sve_brkpa, INS_sve_brkpa, INS_sve_brkpa, INS_sve_brkpa, INS_sve_brkpa, INS_invalid, INS_invalid}, HW_Category_SIMD, HW_Flag_Scalable|HW_Flag_ExplicitMaskedOperation|HW_Flag_ReturnsPerElementMask|HW_Flag_SpecialCodeGen) | |||
HARDWARE_INTRINSIC(Sve, CreateBreakBeforeMask, -1, 2, true, {INS_sve_brkb, INS_sve_brkb, INS_sve_brkb, INS_sve_brkb, INS_sve_brkb, INS_sve_brkb, INS_sve_brkb, INS_sve_brkb, INS_invalid, INS_invalid}, HW_Category_SIMD, HW_Flag_Scalable|HW_Flag_ExplicitMaskedOperation|HW_Flag_ReturnsPerElementMask|HW_Flag_SpecialCodeGen) | |||
HARDWARE_INTRINSIC(Sve, CreateBreakBeforePropagateMask, -1, 3, true, {INS_sve_brkpb, INS_sve_brkpb, INS_sve_brkpb, INS_sve_brkpb, INS_sve_brkpb, INS_sve_brkpb, INS_sve_brkpb, INS_sve_brkpb, INS_invalid, INS_invalid}, HW_Category_SIMD, HW_Flag_Scalable|HW_Flag_ExplicitMaskedOperation|HW_Flag_ReturnsPerElementMask|HW_Flag_SpecialCodeGen) | |||
HARDWARE_INTRINSIC(Sve, CreateBreakPropagateMask, -1, -1, false, {INS_sve_brkn, INS_sve_brkn, INS_sve_brkn, INS_sve_brkn, INS_sve_brkn, INS_sve_brkn, INS_sve_brkn, INS_sve_brkn, INS_invalid, INS_invalid}, HW_Category_SIMD, HW_Flag_Scalable|HW_Flag_EmbeddedMaskedOperation|HW_Flag_ReturnsPerElementMask|HW_Flag_HasRMWSemantics) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should't this have HW_Flag_ExplicitMaskedOperation
too?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From the above comment, I can't combine ExplicitMaskedOperation and EmbeddedMaskedOperation.
src/coreclr/jit/lsraarm64.cpp
Outdated
assert(intrin.op3->IsVectorZero()); | ||
tgtPrefUse = BuildUse(embOp2Node->Op(2)); | ||
srcCount += 1; | ||
srcCount += BuildDelayFreeUses(embOp2Node->Op(1), embOp2Node->Op(2)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am little worried about the handling of this. Can you share the JitDump of one of the simpler repro scenario?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, I'll get a JitDump of the LoadBasicScenario.
What specifically are you worried about?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
here we are not building refpositions in order. Basically it should be for Op(1)
, Op(2)
and then Op(3)
, but here, we are doing Op(2)
, Op(1)
and Op(3)
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i have fixed this properly and all tests are passing.
src/coreclr/jit/lsraarm64.cpp
Outdated
{ | ||
assert(!tgtPrefOp1); | ||
srcCount += BuildDelayFreeUses(intrin.op1, intrin.op2, predMask); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
btw, I just realized that in #104002 this should have been:
srcCount += BuildDelayFreeUses(intrin.op1, intrin.op2, predMask); | |
srcCount += BuildDelayFreeUses(intrin.op2, intrin.op1, predMask); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It should actually be fine? intrin.op2
is the RMW node, and BuildDelayFreeUses' second parameter is for the RMW node.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, if you see the summary docs of BuildDelayFreeUses()
, node
is the node for which we want to create the RefPosition
for, in this case intrin.op2
and rmwNode
is the node for which we want to make sure that the register doesn't collide with, in this case, it is intrin.op1
. See some examples of places where we call BuildDelayFreeUses()
.
//------------------------------------------------------------------------
// BuildDelayFreeUses: Build Use RefPositions for an operand that might be contained,
// and which may need to be marked delayRegFree
//
// Arguments:
// node - The node of interest
// rmwNode - The node that has RMW semantics (if applicable)
// candidates - The set of candidates for the uses
// useRefPositionRef - If a use RefPosition is created, returns it. If none created, sets it to nullptr.
//
// REVIEW: useRefPositionRef is not consistently set. Also, sometimes this function creates multiple RefPositions
// but can only return one. Does it matter which one gets returned?
//
// Return Value:
// The number of source registers used by the *parent* of this node.
//
int LinearScan::BuildDelayFreeUses(GenTree* node,
GenTree* rmwNode,
SingleTypeRegSet candidates,
RefPosition** useRefPositionRef)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I make that change, I get the assertion:
Errors:
Assert failure(PID 15364 [0x00003c04], Thread: 6432 [0x1920]): Assertion failed '!"removeListNode didn't find the node"' in 'JIT.HardwareIntrinsics.Arm._Sve.SimpleBinaryOpTest__Sve_CreateMaskForFirstActiveElement_sbyte:RunBasicScenario_UnsafeRead():this' during 'Linear scan register alloc' (IL size 110; hash 0xd48bee8e; FullOpts)
File: C:\work\runtime\src\coreclr\jit\lsrabuild.cpp:69
Image: C:\Users\dotnetperfuser\Desktop\work\windows.arm64.Checked\Tests\Core_Root\corerun.exe
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed it properly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Thanks!
Contributes to #99957
Adds