-
Notifications
You must be signed in to change notification settings - Fork 4.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add support for Sve.Splice() #103567
Add support for Sve.Splice() #103567
Conversation
Note regarding the
|
1 similar comment
Note regarding the
|
@a74nh @kunalspathak @dotnet/arm64-contrib @arch-arm64-sve |
Tagging subscribers to this area: @dotnet/area-system-runtime-intrinsics |
2e693e6
to
55a3f28
Compare
Some of the stress tests are failing when it fails read mask correctly that may lead to loading an Op2 using a mask incorrectly (with all zeros) or mask itself is incorrect. It seems like a known issue, happy to debug it further if you think otherwise. Stress test results
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just to confirm, changed related to ReverseBits
are straight forward and just need an entry in hwintrinsiclistarm64sve.h
table. The other changes are to support Splice
whose op2
is the preferred target, yes?
Probably worth sending a separate PR for ReveseBits
and Splice
because the logic on how they are handled are very different.
src/libraries/System.Private.CoreLib/src/System/Runtime/Intrinsics/Arm/Sve.cs
Show resolved
Hide resolved
Put ReverseBits in #103806 |
Regarding using a constructive variant. Not sure how to mark explicit op2 and op3 as needing consecutive registers. The current intrinsics with consecutive registers have a tuple in a single op. Here we have two separate args in the API. |
I might have to spend some time in figuring out how to make sure that when |
@SwapnilGaikwad - can you confirm if all the stress test passes with your recent change? Also, can you please share disassembly for one of the test? I will look deeply, but are we making sure that we are never generating constructive form? |
Some of the stress tests failed as before. They are loading the mask incorrectly as all zeros that is then leading to a wrong results. Stress test results
Disassembly for RunBasicScenario_Load
We have RMW semantics and we set the tgtPrefUse2. However, we don't have an explicit assert to ensure that we are emitting an instruction of group |
Is it possible to add it? |
Well, we kind of do it indirectly. We emit a constructive variant when |
May be |
Add a link to #103850 in comment |
That would have to be inside the
In addition, all of this is done in common codegen code. We could forcibly set |
It will be still inside the individual case like https://github.com/dotnet/runtime/blob/e1efa6b633d01c00ec019a8e222b2ebdfa593bcb/src/coreclr/jit/emitarm64sve.cpp#L3861C28-L3861C62 and not in common code path.
I am ok to comment out the relevant unit test, because we are not supporting it currently. |
So perhaps something like |
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Contribute towards #99957.
This PR contains
Sve.Splice()
. It's the first one with an explicit mask and RMW semantics.