Add support for Sve.TransposeEven/Odd() #103068

Merged

Conversation

SwapnilGaikwad
Contributor

Contributes towards #99957.

Note regarding the new-api-needs-documentation label:

This serves as a reminder: when your PR modifies a ref *.cs file and adds or modifies public APIs, please make sure the API implementation in the src *.cs file is documented with triple-slash comments, so the PR reviewers can sign off on that change.

@dotnet-policy-service bot added the community-contribution label (Indicates that the PR has been added by a community member) on Jun 5, 2024
Contributor

Tagging subscribers to this area: @dotnet/area-system-runtime-intrinsics
See info in area-owners.md if you want to be subscribed.

@SwapnilGaikwad
Contributor Author

@a74nh @kunalspathak @dotnet/arm64-contrib @arch-arm64-sve

@SwapnilGaikwad
Contributor Author

While running stress tests with JitStress=1 and tiered compilation not enabled, RunBasicScenario_Load fails because a mask is not loaded correctly from the stack. I suspect this is a known issue. If not, I'll debug it further.

Stress test results
===================Running default===================
------------------- {} -------------------
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.SveTransposeEven_float() : 7
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.SveTransposeEven_double() : 7
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.SveTransposeEven_sbyte() : 7
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.SveTransposeEven_short() : 7
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.SveTransposeEven_int() : 7
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.SveTransposeEven_long() : 7
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.SveTransposeEven_byte() : 7
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.SveTransposeEven_ushort() : 7
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.SveTransposeEven_uint() : 7
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.SveTransposeEven_ulong() : 7
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.SveTransposeOdd_float() : 7
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.SveTransposeOdd_double() : 7
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.SveTransposeOdd_sbyte() : 7
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.SveTransposeOdd_short() : 7
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.SveTransposeOdd_int() : 7
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.SveTransposeOdd_long() : 7
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.SveTransposeOdd_byte() : 7
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.SveTransposeOdd_ushort() : 7
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.SveTransposeOdd_uint() : 7
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.SveTransposeOdd_ulong() : 7
===================Running jitstress===================
------------------- {'JitMinOpts': '1'} -------------------
------------------- {'JitStress': '1'} -------------------
Test failed:
..........................................
..........................................
Sve.TransposeEven<Single>(Vector<Single>, Vector<Single>): RunBasicScenario_Load failed:
    left: (0.1973199, 0.5963406, 0.40039855, 0.7560935)
   right: (0.9296538, 0.008992763, 0.26302436, 0.26820645)
  result: (0.1973199, 0, 0.40039855, 0)
..........................................
System.Exception: One or more scenarios did not complete as expected.
   at JIT.HardwareIntrinsics.Arm._Sve.Program.SveTransposeEven_float() in /home/user/dotnet/runtime/artifacts/tests/coreclr/obj/linux.arm64.Checked/Managed/JIT/HardwareIntrinsics/Arm/Sve/Sve_ro/Sve_ro/gen/SveTransposeEven.float.cs:line 62
   at Program.<<Main>$>g__TestExecutor3321|0_3322(StreamWriter tempLogSw, StreamWriter statsCsvSw, <>c__DisplayClass0_0&) in /home/user/dotnet/runtime/artifacts/tests/coreclr/obj/linux.arm64.Checked/Managed/JIT/HardwareIntrinsics/HardwareIntrinsics_Arm_ro/generated/XUnitWrapperGenerator/XUnitWrapperGenerator.XUnitWrapperGenerator/FullRunner.g.cs:line 83095
..........................................
Sve.TransposeEven<UInt16>(Vector<UInt16>, Vector<UInt16>): RunBasicScenario_Load failed:
    left: (37941, 2534, 46769, 4644, 27093, 27683, 54795, 43036)
   right: (13377, 10177, 34138, 62518, 48596, 12061, 51059, 37081)
  result: (37941, 0, 46769, 0, 27093, 0, 54795, 0)
..........................................
System.Exception: One or more scenarios did not complete as expected.
   at JIT.HardwareIntrinsics.Arm._Sve.Program.SveTransposeEven_ushort() in /home/user/dotnet/runtime/artifacts/tests/coreclr/obj/linux.arm64.Checked/Managed/JIT/HardwareIntrinsics/Arm/Sve/Sve_ro/Sve_ro/gen/SveTransposeEven.ushort.cs:line 62
   at Program.<<Main>$>g__TestExecutor3328|0_3329(StreamWriter tempLogSw, StreamWriter statsCsvSw, <>c__DisplayClass0_0&) in /home/user/dotnet/runtime/artifacts/tests/coreclr/obj/linux.arm64.Checked/Managed/JIT/HardwareIntrinsics/HardwareIntrinsics_Arm_ro/generated/XUnitWrapperGenerator/XUnitWrapperGenerator.XUnitWrapperGenerator/FullRunner.g.cs:line 83263
..........................................
Sve.TransposeOdd<Double>(Vector<Double>, Vector<Double>): RunBasicScenario_Load failed:
    left: (0.33864708087114315, 0.03984851456367522)
   right: (0.06338899727774616, 0.08029362828860687)
  result: (0.03984851456367522, 0)
..........................................
System.Exception: One or more scenarios did not complete as expected.
   at JIT.HardwareIntrinsics.Arm._Sve.Program.SveTransposeOdd_double() in /home/user/dotnet/runtime/artifacts/tests/coreclr/obj/linux.arm64.Checked/Managed/JIT/HardwareIntrinsics/Arm/Sve/Sve_ro/Sve_ro/gen/SveTransposeOdd.double.cs:line 62
   at Program.<<Main>$>g__TestExecutor3332|0_3333(StreamWriter tempLogSw, StreamWriter statsCsvSw, <>c__DisplayClass0_0&) in /home/user/dotnet/runtime/artifacts/tests/coreclr/obj/linux.arm64.Checked/Managed/JIT/HardwareIntrinsics/HardwareIntrinsics_Arm_ro/generated/XUnitWrapperGenerator/XUnitWrapperGenerator.XUnitWrapperGenerator/FullRunner.g.cs:line 83359
------------------- {'JitStress': '2'} -------------------
------------------- {'JitStress': '1', 'TieredCompilation': '1'} -------------------
------------------- {'JitStress': '2', 'TieredCompilation': '1'} -------------------
------------------- {'TailcallStress': '1'} -------------------
------------------- {'ReadyToRun': '0'} -------------------
===================Running jitstressregs===================
------------------- {'JitStressRegs': '1'} -------------------
------------------- {'JitStressRegs': '2'} -------------------
------------------- {'JitStressRegs': '3'} -------------------
------------------- {'JitStressRegs': '4'} -------------------
------------------- {'JitStressRegs': '8'} -------------------
------------------- {'JitStressRegs': '0x10'} -------------------
------------------- {'JitStressRegs': '0x80'} -------------------
------------------- {'JitStressRegs': '0x1000'} -------------------
------------------- {'JitStressRegs': '0x2000'} -------------------
===================Running jitstress2-jitstressregs===================
------------------- {'JitStress': '2', 'JitStressRegs': '1'} -------------------
------------------- {'JitStress': '2', 'JitStressRegs': '2'} -------------------
------------------- {'JitStress': '2', 'JitStressRegs': '3'} -------------------
------------------- {'JitStress': '2', 'JitStressRegs': '4'} -------------------
------------------- {'JitStress': '2', 'JitStressRegs': '8'} -------------------
------------------- {'JitStress': '2', 'JitStressRegs': '0x10'} -------------------
------------------- {'JitStress': '2', 'JitStressRegs': '0x80'} -------------------
------------------- {'JitStress': '2', 'JitStressRegs': '0x1000'} -------------------
------------------- {'JitStress': '2', 'JitStressRegs': '0x2000'} -------------------
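
To make the failing output easier to read, here is a minimal scalar sketch of what TRN1/TRN2 are expected to produce. It is only a reference model for reasoning about the log: the `TransposeModel` class and its array-based signatures are hypothetical and are not the runtime's `Sve.TransposeEven`/`Sve.TransposeOdd` or the generated test helpers.

```csharp
using System;

// Hypothetical scalar reference model of TRN1/TRN2; not runtime or test code.
static class TransposeModel
{
    // TRN1: for each pair of lanes, take the even-indexed element of left,
    // then the even-indexed element of right.
    public static T[] TransposeEven<T>(T[] left, T[] right) => Transpose(left, right, 0);

    // TRN2: same layout, but taking the odd-indexed elements.
    public static T[] TransposeOdd<T>(T[] left, T[] right) => Transpose(left, right, 1);

    private static T[] Transpose<T>(T[] left, T[] right, int part)
    {
        if (left.Length != right.Length || left.Length % 2 != 0)
        {
            throw new ArgumentException("Inputs must have the same, even length.");
        }

        var result = new T[left.Length];
        for (int pair = 0; pair < left.Length / 2; pair++)
        {
            result[2 * pair] = left[2 * pair + part];       // lane taken from left
            result[2 * pair + 1] = right[2 * pair + part];  // lane taken from right
        }
        return result;
    }
}
```

Under this model, the failing `TransposeEven<Single>` inputs should give (0.1973199, 0.9296538, 0.40039855, 0.26302436), while the log reports (0.1973199, 0, 0.40039855, 0). Likewise, the `TransposeOdd<Double>` inputs should give (0.03984851456367522, 0.08029362828860687) against a reported (0.03984851456367522, 0). In each failure the lanes taken from `left` are correct and the lanes taken from `right` are zero.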

@kunalspathak
Member

I suspect, this is a known issue.

Most likely.

Member

@kunalspathak left a comment

LGTM. Added a few questions before I merge.

/// svuint32_t svtrn2[_u32](svuint32_t op1, svuint32_t op2)
/// TRN2 Zresult.S, Zop1.S, Zop2.S
/// </summary>
public static unsafe Vector<uint> TransposeOdd(Vector<uint> left, Vector<uint> right) => TransposeOdd(left, right);
Member

Not for this PR, but I wonder about these types of APIs that have predicate versions, e.g. TRN1 <Pd>.<T>, <Pn>.<T>, <Pm>.<T>: what happens when we do something like TransposeOdd(CreateTrueMask(), CreateTrueMask())?

Contributor Author

That's interesting. Since the masks are treated as regular vectors, it would behave like operating on vectors of 1s and 0s. I wonder if that's good enough for us when writing C# code. If someone wants to operate on masks, they can just use the vector version and then use the result as a mask for the next instructions 🤔.

Member

@a74nh - any thoughts on this? There are many APIs that fall in this category.

Contributor

With the code as it is, it's going to convert the masks to vectors, use the vector version, then convert the result back to a mask, which isn't ideal.

It should be fairly easy to add some checks: if all inputs are masks converted to vectors, remove the conversions to vectors. Then in codegen, if the inputs are masks, use the mask versions.

We probably want an issue for this to track it. Then enable one by one.

Probably best to do this after implementing all the APIs so that we get all functionality done first.
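
The round trip described in this thread can be sketched with a small scalar model. This is only an illustration under stated assumptions: masks are modelled as `bool[]`, vectors as `int[]` holding 1s and 0s, and every name below (`MaskTransposeModel`, `ToVector`, `ToMask`, and so on) is hypothetical rather than JIT or runtime code.

```csharp
using System;

// Hypothetical scalar model of the mask round trip; not JIT or runtime code.
static class MaskTransposeModel
{
    // Models the predicate form (TRN2 <Pd>.<T>, <Pn>.<T>, <Pm>.<T>):
    // the transpose is applied to the mask elements directly.
    public static bool[] TransposeOddMask(bool[] left, bool[] right) =>
        TransposeOdd(left, right);

    // Models the path described above: mask -> vector of 1s and 0s,
    // vector TRN2, then vector -> mask.
    public static bool[] TransposeOddViaVector(bool[] left, bool[] right)
    {
        int[] result = TransposeOdd(ToVector(left), ToVector(right));
        return ToMask(result);
    }

    // Assumed conversion: an active mask element becomes 1, an inactive one 0.
    private static int[] ToVector(bool[] mask) =>
        Array.ConvertAll(mask, active => active ? 1 : 0);

    // Assumed conversion: any non-zero vector element is an active mask element.
    private static bool[] ToMask(int[] vector) =>
        Array.ConvertAll(vector, element => element != 0);

    // Scalar TRN2: odd-indexed elements of left and right, interleaved.
    private static T[] TransposeOdd<T>(T[] left, T[] right)
    {
        if (left.Length != right.Length || left.Length % 2 != 0)
        {
            throw new ArgumentException("Inputs must have the same, even length.");
        }

        var result = new T[left.Length];
        for (int pair = 0; pair < left.Length / 2; pair++)
        {
            result[2 * pair] = left[2 * pair + 1];
            result[2 * pair + 1] = right[2 * pair + 1];
        }
        return result;
    }
}
```

In this model both paths return the same mask for any input, and two all-true masks (the `CreateTrueMask()` case asked about above) produce an all-true mask. The difference is only the two extra conversions, which is the cost the proposed checks would remove.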

Member

@kunalspathak left a comment

LGTM

@kunalspathak merged commit ad51eea into dotnet:main on Jun 5, 2024
162 of 167 checks passed
@github-actions bot locked and limited conversation to collaborators on Jul 6, 2024
Labels
area-System.Runtime.Intrinsics; arm-sve (Work related to arm64 SVE/SVE2 support); community-contribution (Indicates that the PR has been added by a community member); new-api-needs-documentation
3 participants