JIT: Added SVE LoadVector*NonFaultingZeroExtendTo* APIs #102860

Merged
merged 5 commits into from
May 31, 2024

@TIHan TIHan commented May 30, 2024

Contributes to #99957

Adds SVE APIs:

  • LoadVectorByteNonFaultingZeroExtendToInt16
  • LoadVectorByteNonFaultingZeroExtendToInt32
  • LoadVectorByteNonFaultingZeroExtendToInt64
  • LoadVectorByteNonFaultingZeroExtendToUInt16
  • LoadVectorByteNonFaultingZeroExtendToUInt32
  • LoadVectorByteNonFaultingZeroExtendToUInt64
  • LoadVectorUInt16NonFaultingZeroExtendToInt32
  • LoadVectorUInt16NonFaultingZeroExtendToInt64
  • LoadVectorUInt16NonFaultingZeroExtendToUInt32
  • LoadVectorUInt16NonFaultingZeroExtendToUInt64
  • LoadVectorUInt32NonFaultingZeroExtendToInt64
  • LoadVectorUInt32NonFaultingZeroExtendToUInt64
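
As a rough illustration of what the API names describe, the sketch below models a byte load that is zero-extended into 16-bit lanes and that stops, rather than faults, at the end of readable memory. It is a scalar model under stated assumptions, not the JIT or hardware implementation: the function name, the `readable` parameter, and the `loaded` flags (standing in for the first-fault register) are all hypothetical, and lanes that could not be loaded are simply zeroed here, which is a simplification.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Result of the modelled load: widened lanes plus a per-lane "loaded" flag.
// The flags are a hypothetical stand-in for the first-fault register.
struct NonFaultingLoadResult
{
    std::vector<std::uint16_t> lanes;
    std::vector<bool>          loaded;
};

// Hypothetical scalar model. `readable` is the number of bytes at `src` that can be
// read safely; real hardware learns this from the memory system instead of a parameter.
NonFaultingLoadResult LoadByteNonFaultingZeroExtendToUInt16(const std::uint8_t* src,
                                                            std::size_t         readable,
                                                            std::size_t         laneCount)
{
    NonFaultingLoadResult result{std::vector<std::uint16_t>(laneCount, 0),
                                 std::vector<bool>(laneCount, false)};

    for (std::size_t i = 0; i < laneCount && i < readable; ++i)
    {
        // Zero extension: the upper 8 bits of the lane are cleared, so 0xFF becomes 255.
        result.lanes[i]  = static_cast<std::uint16_t>(src[i]);
        result.loaded[i] = true;
    }

    // Lanes at or beyond `readable` keep value 0 and loaded == false (assumption of this model).
    return result;
}
```

The point of the model is the contrast with a sign-extending load: a source byte of 0xFF widens to 255 rather than -1, and running past the readable region is reported through the flags instead of raising a fault.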

Note regarding the new-api-needs-documentation label:

This serves as a reminder for when your PR is modifying a ref *.cs file and adding/modifying public APIs, please make sure the API implementation in the src *.cs file is documented with triple slash comments, so the PR reviewers can sign off that change.

Tagging subscribers to this area: @dotnet/area-system-runtime-intrinsics
See info in area-owners.md if you want to be subscribed.

@TIHan TIHan marked this pull request as ready for review May 30, 2024 02:29

TIHan commented May 30, 2024

@dotnet/arm64-contrib @kunalspathak this is ready. All tests pass with no assertions.

===================Running default===================
------------------- {} -------------------
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.SveLoadVectorByteNonFaultingZeroExtendToInt16() : 2
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.SveLoadVectorByteNonFaultingZeroExtendToInt32() : 2
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.SveLoadVectorByteNonFaultingZeroExtendToInt64() : 2
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.SveLoadVectorByteNonFaultingZeroExtendToUInt16() : 2
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.SveLoadVectorByteNonFaultingZeroExtendToUInt32() : 2
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.SveLoadVectorByteNonFaultingZeroExtendToUInt64() : 2
===================Running jitstress===================
------------------- {'JitMinOpts': '1'} -------------------
------------------- {'JitStress': '1'} -------------------
------------------- {'JitStress': '2'} -------------------
------------------- {'JitStress': '1', 'TieredCompilation': '1'} -------------------
------------------- {'JitStress': '2', 'TieredCompilation': '1'} -------------------
------------------- {'TailcallStress': '1'} -------------------
------------------- {'ReadyToRun': '0'} -------------------
===================Running jitstressregs===================
------------------- {'JitStressRegs': '1'} -------------------
------------------- {'JitStressRegs': '2'} -------------------
------------------- {'JitStressRegs': '3'} -------------------
------------------- {'JitStressRegs': '4'} -------------------
------------------- {'JitStressRegs': '8'} -------------------
------------------- {'JitStressRegs': '0x10'} -------------------
------------------- {'JitStressRegs': '0x80'} -------------------
------------------- {'JitStressRegs': '0x1000'} -------------------
------------------- {'JitStressRegs': '0x2000'} -------------------
===================Running jitstress2-jitstressregs===================
------------------- {'JitStress': '2', 'JitStressRegs': '1'} -------------------
------------------- {'JitStress': '2', 'JitStressRegs': '2'} -------------------
------------------- {'JitStress': '2', 'JitStressRegs': '3'} -------------------
------------------- {'JitStress': '2', 'JitStressRegs': '4'} -------------------
------------------- {'JitStress': '2', 'JitStressRegs': '8'} -------------------
------------------- {'JitStress': '2', 'JitStressRegs': '0x10'} -------------------
------------------- {'JitStress': '2', 'JitStressRegs': '0x80'} -------------------
------------------- {'JitStress': '2', 'JitStressRegs': '0x1000'} -------------------
------------------- {'JitStress': '2', 'JitStressRegs': '0x2000'} -------------------

===================Running default===================
------------------- {} -------------------
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.SveLoadVectorUInt16NonFaultingZeroExtendToInt32() : 2
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.SveLoadVectorUInt16NonFaultingZeroExtendToInt64() : 2
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.SveLoadVectorUInt16NonFaultingZeroExtendToUInt32() : 2
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.SveLoadVectorUInt16NonFaultingZeroExtendToUInt64() : 2
===================Running jitstress===================
------------------- {'JitMinOpts': '1'} -------------------
------------------- {'JitStress': '1'} -------------------
------------------- {'JitStress': '2'} -------------------
------------------- {'JitStress': '1', 'TieredCompilation': '1'} -------------------
------------------- {'JitStress': '2', 'TieredCompilation': '1'} -------------------
------------------- {'TailcallStress': '1'} -------------------
------------------- {'ReadyToRun': '0'} -------------------
===================Running jitstressregs===================
------------------- {'JitStressRegs': '1'} -------------------
------------------- {'JitStressRegs': '2'} -------------------
------------------- {'JitStressRegs': '3'} -------------------
------------------- {'JitStressRegs': '4'} -------------------
------------------- {'JitStressRegs': '8'} -------------------
------------------- {'JitStressRegs': '0x10'} -------------------
------------------- {'JitStressRegs': '0x80'} -------------------
------------------- {'JitStressRegs': '0x1000'} -------------------
------------------- {'JitStressRegs': '0x2000'} -------------------
===================Running jitstress2-jitstressregs===================
------------------- {'JitStress': '2', 'JitStressRegs': '1'} -------------------
------------------- {'JitStress': '2', 'JitStressRegs': '2'} -------------------
------------------- {'JitStress': '2', 'JitStressRegs': '3'} -------------------
------------------- {'JitStress': '2', 'JitStressRegs': '4'} -------------------
------------------- {'JitStress': '2', 'JitStressRegs': '8'} -------------------
------------------- {'JitStress': '2', 'JitStressRegs': '0x10'} -------------------
------------------- {'JitStress': '2', 'JitStressRegs': '0x80'} -------------------
------------------- {'JitStress': '2', 'JitStressRegs': '0x1000'} -------------------
------------------- {'JitStress': '2', 'JitStressRegs': '0x2000'} -------------------

===================Running default===================
------------------- {} -------------------
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.SveLoadVectorUInt32NonFaultingZeroExtendToInt64() : 2
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.SveLoadVectorUInt32NonFaultingZeroExtendToUInt64() : 2
===================Running jitstress===================
------------------- {'JitMinOpts': '1'} -------------------
------------------- {'JitStress': '1'} -------------------
------------------- {'JitStress': '2'} -------------------
------------------- {'JitStress': '1', 'TieredCompilation': '1'} -------------------
------------------- {'JitStress': '2', 'TieredCompilation': '1'} -------------------
------------------- {'TailcallStress': '1'} -------------------
------------------- {'ReadyToRun': '0'} -------------------
===================Running jitstressregs===================
------------------- {'JitStressRegs': '1'} -------------------
------------------- {'JitStressRegs': '2'} -------------------
------------------- {'JitStressRegs': '3'} -------------------
------------------- {'JitStressRegs': '4'} -------------------
------------------- {'JitStressRegs': '8'} -------------------
------------------- {'JitStressRegs': '0x10'} -------------------
------------------- {'JitStressRegs': '0x80'} -------------------
------------------- {'JitStressRegs': '0x1000'} -------------------
------------------- {'JitStressRegs': '0x2000'} -------------------
===================Running jitstress2-jitstressregs===================
------------------- {'JitStress': '2', 'JitStressRegs': '1'} -------------------
------------------- {'JitStress': '2', 'JitStressRegs': '2'} -------------------
------------------- {'JitStress': '2', 'JitStressRegs': '3'} -------------------
------------------- {'JitStress': '2', 'JitStressRegs': '4'} -------------------
------------------- {'JitStress': '2', 'JitStressRegs': '8'} -------------------
------------------- {'JitStress': '2', 'JitStressRegs': '0x10'} -------------------
------------------- {'JitStress': '2', 'JitStressRegs': '0x80'} -------------------
------------------- {'JitStress': '2', 'JitStressRegs': '0x1000'} -------------------
------------------- {'JitStress': '2', 'JitStressRegs': '0x2000'} -------------------

@kunalspathak kunalspathak added the arm-sve Work related to arm64 SVE/SVE2 support label May 30, 2024

// Validates calling via reflection works
// TODO-SVE: Enable once register allocation exists for predicates.
// test.RunReflectionScenario_Load();

Can we have a scenario for ConditionalSelect(mask, LoadVectorByteNonFaultingZeroExtendToInt64(address), mergeValue)? @a74nh?
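
For readers unfamiliar with the pattern being requested, here is a minimal scalar sketch of the per-lane select that such a scenario would exercise. `ConditionalSelectModel` is a hypothetical helper, not part of the test template, and it assumes all three inputs have the same number of lanes.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical scalar model of a per-lane select: lane i is taken from `loaded`
// when mask[i] is true and from `mergeValue` otherwise.
// Assumes mask, loaded, and mergeValue all have the same length.
std::vector<std::int64_t> ConditionalSelectModel(const std::vector<bool>&         mask,
                                                 const std::vector<std::int64_t>& loaded,
                                                 const std::vector<std::int64_t>& mergeValue)
{
    std::vector<std::int64_t> result(mask.size());
    for (std::size_t i = 0; i < mask.size(); ++i)
    {
        result[i] = mask[i] ? loaded[i] : mergeValue[i];
    }
    return result;
}
```

A test for the scenario would compare the intrinsic's output against a reference computed this way, lane by lane.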

I'm not sure if that statement is still true. It may have been fixed by the better handling we now have of predicates. Does this test work if you enable it?
If so, could you also enable the same test in SveLoadMaskedUnOpTest.template too please.

@@ -496,7 +496,37 @@ void CodeGen::genHWIntrinsic(GenTreeHWIntrinsic* node)
}
}

GetEmitter()->emitIns_R_R_R(insEmbMask, emitSize, targetReg, maskReg, embMaskOp1Reg, opt);
if (intrinEmbMask.codeGenIsTableDriven())

Will codeGenIsTableDriven always hold going forwards for additional intrinsics?

I wonder if it needs to be:

                        switch (intrinEmbMask.id)
                        {
                            case NI_Sve_LoadVectorByteNonFaultingZeroExtendToInt16:
                            case NI_Sve_LoadVectorByteNonFaultingZeroExtendToInt32:
                            ....
                            case NI_Sve_LoadVectorUInt32NonFaultingZeroExtendToInt64:
                            case NI_Sve_LoadVectorUInt32NonFaultingZeroExtendToUInt64:
                              GetEmitter()->emitIns_R_R_R_I ....

                            default:
                                GetEmitter()->emitIns_R_R_R .....
                                break;
                        }

Maybe for now it's fine.
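
The switch proposed in this comment can be sketched in a compilable, standalone form as follows. This is only a model of the dispatch idea: `IntrinsicId`, `EmitForm`, and `SelectEmitForm` are hypothetical names, only two of the twelve new intrinsics are listed, and the operand roles in the comments are assumptions rather than a description of the real emitter calls.

```cpp
// Hypothetical stand-ins for the JIT's intrinsic ids; only a few are listed.
enum class IntrinsicId
{
    LoadVectorByteNonFaultingZeroExtendToInt16,
    LoadVectorUInt32NonFaultingZeroExtendToUInt64,
    SomeOtherEmbeddedMaskedIntrinsic, // placeholder for any other embedded-masked intrinsic
};

// Which emitter shape would be used (assumed operand roles in comments).
enum class EmitForm
{
    R_R_R,   // target, mask, operand
    R_R_R_I, // target, mask, base address, immediate
};

// Explicit per-intrinsic dispatch, as suggested in the review comment,
// instead of relying on a table-driven check.
EmitForm SelectEmitForm(IntrinsicId id)
{
    switch (id)
    {
        case IntrinsicId::LoadVectorByteNonFaultingZeroExtendToInt16:
        case IntrinsicId::LoadVectorUInt32NonFaultingZeroExtendToUInt64:
            return EmitForm::R_R_R_I;

        default:
            return EmitForm::R_R_R;
    }
}
```

The trade-off is that an explicit switch must be extended for every new load intrinsic, while a flag-based check covers new entries automatically as long as their flags are set consistently.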

@TIHan TIHan May 30, 2024

It will hold for additional intrinsics that do not have the HW_Flag_SpecialCodeGen flag, but have the HW_Flag_EmbeddedMaskedOperation flag.
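
The condition described in this reply can be written as a small predicate over flag bits. The sketch below is an illustration only: the flag names echo the HW_Flag_* identifiers quoted in this thread, but the numeric values and the helper `TakesTableDrivenEmbeddedMaskPath` are invented for the example and are not the JIT's definitions.

```cpp
#include <cstdint>

// Hypothetical flag bits; names mirror the HW_Flag_* identifiers, values are invented.
enum HWFlagModel : std::uint32_t
{
    Flag_EmbeddedMaskedOperation = 1u << 0,
    Flag_SpecialCodeGen          = 1u << 1,
};

// True when an intrinsic is an embedded masked operation and does not request
// special code generation, which is the case the reply says the check covers.
constexpr bool TakesTableDrivenEmbeddedMaskPath(std::uint32_t flags)
{
    return (flags & Flag_EmbeddedMaskedOperation) != 0 && (flags & Flag_SpecialCodeGen) == 0;
}
```

Under this model, an intrinsic carrying both flags would not take the table-driven path and would need its own handling.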

@@ -95,6 +95,18 @@ HARDWARE_INTRINSIC(Sve, LoadVectorUInt16ZeroExtendToUInt32,
HARDWARE_INTRINSIC(Sve, LoadVectorUInt16ZeroExtendToUInt64, -1, 2, false, {INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_sve_ld1h, INS_invalid, INS_invalid}, HW_Category_MemoryLoad, HW_Flag_Scalable|HW_Flag_ExplicitMaskedOperation|HW_Flag_LowMaskedOperation)
HARDWARE_INTRINSIC(Sve, LoadVectorUInt32ZeroExtendToInt64, -1, 2, false, {INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_sve_ld1w, INS_invalid, INS_invalid, INS_invalid}, HW_Category_MemoryLoad, HW_Flag_Scalable|HW_Flag_ExplicitMaskedOperation|HW_Flag_LowMaskedOperation)
HARDWARE_INTRINSIC(Sve, LoadVectorUInt32ZeroExtendToUInt64, -1, 2, false, {INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_sve_ld1w, INS_invalid, INS_invalid}, HW_Category_MemoryLoad, HW_Flag_Scalable|HW_Flag_ExplicitMaskedOperation|HW_Flag_LowMaskedOperation)
HARDWARE_INTRINSIC(Sve, LoadVectorByteNonFaultingZeroExtendToInt16, -1, 1, false, {INS_invalid, INS_invalid, INS_sve_ldnf1b, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid}, HW_Category_MemoryLoad, HW_Flag_Scalable|HW_Flag_EmbeddedMaskedOperation|HW_Flag_LowMaskedOperation|HW_Flag_SpecialCodeGen)

Maybe I'm missing something....

HW_Flag_SpecialCodeGen is set for these.
During codegen, genHWIntrinsic() is called with LoadVectorByteNonFaultingZeroExtendToInt16.
intrin.codeGenIsTableDriven() check fails (due to HW_Flag_SpecialCodeGen).
Code falls into the switch (intrin.id) at the end of the function.
The switch hits default: unreached() because there is no LoadVectorByteNonFaultingZeroExtendToInt16 case.

@TIHan TIHan May 30, 2024

Because of the HW_Flag_EmbeddedMaskedOperation flag, by the time genHWIntrinsic runs, intrin will never be Sve_LoadVectorByteNonFaultingZeroExtendToInt16; it will instead be an Sve_ConditionalSelect that wraps Sve_LoadVectorByteNonFaultingZeroExtendToInt16. This is why I had to handle intrinEmbMask.codeGenIsTableDriven() as you saw.
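
The wrapping described here can be pictured with a tiny tree model. This is a sketch under assumptions: `NodeKind`, `IntrinsicNode`, and `EmbeddedMaskSubject` are hypothetical names and do not correspond to the JIT's actual node types; the only idea carried over from the thread is that codegen sees the outer ConditionalSelect and has to look inside it to find the load.

```cpp
// Hypothetical node kinds for the model.
enum class NodeKind
{
    ConditionalSelect,
    LoadVectorByteNonFaultingZeroExtendToInt16,
    Other,
};

// Hypothetical intrinsic node: a ConditionalSelect may wrap another intrinsic.
struct IntrinsicNode
{
    NodeKind             kind;
    const IntrinsicNode* embeddedOp; // non-null only when a ConditionalSelect wraps an intrinsic
};

// Returns the node whose properties decide how code is emitted:
// the wrapped operation for a wrapping ConditionalSelect, otherwise the node itself.
const IntrinsicNode& EmbeddedMaskSubject(const IntrinsicNode& node)
{
    if (node.kind == NodeKind::ConditionalSelect && node.embeddedOp != nullptr)
    {
        return *node.embeddedOp;
    }
    return node;
}
```

In this picture, the top-level switch on the outer node's id never sees the load's id, which is consistent with the reply that the missing case in that switch is not reached.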

TIHan commented May 31, 2024

@kunalspathak this is ready again, thanks for updating the test.

@kunalspathak kunalspathak left a comment

LGTM

@TIHan TIHan merged commit 73b6e09 into dotnet:main May 31, 2024
162 of 167 checks passed
@github-actions github-actions bot locked and limited conversation to collaborators Jul 1, 2024