Arm64/Sve: Add CreateTrue*_Mask version that works when immediate are constants #103346

kunalspathak · 2024-06-12T14:48:40Z

Also fixed a minor issue where we should be allocating vector registers instead of predicate. I found this out when testing with the "full predicate registeR" support.

All tests pass: https://gist.github.com/kunalspathak/86d75283326949a6e0871a2fecf64df3

kunalspathak · 2024-06-12T14:48:53Z

@dotnet/arm64-contrib

dotnet-policy-service · 2024-06-12T14:49:08Z

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

src/coreclr/jit/hwintrinsicarm64.cpp

TIHan

LGTM!

My only question is why we need to add these _Mask versions? Is it that the non-_Mask versions need to actually return a vector and not a mask, therefore we need to introduce these _Mask versions?

kunalspathak · 2024-06-12T16:50:10Z

LGTM!

My only question is why we need to add these _Mask versions? Is it that the non-_Mask versions need to actually return a vector and not a mask, therefore we need to introduce these _Mask versions?

That's correct.

tannergooding · 2024-06-12T17:07:23Z

src/coreclr/jit/hwintrinsicarm64.cpp

+            case NI_Sve_CreateTrueMaskByte_Mask:
+            case NI_Sve_CreateTrueMaskDouble_Mask:
+            case NI_Sve_CreateTrueMaskInt16_Mask:
+            case NI_Sve_CreateTrueMaskInt32_Mask:
+            case NI_Sve_CreateTrueMaskInt64_Mask:
+            case NI_Sve_CreateTrueMaskSByte_Mask:
+            case NI_Sve_CreateTrueMaskSingle_Mask:
+            case NI_Sve_CreateTrueMaskUInt16_Mask:
+            case NI_Sve_CreateTrueMaskUInt32_Mask:
+            case NI_Sve_CreateTrueMaskUInt64_Mask:


We shouldn't need these with the PR that went in last night to support rewriting TYP_MASK in rationalization and we should already be getting good codegen without it being rewritten to a call in the case that the imm is an in-bounds constant at import -or- becomes a constant post import.

You can see that the following:

public static Vector<byte> M1() => Sve.CreateTrueMaskByte(SveMaskPattern.LargestMultipleOf4); public static Vector<byte> M2() { var pattern = SveMaskPattern.LargestMultipleOf4; return Sve.CreateTrueMaskByte(pattern); }

Generates:

; Method Program:M1():System.Numerics.Vector`1[ubyte] (FullOpts) G_M31775_IG01: ;; offset=0x0000 stp fp, lr, [sp, #-0x10]! mov fp, sp ;; size=8 bbWeight=1 PerfScore 1.50 G_M31775_IG02: ;; offset=0x0008 ptrue p0.b, mul4 mov z0.b, p0/z, #1 ;; size=8 bbWeight=1 PerfScore 4.00 G_M31775_IG03: ;; offset=0x0010 ldp fp, lr, [sp], #0x10 ret lr ;; size=8 bbWeight=1 PerfScore 2.00 ; Total bytes of code: 24

There does notably look to be an issue still with:

[MethodImpl(MethodImplOptions.NoInlining)] public static Vector<byte> M3(SveMaskPattern pattern) { return Sve.CreateTrueMaskByte(pattern); }

I've put up a PR (#103356) to handle the node insertion order issue, which fixes it to generate:

; Method Program:M3(ubyte):System.Numerics.Vector`1[ubyte] (FullOpts) G_M15394_IG01: ;; offset=0x0000 stp fp, lr, [sp, #-0x10]! mov fp, sp ;; size=8 bbWeight=1 PerfScore 1.50 G_M15394_IG02: ;; offset=0x0008 uxtb w0, w0 movz x1, #0x6E98 // code for System.Runtime.Intrinsics.Arm.Sve:CreateTrueMaskByte(ubyte):System.Numerics.Vector`1[ubyte] movk x1, #0xE7A8 LSL #16 movk x1, #0x7FFC LSL #32 ldr x1, [x1] blr x1 ptrue p7.b cmpne p0.b, p7/z, z0.b, #0 mov z0.b, p0/z, #1 ;; size=36 bbWeight=1 PerfScore 13.00 G_M15394_IG03: ;; offset=0x002C ldp fp, lr, [sp], #0x10 ret lr ;; size=8 bbWeight=1 PerfScore 2.00 ; Total bytes of code: 52

Which is to say the IR:

N005 ( 6, 7) [000003] --C-------- * RETURN simd16 $VN.Void N004 ( 5, 6) [000002] --C-------- \--* HWINTRINSIC simd16 ubyte ConvertMaskToVector $200 N003 ( 4, 5) [000001] --C-------- \--* HWINTRINSIC mask ubyte CreateTrueMaskByte $140 N002 ( 3, 4) [000004] ----------- \--* CAST int <- ubyte <- int $100 N001 ( 2, 2) [000000] ----------- \--* LCL_VAR int V00 arg0 u:1 (last use) $80

rationalizes into:

[000005] ----------- IL_OFFSET void INLRT @ 0x000[E-] N001 ( 2, 2) [000000] ----------- t0 = LCL_VAR int V00 arg0 u:1 (last use) $80 /--* t0 int N002 ( 3, 4) [000004] ----------- t4 = * CAST int <- ubyte <- int $100 /--* t4 int arg0 in x0 N003 ( 17, 7) [000006] --C-G------ t6 = * CALL simd16 System.Runtime.Intrinsics.Arm.Sve:CreateTrueMaskByte(ubyte):System.Numerics.Vector`1[ubyte] N004 ( 1, 1) [000007] ----------- t7 = HWINTRINSIC mask ubyte CreateTrueMaskAll /--* t7 mask +--* t6 simd16 N005 ( 19, 9) [000008] --C-G------ t8 = * HWINTRINSIC mask ubyte ConvertVectorToMask /--* t8 mask N004 ( 5, 6) [000002] --C-G------ t2 = * HWINTRINSIC simd16 ubyte ConvertMaskToVector $200 /--* t2 simd16 N005 ( 6, 7) [000003] --C-G------ * RETURN simd16 $VN.Void

-- There is however still some LSRA issue where referant is nullptr:

> clrjit_universal_arm64_x64.dll!RefPosition::getRegisterType() Line 2665 C++ clrjit_universal_arm64_x64.dll!RefPosition::dump(LinearScan * linearScan) Line 10451 C++ clrjit_universal_arm64_x64.dll!LinearScan::newRefPosition(Interval * theInterval, unsigned int theLocation, RefType theRefType, GenTree * theTreeNode, unsigned __int64 mask, unsigned int multiRegIdx) Line 662 C++ clrjit_universal_arm64_x64.dll!LinearScan::addKillForRegs(regMaskTP mask, unsigned int currentLoc) Line 722 C++ clrjit_universal_arm64_x64.dll!LinearScan::buildKillPositionsForNode(GenTree * tree, unsigned int currentLoc, regMaskTP killMask) Line 1141 C++ clrjit_universal_arm64_x64.dll!LinearScan::BuildKills(GenTree * tree, regMaskTP killMask) Line 3257 C++ clrjit_universal_arm64_x64.dll!LinearScan::BuildDefWithKills(GenTree * tree, unsigned __int64 dstCandidates, regMaskTP killMask) Line 3304 C++ clrjit_universal_arm64_x64.dll!LinearScan::BuildCall(GenTreeCall * call) Line 413 C++ clrjit_universal_arm64_x64.dll!LinearScan::BuildNode(GenTree * tree) Line 1076 C++ clrjit_universal_arm64_x64.dll!LinearScan::buildRefPositionsForNode(GenTree * tree, unsigned int currentLoc) Line 1787 C++ clrjit_universal_arm64_x64.dll!LinearScan::buildIntervals<1>() Line 2589 C++ clrjit_universal_arm64_x64.dll!LinearScan::doLinearScan() Line 1494 C++ clrjit_universal_arm64_x64.dll!Compiler::compCompile::__l2::<lambda_5>::operator()() Line 5322 C++ clrjit_universal_arm64_x64.dll!ActionPhase<`Compiler::compCompile'::`2'::<lambda_5>>::DoPhase() Line 70 C++ clrjit_universal_arm64_x64.dll!Phase::Run() Line 61 C++ clrjit_universal_arm64_x64.dll!DoPhase<`Compiler::compCompile'::`2'::<lambda_5>>(Compiler * _compiler, Phases _phase, Compiler::compCompile::__l2::<lambda_5> _action) Line 83 C++ clrjit_universal_arm64_x64.dll!Compiler::compCompile(void * * methodCodePtr, unsigned int * methodCodeSize, JitFlags * compileFlags) Line 5327 C++ clrjit_universal_arm64_x64.dll!Compiler::compCompileHelper(CORINFO_MODULE_STRUCT_ * classPtr, ICorJitInfo * compHnd, CORINFO_METHOD_INFO * methodInfo, void * * methodCodePtr, unsigned int * methodCodeSize, JitFlags * compileFlags) Line 7396 C++ clrjit_universal_arm64_x64.dll!`Compiler::compCompile'::`127'::__Body::Run(Compiler::compCompile::__l2::__JITParam * __JITpParam) Line 6532 C++ clrjit_universal_arm64_x64.dll!Compiler::compCompile(CORINFO_MODULE_STRUCT_ * classPtr, void * * methodCodePtr, unsigned int * methodCodeSize, JitFlags * compileFlags) Line 6536 C++ clrjit_universal_arm64_x64.dll!``jitNativeCode'::`8'::__Body::Run'::`6'::__Body::Run(jitNativeCode::__l8::__Body::Run::__l5::__JITParam * __JITpParam) Line 8035 C++ clrjit_universal_arm64_x64.dll!`jitNativeCode'::`8'::__Body::Run(jitNativeCode::__l2::__JITParam * __JITpParam) Line 8038 C++ clrjit_universal_arm64_x64.dll!jitNativeCode(CORINFO_METHOD_STRUCT_ * methodHnd, CORINFO_MODULE_STRUCT_ * classPtr, ICorJitInfo * compHnd, CORINFO_METHOD_INFO * methodInfo, void * * methodCodePtr, unsigned int * methodCodeSize, JitFlags * compileFlags, void * inlineInfoPtr) Line 8062 C++ clrjit_universal_arm64_x64.dll!CILJit::compileMethod(ICorJitInfo * compHnd, CORINFO_METHOD_INFO * methodInfo, unsigned int flags, unsigned char * * entryAddress, unsigned int * nativeSizeOfCode) Line 291 C++

kunalspathak · 2024-06-12T23:28:35Z

with #103356, we don't need this anymore. I verified the tests pass with the latest main.

kunalspathak added 6 commits June 12, 2024 06:36

Assign float registers if node is mask

284bfb9

Create CreateTrue*_Mask and use it if immediate is constant

9950edb

Make sure to use vector registers if there is no mask

ee6e83a

fix after merge

ac21747

jit format

4b9e7bd

add new test template

b9a07fe

kunalspathak added the arm-sve Work related to arm64 SVE/SVE2 support label Jun 12, 2024

dotnet-issue-labeler bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Jun 12, 2024

kunalspathak requested a review from tannergooding June 12, 2024 14:48

dotnet-policy-service bot assigned kunalspathak Jun 12, 2024

a74nh reviewed Jun 12, 2024

View reviewed changes

src/coreclr/jit/hwintrinsicarm64.cpp Outdated Show resolved Hide resolved

TIHan approved these changes Jun 12, 2024

View reviewed changes

Add comment

1160ec3

build-analysis bot mentioned this pull request Jun 12, 2024

System.Net.Security.Tests.SslStreamTlsResumeTests failing #103354

Closed

tannergooding reviewed Jun 12, 2024

View reviewed changes

kunalspathak closed this Jun 12, 2024

github-actions bot locked and limited conversation to collaborators Jul 13, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Arm64/Sve: Add CreateTrue*_Mask version that works when immediate are constants #103346

Arm64/Sve: Add CreateTrue*_Mask version that works when immediate are constants #103346

kunalspathak commented Jun 12, 2024

kunalspathak commented Jun 12, 2024

dotnet-policy-service bot commented Jun 12, 2024

TIHan left a comment

kunalspathak commented Jun 12, 2024

tannergooding Jun 12, 2024

kunalspathak commented Jun 12, 2024

Arm64/Sve: Add CreateTrue*_Mask version that works when immediate are constants #103346

Arm64/Sve: Add CreateTrue*_Mask version that works when immediate are constants #103346

Conversation

kunalspathak commented Jun 12, 2024

kunalspathak commented Jun 12, 2024

dotnet-policy-service bot commented Jun 12, 2024

TIHan left a comment

Choose a reason for hiding this comment

kunalspathak commented Jun 12, 2024

tannergooding Jun 12, 2024

Choose a reason for hiding this comment

kunalspathak commented Jun 12, 2024