Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Arm64/Sve: Add CreateTrue*_Mask version that works when immediate are constants #103346

Closed
wants to merge 7 commits into from

Conversation

kunalspathak
Copy link
Member

Also fixed a minor issue where we should be allocating vector registers instead of predicate. I found this out when testing with the "full predicate registeR" support.

All tests pass: https://gist.github.com/kunalspathak/86d75283326949a6e0871a2fecf64df3

@kunalspathak kunalspathak added the arm-sve Work related to arm64 SVE/SVE2 support label Jun 12, 2024
@dotnet-issue-labeler dotnet-issue-labeler bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Jun 12, 2024
@kunalspathak
Copy link
Member Author

@dotnet/arm64-contrib

Copy link
Contributor

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

Copy link
Contributor

@TIHan TIHan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

My only question is why we need to add these _Mask versions? Is it that the non-_Mask versions need to actually return a vector and not a mask, therefore we need to introduce these _Mask versions?

@kunalspathak
Copy link
Member Author

LGTM!

My only question is why we need to add these _Mask versions? Is it that the non-_Mask versions need to actually return a vector and not a mask, therefore we need to introduce these _Mask versions?

That's correct.

Comment on lines +416 to +425
case NI_Sve_CreateTrueMaskByte_Mask:
case NI_Sve_CreateTrueMaskDouble_Mask:
case NI_Sve_CreateTrueMaskInt16_Mask:
case NI_Sve_CreateTrueMaskInt32_Mask:
case NI_Sve_CreateTrueMaskInt64_Mask:
case NI_Sve_CreateTrueMaskSByte_Mask:
case NI_Sve_CreateTrueMaskSingle_Mask:
case NI_Sve_CreateTrueMaskUInt16_Mask:
case NI_Sve_CreateTrueMaskUInt32_Mask:
case NI_Sve_CreateTrueMaskUInt64_Mask:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We shouldn't need these with the PR that went in last night to support rewriting TYP_MASK in rationalization and we should already be getting good codegen without it being rewritten to a call in the case that the imm is an in-bounds constant at import -or- becomes a constant post import.

You can see that the following:

public static Vector<byte> M1() => Sve.CreateTrueMaskByte(SveMaskPattern.LargestMultipleOf4);

public static Vector<byte> M2()
{
    var pattern = SveMaskPattern.LargestMultipleOf4;
    return Sve.CreateTrueMaskByte(pattern);
}

Generates:

; Method Program:M1():System.Numerics.Vector`1[ubyte] (FullOpts)
G_M31775_IG01:  ;; offset=0x0000
            stp     fp, lr, [sp, #-0x10]!
            mov     fp, sp
						;; size=8 bbWeight=1 PerfScore 1.50

G_M31775_IG02:  ;; offset=0x0008
            ptrue   p0.b, mul4
            mov     z0.b, p0/z, #1
						;; size=8 bbWeight=1 PerfScore 4.00

G_M31775_IG03:  ;; offset=0x0010
            ldp     fp, lr, [sp], #0x10
            ret     lr
						;; size=8 bbWeight=1 PerfScore 2.00
; Total bytes of code: 24

There does notably look to be an issue still with:

[MethodImpl(MethodImplOptions.NoInlining)]
public static Vector<byte> M3(SveMaskPattern pattern)
{
    return Sve.CreateTrueMaskByte(pattern);
}

I've put up a PR (#103356) to handle the node insertion order issue, which fixes it to generate:

; Method Program:M3(ubyte):System.Numerics.Vector`1[ubyte] (FullOpts)
G_M15394_IG01:  ;; offset=0x0000
            stp     fp, lr, [sp, #-0x10]!
            mov     fp, sp
						;; size=8 bbWeight=1 PerfScore 1.50

G_M15394_IG02:  ;; offset=0x0008
            uxtb    w0, w0
            movz    x1, #0x6E98      // code for System.Runtime.Intrinsics.Arm.Sve:CreateTrueMaskByte(ubyte):System.Numerics.Vector`1[ubyte]
            movk    x1, #0xE7A8 LSL #16
            movk    x1, #0x7FFC LSL #32
            ldr     x1, [x1]
            blr     x1
            ptrue   p7.b
            cmpne   p0.b, p7/z, z0.b, #0
            mov     z0.b, p0/z, #1
						;; size=36 bbWeight=1 PerfScore 13.00

G_M15394_IG03:  ;; offset=0x002C
            ldp     fp, lr, [sp], #0x10
            ret     lr
						;; size=8 bbWeight=1 PerfScore 2.00
; Total bytes of code: 52

Which is to say the IR:

N005 (  6,  7) [000003] --C--------                         *  RETURN    simd16 $VN.Void
N004 (  5,  6) [000002] --C--------                         \--*  HWINTRINSIC simd16 ubyte ConvertMaskToVector $200
N003 (  4,  5) [000001] --C--------                            \--*  HWINTRINSIC mask   ubyte CreateTrueMaskByte $140
N002 (  3,  4) [000004] -----------                               \--*  CAST      int <- ubyte <- int $100
N001 (  2,  2) [000000] -----------                                  \--*  LCL_VAR   int    V00 arg0         u:1 (last use) $80

rationalizes into:

               [000005] -----------                            IL_OFFSET void   INLRT @ 0x000[E-]
N001 (  2,  2) [000000] -----------                    t0 =    LCL_VAR   int    V00 arg0         u:1 (last use) $80
                                                            /--*  t0     int    
N002 (  3,  4) [000004] -----------                    t4 = *  CAST      int <- ubyte <- int $100
                                                            /--*  t4     int    arg0 in x0
N003 ( 17,  7) [000006] --C-G------                    t6 = *  CALL      simd16 System.Runtime.Intrinsics.Arm.Sve:CreateTrueMaskByte(ubyte):System.Numerics.Vector`1[ubyte]
N004 (  1,  1) [000007] -----------                    t7 =    HWINTRINSIC mask   ubyte CreateTrueMaskAll
                                                            /--*  t7     mask   
                                                            +--*  t6     simd16 
N005 ( 19,  9) [000008] --C-G------                    t8 = *  HWINTRINSIC mask   ubyte ConvertVectorToMask
                                                            /--*  t8     mask   
N004 (  5,  6) [000002] --C-G------                    t2 = *  HWINTRINSIC simd16 ubyte ConvertMaskToVector $200
                                                            /--*  t2     simd16 
N005 (  6,  7) [000003] --C-G------                         *  RETURN    simd16 $VN.Void

-- There is however still some LSRA issue where referant is nullptr:

>	clrjit_universal_arm64_x64.dll!RefPosition::getRegisterType() Line 2665	C++
 	clrjit_universal_arm64_x64.dll!RefPosition::dump(LinearScan * linearScan) Line 10451	C++
 	clrjit_universal_arm64_x64.dll!LinearScan::newRefPosition(Interval * theInterval, unsigned int theLocation, RefType theRefType, GenTree * theTreeNode, unsigned __int64 mask, unsigned int multiRegIdx) Line 662	C++
 	clrjit_universal_arm64_x64.dll!LinearScan::addKillForRegs(regMaskTP mask, unsigned int currentLoc) Line 722	C++
 	clrjit_universal_arm64_x64.dll!LinearScan::buildKillPositionsForNode(GenTree * tree, unsigned int currentLoc, regMaskTP killMask) Line 1141	C++
 	clrjit_universal_arm64_x64.dll!LinearScan::BuildKills(GenTree * tree, regMaskTP killMask) Line 3257	C++
 	clrjit_universal_arm64_x64.dll!LinearScan::BuildDefWithKills(GenTree * tree, unsigned __int64 dstCandidates, regMaskTP killMask) Line 3304	C++
 	clrjit_universal_arm64_x64.dll!LinearScan::BuildCall(GenTreeCall * call) Line 413	C++
 	clrjit_universal_arm64_x64.dll!LinearScan::BuildNode(GenTree * tree) Line 1076	C++
 	clrjit_universal_arm64_x64.dll!LinearScan::buildRefPositionsForNode(GenTree * tree, unsigned int currentLoc) Line 1787	C++
 	clrjit_universal_arm64_x64.dll!LinearScan::buildIntervals<1>() Line 2589	C++
 	clrjit_universal_arm64_x64.dll!LinearScan::doLinearScan() Line 1494	C++
 	clrjit_universal_arm64_x64.dll!Compiler::compCompile::__l2::<lambda_5>::operator()() Line 5322	C++
 	clrjit_universal_arm64_x64.dll!ActionPhase<`Compiler::compCompile'::`2'::<lambda_5>>::DoPhase() Line 70	C++
 	clrjit_universal_arm64_x64.dll!Phase::Run() Line 61	C++
 	clrjit_universal_arm64_x64.dll!DoPhase<`Compiler::compCompile'::`2'::<lambda_5>>(Compiler * _compiler, Phases _phase, Compiler::compCompile::__l2::<lambda_5> _action) Line 83	C++
 	clrjit_universal_arm64_x64.dll!Compiler::compCompile(void * * methodCodePtr, unsigned int * methodCodeSize, JitFlags * compileFlags) Line 5327	C++
 	clrjit_universal_arm64_x64.dll!Compiler::compCompileHelper(CORINFO_MODULE_STRUCT_ * classPtr, ICorJitInfo * compHnd, CORINFO_METHOD_INFO * methodInfo, void * * methodCodePtr, unsigned int * methodCodeSize, JitFlags * compileFlags) Line 7396	C++
 	clrjit_universal_arm64_x64.dll!`Compiler::compCompile'::`127'::__Body::Run(Compiler::compCompile::__l2::__JITParam * __JITpParam) Line 6532	C++
 	clrjit_universal_arm64_x64.dll!Compiler::compCompile(CORINFO_MODULE_STRUCT_ * classPtr, void * * methodCodePtr, unsigned int * methodCodeSize, JitFlags * compileFlags) Line 6536	C++
 	clrjit_universal_arm64_x64.dll!``jitNativeCode'::`8'::__Body::Run'::`6'::__Body::Run(jitNativeCode::__l8::__Body::Run::__l5::__JITParam * __JITpParam) Line 8035	C++
 	clrjit_universal_arm64_x64.dll!`jitNativeCode'::`8'::__Body::Run(jitNativeCode::__l2::__JITParam * __JITpParam) Line 8038	C++
 	clrjit_universal_arm64_x64.dll!jitNativeCode(CORINFO_METHOD_STRUCT_ * methodHnd, CORINFO_MODULE_STRUCT_ * classPtr, ICorJitInfo * compHnd, CORINFO_METHOD_INFO * methodInfo, void * * methodCodePtr, unsigned int * methodCodeSize, JitFlags * compileFlags, void * inlineInfoPtr) Line 8062	C++
 	clrjit_universal_arm64_x64.dll!CILJit::compileMethod(ICorJitInfo * compHnd, CORINFO_METHOD_INFO * methodInfo, unsigned int flags, unsigned char * * entryAddress, unsigned int * nativeSizeOfCode) Line 291	C++

@kunalspathak
Copy link
Member Author

with #103356, we don't need this anymore. I verified the tests pass with the latest main.

@github-actions github-actions bot locked and limited conversation to collaborators Jul 13, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI arm-sve Work related to arm64 SVE/SVE2 support
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants