-
Notifications
You must be signed in to change notification settings - Fork 4.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Arm64/Sve: Add CreateTrue*_Mask version that works when immediate are constants #103346
Conversation
@dotnet/arm64-contrib |
Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
My only question is why we need to add these _Mask versions? Is it that the non-_Mask versions need to actually return a vector and not a mask, therefore we need to introduce these _Mask versions?
That's correct. |
case NI_Sve_CreateTrueMaskByte_Mask: | ||
case NI_Sve_CreateTrueMaskDouble_Mask: | ||
case NI_Sve_CreateTrueMaskInt16_Mask: | ||
case NI_Sve_CreateTrueMaskInt32_Mask: | ||
case NI_Sve_CreateTrueMaskInt64_Mask: | ||
case NI_Sve_CreateTrueMaskSByte_Mask: | ||
case NI_Sve_CreateTrueMaskSingle_Mask: | ||
case NI_Sve_CreateTrueMaskUInt16_Mask: | ||
case NI_Sve_CreateTrueMaskUInt32_Mask: | ||
case NI_Sve_CreateTrueMaskUInt64_Mask: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We shouldn't need these with the PR that went in last night to support rewriting TYP_MASK
in rationalization and we should already be getting good codegen without it being rewritten to a call in the case that the imm
is an in-bounds constant at import -or- becomes a constant post import.
You can see that the following:
public static Vector<byte> M1() => Sve.CreateTrueMaskByte(SveMaskPattern.LargestMultipleOf4);
public static Vector<byte> M2()
{
var pattern = SveMaskPattern.LargestMultipleOf4;
return Sve.CreateTrueMaskByte(pattern);
}
Generates:
; Method Program:M1():System.Numerics.Vector`1[ubyte] (FullOpts)
G_M31775_IG01: ;; offset=0x0000
stp fp, lr, [sp, #-0x10]!
mov fp, sp
;; size=8 bbWeight=1 PerfScore 1.50
G_M31775_IG02: ;; offset=0x0008
ptrue p0.b, mul4
mov z0.b, p0/z, #1
;; size=8 bbWeight=1 PerfScore 4.00
G_M31775_IG03: ;; offset=0x0010
ldp fp, lr, [sp], #0x10
ret lr
;; size=8 bbWeight=1 PerfScore 2.00
; Total bytes of code: 24
There does notably look to be an issue still with:
[MethodImpl(MethodImplOptions.NoInlining)]
public static Vector<byte> M3(SveMaskPattern pattern)
{
return Sve.CreateTrueMaskByte(pattern);
}
I've put up a PR (#103356) to handle the node insertion order issue, which fixes it to generate:
; Method Program:M3(ubyte):System.Numerics.Vector`1[ubyte] (FullOpts)
G_M15394_IG01: ;; offset=0x0000
stp fp, lr, [sp, #-0x10]!
mov fp, sp
;; size=8 bbWeight=1 PerfScore 1.50
G_M15394_IG02: ;; offset=0x0008
uxtb w0, w0
movz x1, #0x6E98 // code for System.Runtime.Intrinsics.Arm.Sve:CreateTrueMaskByte(ubyte):System.Numerics.Vector`1[ubyte]
movk x1, #0xE7A8 LSL #16
movk x1, #0x7FFC LSL #32
ldr x1, [x1]
blr x1
ptrue p7.b
cmpne p0.b, p7/z, z0.b, #0
mov z0.b, p0/z, #1
;; size=36 bbWeight=1 PerfScore 13.00
G_M15394_IG03: ;; offset=0x002C
ldp fp, lr, [sp], #0x10
ret lr
;; size=8 bbWeight=1 PerfScore 2.00
; Total bytes of code: 52
Which is to say the IR:
N005 ( 6, 7) [000003] --C-------- * RETURN simd16 $VN.Void
N004 ( 5, 6) [000002] --C-------- \--* HWINTRINSIC simd16 ubyte ConvertMaskToVector $200
N003 ( 4, 5) [000001] --C-------- \--* HWINTRINSIC mask ubyte CreateTrueMaskByte $140
N002 ( 3, 4) [000004] ----------- \--* CAST int <- ubyte <- int $100
N001 ( 2, 2) [000000] ----------- \--* LCL_VAR int V00 arg0 u:1 (last use) $80
rationalizes into:
[000005] ----------- IL_OFFSET void INLRT @ 0x000[E-]
N001 ( 2, 2) [000000] ----------- t0 = LCL_VAR int V00 arg0 u:1 (last use) $80
/--* t0 int
N002 ( 3, 4) [000004] ----------- t4 = * CAST int <- ubyte <- int $100
/--* t4 int arg0 in x0
N003 ( 17, 7) [000006] --C-G------ t6 = * CALL simd16 System.Runtime.Intrinsics.Arm.Sve:CreateTrueMaskByte(ubyte):System.Numerics.Vector`1[ubyte]
N004 ( 1, 1) [000007] ----------- t7 = HWINTRINSIC mask ubyte CreateTrueMaskAll
/--* t7 mask
+--* t6 simd16
N005 ( 19, 9) [000008] --C-G------ t8 = * HWINTRINSIC mask ubyte ConvertVectorToMask
/--* t8 mask
N004 ( 5, 6) [000002] --C-G------ t2 = * HWINTRINSIC simd16 ubyte ConvertMaskToVector $200
/--* t2 simd16
N005 ( 6, 7) [000003] --C-G------ * RETURN simd16 $VN.Void
-- There is however still some LSRA issue where referant
is nullptr
:
> clrjit_universal_arm64_x64.dll!RefPosition::getRegisterType() Line 2665 C++
clrjit_universal_arm64_x64.dll!RefPosition::dump(LinearScan * linearScan) Line 10451 C++
clrjit_universal_arm64_x64.dll!LinearScan::newRefPosition(Interval * theInterval, unsigned int theLocation, RefType theRefType, GenTree * theTreeNode, unsigned __int64 mask, unsigned int multiRegIdx) Line 662 C++
clrjit_universal_arm64_x64.dll!LinearScan::addKillForRegs(regMaskTP mask, unsigned int currentLoc) Line 722 C++
clrjit_universal_arm64_x64.dll!LinearScan::buildKillPositionsForNode(GenTree * tree, unsigned int currentLoc, regMaskTP killMask) Line 1141 C++
clrjit_universal_arm64_x64.dll!LinearScan::BuildKills(GenTree * tree, regMaskTP killMask) Line 3257 C++
clrjit_universal_arm64_x64.dll!LinearScan::BuildDefWithKills(GenTree * tree, unsigned __int64 dstCandidates, regMaskTP killMask) Line 3304 C++
clrjit_universal_arm64_x64.dll!LinearScan::BuildCall(GenTreeCall * call) Line 413 C++
clrjit_universal_arm64_x64.dll!LinearScan::BuildNode(GenTree * tree) Line 1076 C++
clrjit_universal_arm64_x64.dll!LinearScan::buildRefPositionsForNode(GenTree * tree, unsigned int currentLoc) Line 1787 C++
clrjit_universal_arm64_x64.dll!LinearScan::buildIntervals<1>() Line 2589 C++
clrjit_universal_arm64_x64.dll!LinearScan::doLinearScan() Line 1494 C++
clrjit_universal_arm64_x64.dll!Compiler::compCompile::__l2::<lambda_5>::operator()() Line 5322 C++
clrjit_universal_arm64_x64.dll!ActionPhase<`Compiler::compCompile'::`2'::<lambda_5>>::DoPhase() Line 70 C++
clrjit_universal_arm64_x64.dll!Phase::Run() Line 61 C++
clrjit_universal_arm64_x64.dll!DoPhase<`Compiler::compCompile'::`2'::<lambda_5>>(Compiler * _compiler, Phases _phase, Compiler::compCompile::__l2::<lambda_5> _action) Line 83 C++
clrjit_universal_arm64_x64.dll!Compiler::compCompile(void * * methodCodePtr, unsigned int * methodCodeSize, JitFlags * compileFlags) Line 5327 C++
clrjit_universal_arm64_x64.dll!Compiler::compCompileHelper(CORINFO_MODULE_STRUCT_ * classPtr, ICorJitInfo * compHnd, CORINFO_METHOD_INFO * methodInfo, void * * methodCodePtr, unsigned int * methodCodeSize, JitFlags * compileFlags) Line 7396 C++
clrjit_universal_arm64_x64.dll!`Compiler::compCompile'::`127'::__Body::Run(Compiler::compCompile::__l2::__JITParam * __JITpParam) Line 6532 C++
clrjit_universal_arm64_x64.dll!Compiler::compCompile(CORINFO_MODULE_STRUCT_ * classPtr, void * * methodCodePtr, unsigned int * methodCodeSize, JitFlags * compileFlags) Line 6536 C++
clrjit_universal_arm64_x64.dll!``jitNativeCode'::`8'::__Body::Run'::`6'::__Body::Run(jitNativeCode::__l8::__Body::Run::__l5::__JITParam * __JITpParam) Line 8035 C++
clrjit_universal_arm64_x64.dll!`jitNativeCode'::`8'::__Body::Run(jitNativeCode::__l2::__JITParam * __JITpParam) Line 8038 C++
clrjit_universal_arm64_x64.dll!jitNativeCode(CORINFO_METHOD_STRUCT_ * methodHnd, CORINFO_MODULE_STRUCT_ * classPtr, ICorJitInfo * compHnd, CORINFO_METHOD_INFO * methodInfo, void * * methodCodePtr, unsigned int * methodCodeSize, JitFlags * compileFlags, void * inlineInfoPtr) Line 8062 C++
clrjit_universal_arm64_x64.dll!CILJit::compileMethod(ICorJitInfo * compHnd, CORINFO_METHOD_INFO * methodInfo, unsigned int flags, unsigned char * * entryAddress, unsigned int * nativeSizeOfCode) Line 291 C++
with #103356, we don't need this anymore. I verified the tests pass with the latest main. |
Also fixed a minor issue where we should be allocating vector registers instead of predicate. I found this out when testing with the "full predicate registeR" support.
All tests pass: https://gist.github.com/kunalspathak/86d75283326949a6e0871a2fecf64df3