ARM64-SVE: Ensure MOVPRFX is next to SVE instruction in imm tables #106125

Merged
merged 7 commits into from
Aug 13, 2024

Contributor

@a74nh a74nh commented Aug 8, 2024

As per the Arm ARM (Architecture Reference Manual), a MOVPRFX instruction must be immediately
followed by an SVE instruction of a specific type, otherwise behaviour is undefined.

For immediate lookup tables, the MOVPRFX instruction was being placed before the table instead of
next to the SVE instruction. The movprfx needs to move inside each case, next to the relevant
SVE instruction.
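
The adjacency rule described above can be sketched as a small check over an instruction list. This is a simplified, hypothetical model and not JIT code: `Ins`, `firstUnpairedMovprfx` and the caller-supplied `prefixable` set are illustrative names, and the sketch only checks adjacency and a matching destination register, not the full set of architectural constraints.

```cpp
#include <cstddef>
#include <string>
#include <unordered_set>
#include <vector>

// Simplified, hypothetical model of one emitted instruction.
struct Ins
{
    std::string mnemonic; // e.g. "movprfx", "asrd", "adr"
    std::string dest;     // destination register, e.g. "z16"
};

// Returns the index of the first movprfx that is NOT immediately followed by
// an instruction from `prefixable` writing the same destination register,
// or -1 if every movprfx is paired.
// `prefixable` is supplied by the caller; this sketch does not encode the
// architecture's real list of instructions that may follow a movprfx.
int firstUnpairedMovprfx(const std::vector<Ins>& code,
                         const std::unordered_set<std::string>& prefixable)
{
    for (std::size_t i = 0; i < code.size(); ++i)
    {
        if (code[i].mnemonic != "movprfx")
        {
            continue;
        }

        const bool hasNext = (i + 1 < code.size());
        if (!hasNext || prefixable.count(code[i + 1].mnemonic) == 0 ||
            code[i + 1].dest != code[i].dest)
        {
            return static_cast<int>(i);
        }
    }
    return -1;
}
```

Fed the old ordering (`movprfx` followed by `adr`, `add`, `sub`, `br` and only then `asrd`), the check flags the `movprfx`. With `movprfx` placed directly before `asrd` inside a case, it reports no problem.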

System.Runtime.Intrinsics.Arm.Sve:ShiftRightArithmeticForDivide before:

G_M51003_IG01:  ;; offset=0x0000
            stp     fp, lr, [sp, #-0x30]!
            mov     fp, sp
            str     q0, [fp, #0x20]	// [V00 arg0]
            str     w0, [fp, #0x1C]	// [V01 arg1]
						;; size=16 bbWeight=1 PerfScore 3.50
G_M51003_IG02:  ;; offset=0x0010
            ldr     w0, [fp, #0x1C]	// [V01 arg1]
            uxtb    w0, w0
            sub     w0, w0, #1
            cmp     w0, #32
            bhs     G_M51003_IG37
            ldr     w0, [fp, #0x1C]	// [V01 arg1]
            uxtb    w0, w0
            ldr     q16, [fp, #0x20]	// [V00 arg0]
            ptrue   p0.s
            movprfx z16.s, p0/z, z16.s
            adr     x1, [G_M51003_IG03]
            add     x1, x1, x0,  LSL #3
            sub     x1, x1, #8
            br      x1
						;; size=56 bbWeight=1 PerfScore 16.00
G_M51003_IG03:  ;; offset=0x0048
            asrd    z16.s, p0/m, z16.s, #1
            b       G_M51003_IG35
						;; size=8 bbWeight=1 PerfScore 4.00
G_M51003_IG04:  ;; offset=0x0050
            asrd    z16.s, p0/m, z16.s, #2
            b       G_M51003_IG35
						;; size=8 bbWeight=1 PerfScore 4.00
G_M51003_IG05:  ;; offset=0x0058
            asrd    z16.s, p0/m, z16.s, #3
            b       G_M51003_IG35
						;; size=8 bbWeight=1 PerfScore 4.00
...etc...
G_M51003_IG33:  ;; offset=0x0138
            asrd    z16.s, p0/m, z16.s, #31
            b       G_M51003_IG35
						;; size=8 bbWeight=1 PerfScore 4.00
G_M51003_IG34:  ;; offset=0x0140
            asrd    z16.s, p0/m, z16.s, #32
						;; size=4 bbWeight=1 PerfScore 3.00
G_M51003_IG35:  ;; offset=0x0144
            mov     v0.16b, v16.16b
						;; size=4 bbWeight=1 PerfScore 0.50
G_M51003_IG36:  ;; offset=0x0148
            ldp     fp, lr, [sp], #0x30
            ret     lr
						;; size=8 bbWeight=1 PerfScore 2.00
G_M51003_IG37:  ;; offset=0x0150
            bl      CORINFO_HELP_THROW_ARGUMENTOUTOFRANGEEXCEPTION
            brk_unix #0
						;; size=8 bbWeight=0 PerfScore 0.00
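
The range check at the top of IG02 (`uxtb`, `sub #1`, `cmp #32`, `bhs`) can be read as a single unsigned comparison. A minimal sketch, assuming the valid immediates for this element size are 1 to 32 as the listing suggests (`immInRange` is a hypothetical name, not a JIT function):

```cpp
#include <cstdint>

// Mirrors the listing: uxtb w0, w0 ; sub w0, w0, #1 ; cmp w0, #32 ; bhs <throw>
// Returns true when the jump table is used, false when the throw helper
// would be reached.
constexpr bool immInRange(std::uint32_t arg)
{
    const std::uint32_t imm = arg & 0xFFu; // uxtb: keep the low byte only
    // Unsigned wraparound folds both bounds into one compare:
    // imm == 0 becomes 0xFFFFFFFF, which is not below 32.
    return (imm - 1u) < 32u;
}
```

Because `bhs` is an unsigned comparison, an immediate of 0 takes the throw path without a separate lower-bound test.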

System.Runtime.Intrinsics.Arm.Sve:ShiftRightArithmeticForDivide after:

G_M51003_IG01:  ;; offset=0x0000
            stp     fp, lr, [sp, #-0x30]!
            mov     fp, sp
            str     q0, [fp, #0x20]	// [V00 arg0]
            str     w0, [fp, #0x1C]	// [V01 arg1]
						;; size=16 bbWeight=1 PerfScore 3.50
G_M51003_IG02:  ;; offset=0x0010
            ldr     w0, [fp, #0x1C]	// [V01 arg1]
            uxtb    w0, w0
            sub     w0, w0, #1
            cmp     w0, #32
            bhs     G_M51003_IG37
            ldr     w0, [fp, #0x1C]	// [V01 arg1]
            uxtb    w0, w0
            ldr     q16, [fp, #0x20]	// [V00 arg0]
            ptrue   p0.s
            adr     x1, [G_M51003_IG03]
            add     x1, x1, x0,  LSL #3
            add     x1, x1, x0,  LSL #2
            sub     x1, x1, #12
            br      x1
						;; size=56 bbWeight=1 PerfScore 15.00
G_M51003_IG03:  ;; offset=0x0048
            movprfx z16.s, p0/z, z16.s
            asrd    z16.s, p0/m, z16.s, #1
            b       G_M51003_IG35
						;; size=12 bbWeight=1 PerfScore 6.00
G_M51003_IG04:  ;; offset=0x0054
            movprfx z16.s, p0/z, z16.s
            asrd    z16.s, p0/m, z16.s, #2
            b       G_M51003_IG35
						;; size=12 bbWeight=1 PerfScore 6.00
G_M51003_IG05:  ;; offset=0x0060
            movprfx z16.s, p0/z, z16.s
            asrd    z16.s, p0/m, z16.s, #3
            b       G_M51003_IG35
						;; size=12 bbWeight=1 PerfScore 6.00
...etc...
G_M51003_IG33:  ;; offset=0x01B0
            movprfx z16.s, p0/z, z16.s
            asrd    z16.s, p0/m, z16.s, #31
            b       G_M51003_IG35
						;; size=12 bbWeight=1 PerfScore 6.00
G_M51003_IG34:  ;; offset=0x01BC
            movprfx z16.s, p0/z, z16.s
            asrd    z16.s, p0/m, z16.s, #32
						;; size=8 bbWeight=1 PerfScore 5.00
G_M51003_IG35:  ;; offset=0x01C4
            mov     v0.16b, v16.16b
						;; size=4 bbWeight=1 PerfScore 0.50
G_M51003_IG36:  ;; offset=0x01C8
            ldp     fp, lr, [sp], #0x30
            ret     lr
						;; size=8 bbWeight=1 PerfScore 2.00
G_M51003_IG37:  ;; offset=0x01D0
            bl      CORINFO_HELP_THROW_ARGUMENTOUTOFRANGEEXCEPTION
            brk_unix #0
						;; size=8 bbWeight=0 PerfScore 0.00
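
The extra `add` in the "after" listing follows from the larger case size: each case grows from two instructions to three, so the stride goes from 8 to 12 bytes. A sketch of the two address computations, with hypothetical helper names and the first case offset passed in by the caller:

```cpp
#include <cstdint>

// Every A64 instruction is 4 bytes long.
constexpr std::uint64_t kInsBytes = 4;

// "Before": each case is `asrd; b` (2 instructions = 8 bytes).
// Mirrors: add x1, x1, x0, LSL #3 ; sub x1, x1, #8
constexpr std::uint64_t caseAddrBefore(std::uint64_t firstCase, std::uint64_t imm)
{
    return firstCase + (imm << 3) - 2 * kInsBytes;
}

// "After": each case is `movprfx; asrd; b` (3 instructions = 12 bytes).
// 12 * imm is formed as (imm << 3) + (imm << 2), hence the second add.
// Mirrors: add x1, x1, x0, LSL #3 ; add x1, x1, x0, LSL #2 ; sub x1, x1, #12
constexpr std::uint64_t caseAddrAfter(std::uint64_t firstCase, std::uint64_t imm)
{
    return firstCase + (imm << 3) + (imm << 2) - 3 * kInsBytes;
}
```

With the first case at offset 0x48, an immediate of 31 maps to 0x138 before and 0x1B0 after, which matches the IG33 offsets in the two listings.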

@dotnet-issue-labeler dotnet-issue-labeler bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Aug 8, 2024
@dotnet-policy-service dotnet-policy-service bot added the community-contribution Indicates that the PR has been added by a community member label Aug 8, 2024
Contributor

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

@a74nh a74nh marked this pull request as ready for review August 8, 2024 13:15
Contributor Author

a74nh commented Aug 8, 2024

@dotnet/arm64-contrib @kunalspathak

@@ -736,12 +783,10 @@ void CodeGen::genHWIntrinsic(GenTreeHWIntrinsic* node)

default:
assert(targetReg != embMaskOp2Reg);
GetEmitter()->emitIns_R_R_R(INS_sve_movprfx, emitSize, targetReg, maskReg,
Member

This was emitting the predicated movprfx, but now we will emit the unpredicated one.

Member

@kunalspathak kunalspathak Aug 9, 2024

Unfortunately, we do not have superpmi-diff for SVE tests, but it should be easy to do that to check the disasm diffs.

  1. python src\coreclr\scripts\superpmi.py collect corerun hardwareintrinsic.dll
  2. superpmi.py asmdiff abc.mch

Contributor Author

@a74nh a74nh Aug 9, 2024

This was emitting the predicated movprfx, but now we will emit the unpredicated one.

Fixed

should be easy to do that to check the disasm diffs.

I'll do this. Sadly, there will be lots of diffs because the movprfx has moved, but I'll take a look.

Member

That should only be in the immediate-taking instructions. The goal would be to make sure we did not regress anything else.

Contributor Author

Ran all the SVE tests. The only diffs I can see are the placement of movprfx and the additional table lookup calculations.

@JulieLeeMSFT JulieLeeMSFT added this to the 9.0.0 milestone Aug 9, 2024
    }
    else if (targetReg != embMaskOp1Reg)
    {
        // embMaskOp1Reg is same as `falseReg`, but not same as `targetReg`. Move the
        // `embMaskOp1Reg` i.e. `falseReg` in `targetReg`, using "unpredicated movprfx", so the
        // subsequent `insEmbMask` operation can be merged on top of it.
-       GetEmitter()->emitIns_R_R(INS_sve_movprfx, EA_SCALABLE, targetReg, falseReg);
+       emitInsMovPrfxHelper(targetReg, maskReg, falseReg, embMaskOp2Reg);
Member

Same here. This was earlier generating the unpredicated version, but now it generates the predicated one.

Contributor Author

emitInsMovPrfxHelper() generates an unpredicated movprfx:
GetEmitter()->emitIns_R_R(INS_sve_movprfx, EA_SCALABLE, reg1, reg3);
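
For readers less familiar with the distinction under review, the three movprfx forms can be modelled lane by lane. This is an illustrative sketch only: the fixed four-lane width and the names `Vec`, `Pred` and the `movprfx*` functions are assumptions for the example, not part of the JIT or of this PR.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Illustrative width only; the real SVE vector length is implementation-defined.
constexpr std::size_t kLanes = 4;
using Vec  = std::array<std::int32_t, kLanes>;
using Pred = std::array<bool, kLanes>;

// Unpredicated form (movprfx zd, zn): every lane is copied.
Vec movprfxUnpredicated(const Vec& zn)
{
    return zn;
}

// Zeroing form (movprfx zd.s, pg/z, zn.s): active lanes are copied,
// inactive lanes become zero.
Vec movprfxZeroing(const Pred& pg, const Vec& zn)
{
    Vec zd{};
    for (std::size_t i = 0; i < kLanes; ++i)
    {
        zd[i] = pg[i] ? zn[i] : 0;
    }
    return zd;
}

// Merging form (movprfx zd.s, pg/m, zn.s): active lanes are copied,
// inactive lanes keep the old destination value.
Vec movprfxMerging(const Pred& pg, const Vec& zn, const Vec& zdOld)
{
    Vec zd = zdOld;
    for (std::size_t i = 0; i < kLanes; ++i)
    {
        if (pg[i])
        {
            zd[i] = zn[i];
        }
    }
    return zd;
}
```

The forms differ only in what happens to inactive lanes, which is why swapping one for another changes results whenever the governing predicate is not all-true.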

Contributor Author

However, it looks like HEAD is wrong:

                            else
                            {
                                // If the instruction just has "predicated" version, then move the "embMaskOp1Reg"
                                // into targetReg. Next, do the predicated operation on the targetReg and last,
                                // use "sel" to select the active lanes based on mask, and set inactive lanes
                                // to falseReg.

                                assert(targetReg != embMaskOp2Reg);
                                assert(HWIntrinsicInfo::IsEmbeddedMaskedOperation(intrinEmbMask.id));

                                GetEmitter()->emitIns_R_R(INS_sve_movprfx, EA_SCALABLE, targetReg, embMaskOp1Reg);

                                emitInsHelper(targetReg, maskReg, embMaskOp2Reg);
                            }

That is generating an unpredicated movprfx followed by a predicated instruction.

(This was spotted by my newest changes in #106184)

a74nh added 2 commits August 12, 2024 12:09
Change-Id: I26358b9d0e19fff508d29ce0cd2a70a9d0539b88
Member

@kunalspathak kunalspathak left a comment

LGTM

@kunalspathak
Member

Looks like the SVE leg is failing to build the tests:

2024-08-12T16:34:49.2846118Z ##[error]MSBUILD(0,0): error MSB4166: /__w/1/s/artifacts/log/MsbuildDebugLogs/MSBuild_pid-18679_effbbc2e614e4810a88acf54a7b70d29.failure.txt:
2024-08-12T16:34:49.2847055Z MSBUILD : error MSB4166: UNHANDLED EXCEPTIONS FROM PROCESS 18679: [/__w/1/s/src/tests/build.proj]
2024-08-12T16:34:49.2849009Z ##[error]MSBUILD(0,0): error MSB4166: UNHANDLED EXCEPTIONS FROM PROCESS 18679:
2024-08-12T16:34:49.2849643Z MSBUILD : error MSB4166: ===================== [/__w/1/s/src/tests/build.proj]
2024-08-12T16:34:49.2851381Z ##[error]MSBUILD(0,0): error MSB4166: =====================
2024-08-12T16:34:49.2852240Z MSBUILD : error MSB4166: 08/12/2024 16:34:48 [/__w/1/s/src/tests/build.proj]
2024-08-12T16:34:49.2854127Z ##[error]MSBUILD(0,0): error MSB4166: 08/12/2024 16:34:48
2024-08-12T16:34:49.2854832Z MSBUILD : error MSB4166: Microsoft.Build.Framework.InternalErrorException: MSB0001: Internal MSBuild Error: must be valid [/__w/1/s/src/tests/build.proj]
2024-08-12T16:34:49.2856902Z ##[error]MSBUILD(0,0): error MSB4166: Microsoft.Build.Framework.InternalErrorException: MSB0001: Internal MSBuild Error: must be valid
2024-08-12T16:34:49.2857753Z MSBUILD : error MSB4166: at Microsoft.Build.Shared.ErrorUtilities.ThrowInternalError(String message, Exception innerException, Object[] args) [/__w/1/s/src/tests/build.proj]
2024-08-12T16:34:49.2859913Z ##[error]MSBUILD(0,0): error MSB4166: at Microsoft.Build.Shared.ErrorUtilities.ThrowInternalError(String message, Exception innerException, Object[] args)
2024-08-12T16:34:49.2860769Z MSBUILD : error MSB4166: at Microsoft.Build.BackEnd.Logging.LoggingContext.LogBuildEvent(BuildEventArgs buildEvent) [/__w/1/s/src/tests/build.proj]
2024-08-12T16:34:49.2862878Z ##[error]MSBUILD(0,0): error MSB4166: at Microsoft.Build.BackEnd.Logging.LoggingContext.LogBuildEvent(BuildEventArgs buildEvent)
2024-08-12T16:34:49.2863741Z MSBUILD : error MSB4166: at Microsoft.Build.BackEnd.Components.RequestBuilder.AssemblyLoadsTracker.CurrentDomainOnAssemblyLoad(Object sender, AssemblyLoadEventArgs args) [/__w/1/s/src/tests/build.proj]
2024-08-12T16:34:49.2865830Z ##[error]MSBUILD(0,0): 

Change-Id: I159a0b72cc53a6b2f0b75418036a8ad5a1a9146d
Contributor Author

a74nh commented Aug 13, 2024

Looks like the SVE leg is failing to build the tests:

I was seeing this on another PR with the same HEAD. Merged to latest and it appears to have vanished.

@jakobbotsch
Member

That failure is dotnet/dnceng#3304.

Member

@kunalspathak kunalspathak left a comment

LGTM

@amanasifkhalid amanasifkhalid merged commit 0e74147 into dotnet:main Aug 13, 2024
109 of 111 checks passed
@a74nh a74nh deleted the immjump_github branch August 13, 2024 14:35
@github-actions github-actions bot locked and limited conversation to collaborators Sep 14, 2024
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI arm-sve Work related to arm64 SVE/SVE2 support community-contribution Indicates that the PR has been added by a community member