Move remaining HIR SIMDIntrinsics to SimdAsHWIntrinsic #79720
Conversation
Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch, @kunalspathak
Worth noting this is not expected to be zero diff, just positive diffs. Some simple diffs from these changes are below.

We'll correctly contain this when no side effects interfere:

```diff
- movzx rax, word ptr [rsp+04H]
- vmovd xmm0, eax
- vpbroadcastw ymm0, ymm0
+ vpbroadcastw ymm0, word ptr [rsp+04H]
```

We'll emit `vinsertps` rather than longer move/shift sequences:

```diff
- vxorps xmm3, xmm3
- vmovss xmm3, xmm3, xmm2
- vpslldq xmm3, 4
- vmovss xmm3, xmm3, xmm6
- vpslldq xmm3, 4
- vmovss xmm3, xmm3, xmm5
- vmovaps xmm2, xmm3
+ vinsertps xmm5, xmm5, xmm6, 16
+ vinsertps xmm2, xmm5, xmm2, 40
```

We'll recognize and contain zero:

```diff
- vxorps xmm1, xmm1, xmm1
- vinsertps xmm0, xmm0, xmm1, 48
+ vinsertps xmm0, xmm0, xmm0, 56
```

We'll also fold chains of inserts:

```diff
  vmovss xmm0, dword ptr [rsi+30H]
  vinsertps xmm0, xmm0, dword ptr [rsi+34H], 16
- vinsertps xmm0, xmm0, dword ptr [rsi+38H], 32
- vxorps xmm1, xmm1, xmm1
- vinsertps xmm0, xmm0, xmm1, 48
+ vinsertps xmm0, xmm0, dword ptr [rsi+38H], 40
```
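For context on the `insertps` immediates shown above (background detail, not something stated in the PR): the imm8 encodes the source lane in bits [7:6], the destination lane in bits [5:4], and a zero mask in bits [3:0] that clears lanes after the insert. A minimal standalone sketch decoding the values used in these diffs:

```cpp
#include <cstdint>

// Compose an INSERTPS imm8 from its three documented fields.
constexpr uint8_t InsertpsImm(uint8_t srcIndex, uint8_t dstIndex, uint8_t zeroMask)
{
    return static_cast<uint8_t>(((srcIndex & 0x3) << 6) | ((dstIndex & 0x3) << 4) | (zeroMask & 0xF));
}

static_assert(InsertpsImm(0, 1, 0b0000) == 16, "insert src[0] into dst[1]");
static_assert(InsertpsImm(0, 2, 0b1000) == 40, "insert into dst[2], zero dst[3]");
static_assert(InsertpsImm(0, 3, 0b0000) == 48, "insert src[0] into dst[3]");
static_assert(InsertpsImm(0, 3, 0b1000) == 56, "zero dst[3] via the mask; the insert is moot");
```

That is why the `48` → `56` rewrite removes the need for a zeroed register: setting the zero-mask bit for lane 3 produces the zero directly.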
Raised the aliasing bug on #72725 and asked that the fix be reopened.
CC. @dotnet/jit-contrib. This removes the last bit of the legacy SIMD handling that existed in HIR. It deletes a good amount of dead code and sets us up to move the last two remaining LIR-only intrinsics (`SIMDIntrinsicUpperSave`/`SIMDIntrinsicUpperRestore`) in a follow-up. It provides a small throughput win and some decent size wins for x64, Arm64, and x86. I'm not quite sure why SPMI is complaining. I noticed that locally …
Looks like you have a crash in the Linux/x64 cross-compiler replays, both in the superpmi-diffs and superpmi-replay pipelines. E.g., in the replay pipeline log, cases of:
in the diffs log, e.g.:
To repro, run something like this under the debugger (fix the paths first):
We do expect some MISSING cases as the JIT evolves. You might be introducing more if you're changing calls across the JIT/EE interface. In fact, the summary page shows that is the case.
Thanks for the info @BruceForstall. I did run …. I'll try to see if I can repro the SPMI Linux/x64 failure as well.
hmm, I ran:

And am seeing the following counts of replay failures:

Nothing outside the missing-context ones, though. Perhaps I fixed it by resolving the other two issues I had seen...
Was able to get a local repro; it was …. The actual assert was:
Issue ended up being the lowering logic around …
CC. @dotnet/jit-contrib, ready for review now with the SPMI issue resolved. Saving ~5.2k overall bytes on Arm64 and ~11.5k overall bytes on x64, with most of that being in MinOpts for each of them. Likewise, with a -0.02% overall improvement to throughput, it's higher, up to -0.05%, in MinOpts.
```cpp
void genSIMDScalarMove(
    var_types targetType, var_types type, regNumber target, regNumber src, SIMDScalarMoveType moveType);
void genSIMDZero(var_types targetType, var_types baseType, regNumber targetReg);
void genSIMDIntrinsicInitN(GenTreeSIMD* simdNode);
void genSIMDIntrinsicUpperSave(GenTreeSIMD* simdNode);
void genSIMDIntrinsicUpperRestore(GenTreeSIMD* simdNode);
```
These two only exist in LIR and are created by LSRA
```cpp
    regNumber targetReg);
void genSIMDIntrinsic32BitConvert(GenTreeSIMD* simdNode);
void genSIMDIntrinsic64BitConvert(GenTreeSIMD* simdNode);
void genSIMDExtractUpperHalf(GenTreeSIMD* simdNode, regNumber srcReg, regNumber tgtReg);
void genSIMDIntrinsic(GenTreeSIMD* simdNode);
```
This can go away once we handle UpperSave/UpperRestore in a follow-up PR.
```cpp
// TODO-CQ: We don't handle contiguous args for anything except TYP_FLOAT today

GenTree* prevArg           = nullptr;
bool     areArgsContiguous = (simdBaseType == TYP_FLOAT);
```
The existing handling in `areArgumentsContiguous` and elsewhere only checks and supports `TYP_FLOAT` today.

It shouldn't be difficult to extend to the other regular types (`TYP_INT`, `TYP_UINT`, `TYP_LONG`, `TYP_ULONG`, and `TYP_DOUBLE`). It might require a bit more work to extend to the small types (`TYP_BYTE`, `TYP_UBYTE`, `TYP_SHORT`, and `TYP_USHORT`).

However, I'm leaving it to a follow-up PR to minimize the churn here, and since it doesn't impact the `System.Numerics` types that were handled by `impSIMDIntrinsic`.
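To illustrate the check being discussed — a minimal standalone sketch where `FieldLoad` and `AreArgsContiguous` are hypothetical names, not the JIT's: arguments are contiguous when each one loads from the same base at the offset immediately following the previous field, so the whole sequence can be replaced by one wider vector load.

```cpp
#include <cstdint>

// Hypothetical descriptor for an argument that loads a struct field.
struct FieldLoad
{
    uintptr_t baseAddr; // identity of the containing object
    unsigned  offset;   // byte offset of the field
    unsigned  size;     // byte size of the loaded type (4 for TYP_FLOAT)
};

// True only if every argument reads from the same base and starts exactly
// where the previous argument's field ended.
bool AreArgsContiguous(const FieldLoad* args, int count)
{
    for (int i = 1; i < count; i++)
    {
        bool sameBase = (args[i].baseAddr == args[i - 1].baseAddr);
        bool adjacent = (args[i].offset == args[i - 1].offset + args[i - 1].size);

        if (!sameBase || !adjacent)
        {
            return false;
        }
    }
    return true;
}

int main()
{
    // e.g. the three float fields of a Vector3 at offsets 0, 4, 8.
    FieldLoad v3[3] = {{0x1000, 0, 4}, {0x1000, 4, 4}, {0x1000, 8, 4}};
    return AreArgsContiguous(v3, 3) ? 0 : 1;
}
```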
```cpp
CORINFO_ARG_LIST_HANDLE arg1     = sig->args;
CORINFO_ARG_LIST_HANDLE arg2     = info.compCompHnd->getArgNext(arg1);
var_types               argType  = TYP_UNKNOWN;
CORINFO_CLASS_HANDLE    argClass = NO_CLASS_HANDLE;

argType = JITtype2varType(strip(info.compCompHnd->getArgType(sig, arg2, &argClass)));
op2     = getArgForHWIntrinsic(argType, argClass);

argType = JITtype2varType(strip(info.compCompHnd->getArgType(sig, arg1, &argClass)));
op1     = getArgForHWIntrinsic(argType, argClass);
```
This fixes a pre-existing assert that was being triggered by `operator /(Vector128<T> vector, T scalar)`.
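A toy model of why the per-argument type queries matter (an assumed simplification for illustration, not the importer's real code): operands are popped from the evaluation stack last-first, and for `operator /(Vector128<T> vector, T scalar)` the two arguments have different types, so each pop must be checked against its own signature type rather than a single stale `argType`.

```cpp
#include <cassert>
#include <stack>

enum VarType
{
    TYP_SIMD16, // stands in for Vector128<float>
    TYP_FLOAT   // stands in for float
};

struct ImportStack
{
    std::stack<VarType> stk;

    // Rough analogue of getArgForHWIntrinsic: pop the top operand and check
    // it against the type the signature declares for this argument.
    VarType PopArg(VarType expected)
    {
        VarType actual = stk.top();
        stk.pop();
        assert(actual == expected && "argument type mismatch");
        return actual;
    }
};

int main()
{
    ImportStack s;
    s.stk.push(TYP_SIMD16); // arg1: the vector, pushed first
    s.stk.push(TYP_FLOAT);  // arg2: the scalar, pushed last

    // arg2 is on top, so its type must be queried and popped first; reusing
    // arg1's type here would trip the assert.
    s.PopArg(TYP_FLOAT);
    s.PopArg(TYP_SIMD16);
    return 0;
}
```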
```cpp
}

assert(intrinsicId == NI_SSE41_Insert);

// We have Sse41.Insert in which case we can specially handle
```
The handling added here ensures we get great codegen for some of the patterns that had popped up for Vector2/3/4.
It will also help with the System.Runtime.Intrinsics handling when zero is involved.
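A sketch of the zero case (simplified; `Node` and `FoldZeroIntoInsertps` are illustrative names, not the lowering code): when the value being inserted is a zero vector, the `insertps` immediate's zero mask can do the zeroing directly, which is exactly the `48` → `56` rewrite in the diffs above.

```cpp
#include <cstdint>

// Illustrative stand-in for an operand node.
struct Node
{
    bool isZeroVector;
};

// If op2 is a zero vector, set the zero-mask bit for the destination lane
// (bits [3:0] of the imm8) instead of materializing a zero register: zeroing
// the lane after the insert makes the inserted value irrelevant.
constexpr uint8_t FoldZeroIntoInsertps(Node op2, uint8_t imm8)
{
    if (op2.isZeroVector)
    {
        unsigned dstIndex = (imm8 >> 4) & 0x3; // bits [5:4]: destination lane
        imm8 = static_cast<uint8_t>(imm8 | (1u << dstIndex));
    }
    return imm8;
}

static_assert(FoldZeroIntoInsertps({true}, 48) == 56, "the 48 -> 56 rewrite above");
static_assert(FoldZeroIntoInsertps({false}, 48) == 48, "non-zero operand unchanged");
```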
Merging in dotnet/main to resolve the unrelated CI failure. This is still ready for review.
LGTM.
Run outerloop? jitstress? ISAs jitstress?
/azp run runtime-coreclr jitstress-isas-x86, runtime-coreclr jitstress-isas-arm, runtime-coreclr outerloop
Azure Pipelines successfully started running 3 pipeline(s).

/azp run runtime-coreclr jitstress-isas-x86, runtime-coreclr jitstress-isas-arm, runtime-coreclr outerloop

Azure Pipelines successfully started running 3 pipeline(s).

jitstress failures are #75244
Improvements on windows-x64:

Linux-x64:

windows-arm64:
This moves the remaining legacy SIMD intrinsics (`GT_SIMD`) that exist in HIR to be implemented using `SimdAsHWIntrinsic` instead.

As part of this, it means we are able to delete `impSIMDIntrinsic`.

There are still two legacy SIMD intrinsics which exist in LIR. In particular, these are `SIMDIntrinsicUpperSave` and `SIMDIntrinsicUpperRestore`. After this PR goes in, I plan on putting up one last PR to move these to be `NamedIntrinsic` (they cannot be `SimdAsHWIntrinsic` because they exist even when `FEATURE_HW_INTRINSICS` is not defined, in order to support the ABI). This will allow us to delete `GT_SIMD` and any remaining support that was specific to the legacy SIMD logic.