Move remaining HIR SIMDIntrinsics to SimdAsHWIntrinsic #79720
Conversation
Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch, @kunalspathak
Worth noting this is not expected to be zero diff, just positive diffs. Some simple diffs from these changes are below.

We'll correctly contain this when no side effects interfere:

```diff
- movzx rax, word ptr [rsp+04H]
- vmovd xmm0, eax
- vpbroadcastw ymm0, ymm0
+ vpbroadcastw ymm0, word ptr [rsp+04H]
```

We'll emit `vinsertps` rather than longer move/shift sequences:

```diff
- vxorps xmm3, xmm3
- vmovss xmm3, xmm3, xmm2
- vpslldq xmm3, 4
- vmovss xmm3, xmm3, xmm6
- vpslldq xmm3, 4
- vmovss xmm3, xmm3, xmm5
- vmovaps xmm2, xmm3
+ vinsertps xmm5, xmm5, xmm6, 16
+ vinsertps xmm2, xmm5, xmm2, 40
```

We'll recognize and contain zero:

```diff
- vxorps xmm1, xmm1, xmm1
- vinsertps xmm0, xmm0, xmm1, 48
+ vinsertps xmm0, xmm0, xmm0, 56
```

We'll also fold chains of inserts:

```diff
  vmovss xmm0, dword ptr [rsi+30H]
  vinsertps xmm0, xmm0, dword ptr [rsi+34H], 16
- vinsertps xmm0, xmm0, dword ptr [rsi+38H], 32
- vxorps xmm1, xmm1, xmm1
- vinsertps xmm0, xmm0, xmm1, 48
+ vinsertps xmm0, xmm0, dword ptr [rsi+38H], 40
```
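For context on the `insertps` immediates shown above (background detail, not something stated in the PR): the imm8 encodes the source lane in bits [7:6], the destination lane in bits [5:4], and a zero mask in bits [3:0] that clears lanes after the insert. A minimal standalone sketch decoding the values used in these diffs:

```cpp
#include <cstdint>

// Compose an INSERTPS imm8 from its three documented fields.
constexpr uint8_t InsertpsImm(uint8_t srcIndex, uint8_t dstIndex, uint8_t zeroMask)
{
    return static_cast<uint8_t>(((srcIndex & 0x3) << 6) | ((dstIndex & 0x3) << 4) | (zeroMask & 0xF));
}

static_assert(InsertpsImm(0, 1, 0b0000) == 16, "insert src[0] into dst[1]");
static_assert(InsertpsImm(0, 2, 0b1000) == 40, "insert into dst[2], zero dst[3]");
static_assert(InsertpsImm(0, 3, 0b0000) == 48, "insert src[0] into dst[3]");
static_assert(InsertpsImm(0, 3, 0b1000) == 56, "zero dst[3] via the mask; the insert is moot");
```

That is why the `48` → `56` rewrite removes the need for a zeroed register: setting the zero-mask bit for lane 3 produces the zero directly.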
Raised the aliasing bug on #72725 and asked that the fix be reopened.
CC. @dotnet/jit-contrib. This removes the last bit of the legacy SIMD handling that existed in HIR. It deletes a good amount of dead code and sets us up to move the last two remaining LIR-only intrinsics (`SIMDIntrinsicUpperSave`/`SIMDIntrinsicUpperRestore`) in a follow-up. It provides a small throughput win and some decent size wins for x64, Arm64, and x86. I'm not quite sure why SPMI is complaining. I noticed that locally …
Looks like you have a crash in the Linux/x64 cross-compiler replays, both in the superpmi-diffs and superpmi-replay pipelines. E.g., in the replay pipeline log, cases of:
in the diffs log, e.g.:
To repro, run something like this under the debugger (fix the paths first):
We do expect some MISSING cases as the JIT evolves. You might be introducing more if you're changing calls across the JIT/EE interface. In fact, the summary page shows that is the case.
Thanks for the info @BruceForstall. I did run …. I'll try to see if I can repro the SPMI Linux/x64 failure as well.
hmm, I ran:

And am seeing the following counts of replay failures:

Nothing outside the missing-context ones, though. Perhaps I fixed it by resolving the other two issues I had seen...
Was able to get a local repro; it was …. The actual assert was:
Issue ended up being the lowering logic around …
CC. @dotnet/jit-contrib, ready for review now with the SPMI issue resolved. Saving ~5.2k overall bytes on Arm64 and ~11.5k overall bytes on x64, with most of that being in MinOpts for each of them. Likewise, with a -0.02% overall improvement to throughput, it's higher, up to -0.05%, in MinOpts.
```cpp
void genSIMDScalarMove(
    var_types targetType, var_types type, regNumber target, regNumber src, SIMDScalarMoveType moveType);
void genSIMDZero(var_types targetType, var_types baseType, regNumber targetReg);
void genSIMDIntrinsicInitN(GenTreeSIMD* simdNode);
void genSIMDIntrinsicUpperSave(GenTreeSIMD* simdNode);
void genSIMDIntrinsicUpperRestore(GenTreeSIMD* simdNode);
```
These two only exist in LIR and are created by LSRA
```cpp
    regNumber targetReg);
void genSIMDIntrinsic32BitConvert(GenTreeSIMD* simdNode);
void genSIMDIntrinsic64BitConvert(GenTreeSIMD* simdNode);
void genSIMDExtractUpperHalf(GenTreeSIMD* simdNode, regNumber srcReg, regNumber tgtReg);
void genSIMDIntrinsic(GenTreeSIMD* simdNode);
```
This can go away once we handle UpperSave/UpperRestore in a follow-up PR.
```cpp
// TODO-CQ: We don't handle contiguous args for anything except TYP_FLOAT today

GenTree* prevArg           = nullptr;
bool     areArgsContiguous = (simdBaseType == TYP_FLOAT);
```
The existing handling in `areArgumentsContiguous` and elsewhere only checks and supports `TYP_FLOAT` today.

It shouldn't be difficult to extend to the other regular types (`TYP_INT`, `TYP_UINT`, `TYP_LONG`, `TYP_ULONG`, and `TYP_DOUBLE`). It might require a bit more work to extend to the small types (`TYP_BYTE`, `TYP_UBYTE`, `TYP_SHORT`, and `TYP_USHORT`).

However, I'm leaving it to a follow-up PR to minimize the churn here, and since it doesn't impact the `System.Numerics` types that were handled by `impSIMDIntrinsic`.
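To illustrate the check being discussed — a minimal standalone sketch where `FieldLoad` and `AreArgsContiguous` are hypothetical names, not the JIT's: arguments are contiguous when each one loads from the same base at the offset immediately following the previous field, so the whole sequence can be replaced by one wider vector load.

```cpp
#include <cstdint>

// Hypothetical descriptor for an argument that loads a struct field.
struct FieldLoad
{
    uintptr_t baseAddr; // identity of the containing object
    unsigned  offset;   // byte offset of the field
    unsigned  size;     // byte size of the loaded type (4 for TYP_FLOAT)
};

// True only if every argument reads from the same base and starts exactly
// where the previous argument's field ended.
bool AreArgsContiguous(const FieldLoad* args, int count)
{
    for (int i = 1; i < count; i++)
    {
        bool sameBase = (args[i].baseAddr == args[i - 1].baseAddr);
        bool adjacent = (args[i].offset == args[i - 1].offset + args[i - 1].size);

        if (!sameBase || !adjacent)
        {
            return false;
        }
    }
    return true;
}

int main()
{
    // e.g. the three float fields of a Vector3 at offsets 0, 4, 8.
    FieldLoad v3[3] = {{0x1000, 0, 4}, {0x1000, 4, 4}, {0x1000, 8, 4}};
    return AreArgsContiguous(v3, 3) ? 0 : 1;
}
```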
```cpp
CORINFO_ARG_LIST_HANDLE arg1     = sig->args;
CORINFO_ARG_LIST_HANDLE arg2     = info.compCompHnd->getArgNext(arg1);
var_types               argType  = TYP_UNKNOWN;
CORINFO_CLASS_HANDLE    argClass = NO_CLASS_HANDLE;

argType = JITtype2varType(strip(info.compCompHnd->getArgType(sig, arg2, &argClass)));
op2     = getArgForHWIntrinsic(argType, argClass);

argType = JITtype2varType(strip(info.compCompHnd->getArgType(sig, arg1, &argClass)));
op1     = getArgForHWIntrinsic(argType, argClass);
```
This fixes a pre-existing assert that was being triggered by `operator /(Vector128<T> vector, T scalar)`.
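A toy model of why the per-argument type queries matter (an assumed simplification for illustration, not the importer's real code): operands are popped from the evaluation stack last-first, and for `operator /(Vector128<T> vector, T scalar)` the two arguments have different types, so each pop must be checked against its own signature type rather than a single stale `argType`.

```cpp
#include <cassert>
#include <stack>

enum VarType
{
    TYP_SIMD16, // stands in for Vector128<float>
    TYP_FLOAT   // stands in for float
};

struct ImportStack
{
    std::stack<VarType> stk;

    // Rough analogue of getArgForHWIntrinsic: pop the top operand and check
    // it against the type the signature declares for this argument.
    VarType PopArg(VarType expected)
    {
        VarType actual = stk.top();
        stk.pop();
        assert(actual == expected && "argument type mismatch");
        return actual;
    }
};

int main()
{
    ImportStack s;
    s.stk.push(TYP_SIMD16); // arg1: the vector, pushed first
    s.stk.push(TYP_FLOAT);  // arg2: the scalar, pushed last

    // arg2 is on top, so its type must be queried and popped first; reusing
    // arg1's type here would trip the assert.
    s.PopArg(TYP_FLOAT);
    s.PopArg(TYP_SIMD16);
    return 0;
}
```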
```cpp
}

assert(intrinsicId == NI_SSE41_Insert);

// We have Sse41.Insert in which case we can specially handle
```
The handling added here ensures we get great codegen for some of the patterns that had popped up for Vector2/3/4.
It will also help with the System.Runtime.Intrinsics handling when zero is involved.
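A sketch of the zero case (simplified; `Node` and `FoldZeroIntoInsertps` are illustrative names, not the lowering code): when the value being inserted is a zero vector, the `insertps` immediate's zero mask can do the zeroing directly, which is exactly the `48` → `56` rewrite in the diffs above.

```cpp
#include <cstdint>

// Illustrative stand-in for an operand node.
struct Node
{
    bool isZeroVector;
};

// If op2 is a zero vector, set the zero-mask bit for the destination lane
// (bits [3:0] of the imm8) instead of materializing a zero register: zeroing
// the lane after the insert makes the inserted value irrelevant.
constexpr uint8_t FoldZeroIntoInsertps(Node op2, uint8_t imm8)
{
    if (op2.isZeroVector)
    {
        unsigned dstIndex = (imm8 >> 4) & 0x3; // bits [5:4]: destination lane
        imm8 = static_cast<uint8_t>(imm8 | (1u << dstIndex));
    }
    return imm8;
}

static_assert(FoldZeroIntoInsertps({true}, 48) == 56, "the 48 -> 56 rewrite above");
static_assert(FoldZeroIntoInsertps({false}, 48) == 48, "non-zero operand unchanged");
```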
Merging in dotnet/main to resolve the unrelated CI failure. This is still ready for review.
LGTM.
Run outerloop? jitstress? ISAs jitstress?
/azp run runtime-coreclr jitstress-isas-x86, runtime-coreclr jitstress-isas-arm, runtime-coreclr outerloop
Azure Pipelines successfully started running 3 pipeline(s).

/azp run runtime-coreclr jitstress-isas-x86, runtime-coreclr jitstress-isas-arm, runtime-coreclr outerloop

Azure Pipelines successfully started running 3 pipeline(s).

jitstress failures are #75244
Improvements on windows-x64:

Linux-x64:

windows-arm64:
This moves the remaining legacy SIMD intrinsics (`GT_SIMD`) that exist in HIR to be implemented using `SimdAsHWIntrinsic` instead.

As part of this, it means we are able to delete `impSIMDIntrinsic`.

There are still two legacy SIMD intrinsics which exist in LIR. In particular, these are `SIMDIntrinsicUpperSave` and `SIMDIntrinsicUpperRestore`. After this PR goes in, I plan on putting up one last PR to move these to be `NamedIntrinsic` (they cannot be `SimdAsHWIntrinsic` because they exist even when `FEATURE_HW_INTRINSICS` is not defined, in order to support the ABI). This will allow us to delete `GT_SIMD` and any remaining support that was specific to the legacy SIMD logic.