-
Notifications
You must be signed in to change notification settings - Fork 4.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Ensure Vector256.Dot produces a V256 result #88712
Conversation
Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch Issue DetailsThis resolves #88704
|
/azp run runtime-coreclr jitstress-isas-x86, runtime-coreclr outerloop, runtime-coreclr jitstress, runtime-coreclr libraries-jitstress |
Azure Pipelines successfully started running 4 pipeline(s). |
CC. @dotnet/jit-contrib, this resolves #88704 and the last of the outerloop HWIntrinsic related failures. Remaining jitstress failures are in other areas. |
@@ -4462,29 +4462,30 @@ GenTree* Lowering::LowerHWIntrinsicDot(GenTreeHWIntrinsic* node) | |||
assert(comp->compIsaSupportedDebugOnly(InstructionSet_AVX)); | |||
|
|||
// We will be constructing the following parts: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The issue was simply that we were returning a V128
still when we should've been returning a full V256
The old code was basically doing:
vdpps ymm0, ymm1, ymm2, -15
vextractf128 xmm1, ymm0, 1
vaddps xmm0, xmm0, xmm1
The new code instead does:
vdpps ymm0, ymm1, ymm2, -1
vperm2f128 ymm1, ymm0, ymm0, 1
vaddps ymm0, ymm0, ymm1
So we have the same general size of code and are simply swapping lower/upper rather than extracting upper, we then do a 256-bit add rather than a 128-bit add.
In a couple edge cases, we do end up with slightly better code due to how the register allocator handles the uses of the clones and so we see a very small -51 bytes disasm improvement.
|
||
tmp2 = comp->gtNewSimdBinOpNode(GT_ADD, TYP_SIMD16, tmp3, tmp1, simdBaseJitType, 16); | ||
BlockRange().InsertAfter(tmp1, tmp2); | ||
NamedIntrinsic permute2x128 = (simdBaseType == TYP_DOUBLE) ? NI_AVX_Permute2x128 : NI_AVX2_Permute2x128; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the debug assert only checks AVX: assert(comp->compIsaSupportedDebugOnly(InstructionSet_AVX));
in this case, should it be updated to avx2?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have an assert a bit higher up already. We could replicate it here if you really want it to be.
This resolves #88704