Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ensure Vector256.Dot produces a V256 result #88712

Merged
merged 2 commits into from
Jul 13, 2023

Conversation

tannergooding
Copy link
Member

This resolves #88704

@dotnet-issue-labeler dotnet-issue-labeler bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Jul 12, 2023
@ghost ghost assigned tannergooding Jul 12, 2023
@ghost
Copy link

ghost commented Jul 12, 2023

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

Issue Details

This resolves #88704

Author: tannergooding
Assignees: -
Labels:

area-CodeGen-coreclr

Milestone: -

@tannergooding
Copy link
Member Author

/azp run runtime-coreclr jitstress-isas-x86, runtime-coreclr outerloop, runtime-coreclr jitstress, runtime-coreclr libraries-jitstress

@azure-pipelines
Copy link

Azure Pipelines successfully started running 4 pipeline(s).

@tannergooding
Copy link
Member Author

CC. @dotnet/jit-contrib, this resolves #88704 and the last of the outerloop HWIntrinsic related failures. Remaining jitstress failures are in other areas.

@@ -4462,29 +4462,30 @@ GenTree* Lowering::LowerHWIntrinsicDot(GenTreeHWIntrinsic* node)
assert(comp->compIsaSupportedDebugOnly(InstructionSet_AVX));

// We will be constructing the following parts:
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The issue was simply that we were returning a V128 still when we should've been returning a full V256

The old code was basically doing:

vdpps    ymm0, ymm1, ymm2, -15
vextractf128 xmm1, ymm0, 1
vaddps xmm0, xmm0, xmm1

The new code instead does:

vdpps    ymm0, ymm1, ymm2, -1
vperm2f128 ymm1, ymm0, ymm0, 1
vaddps ymm0, ymm0, ymm1

So we have the same general size of code and are simply swapping lower/upper rather than extracting upper, we then do a 256-bit add rather than a 128-bit add.

In a couple edge cases, we do end up with slightly better code due to how the register allocator handles the uses of the clones and so we see a very small -51 bytes disasm improvement.


tmp2 = comp->gtNewSimdBinOpNode(GT_ADD, TYP_SIMD16, tmp3, tmp1, simdBaseJitType, 16);
BlockRange().InsertAfter(tmp1, tmp2);
NamedIntrinsic permute2x128 = (simdBaseType == TYP_DOUBLE) ? NI_AVX_Permute2x128 : NI_AVX2_Permute2x128;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the debug assert only checks AVX: assert(comp->compIsaSupportedDebugOnly(InstructionSet_AVX)); in this case, should it be updated to avx2?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have an assert a bit higher up already. We could replicate it here if you really want it to be.

@tannergooding tannergooding merged commit 620bd3e into dotnet:main Jul 13, 2023
@tannergooding tannergooding deleted the fix-88704 branch July 13, 2023 16:12
@ghost ghost locked as resolved and limited conversation to collaborators Aug 13, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
Projects
None yet
Development

Successfully merging this pull request may close these issues.

VectorDot test failes in codegen
2 participants