JIT: Consume FMA intrinsic operands in right order #102914

jakobbotsch · 2024-05-31T11:00:26Z

The operands of the FMA intrinsics are permuted in a non-standard way during LSRA. Codegen already takes this into account, but the handling was missing when consuming the operands.

Ideally we would permute these during lowering instead to avoid these hacks.

Fix #102773

dotnet-policy-service · 2024-05-31T11:00:58Z

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

jakobbotsch · 2024-05-31T15:10:54Z

cc @dotnet/jit-contrib PTAL @tannergooding @kunalspathak

No diffs

This fixes the superpmi-replay blocking issue.

tannergooding · 2024-05-31T15:55:35Z

src/coreclr/jit/hwintrinsiccodegenxarch.cpp

+#ifdef DEBUG
+    // Use nums are assigned in LIR order but this node is special and doesn't
+    // actually use operands. Fix up the use nums here to avoid asserts.
+    unsigned useNum1  = op1->gtUseNum;
+    unsigned useNum2  = op2->gtUseNum;
+    unsigned useNum3  = op3->gtUseNum;
+    emitOp1->gtUseNum = useNum1;
+    emitOp2->gtUseNum = useNum2;
+    emitOp3->gtUseNum = useNum3;
+#endif
+
+    genConsumeRegs(emitOp1);
+    genConsumeRegs(emitOp2);
+    genConsumeRegs(emitOp3);
+


> Ideally we would permute these during lowering instead to avoid these hacks.

It's worth noting the reason we didn't do this in lowering (unlike most of the other cases that do swap operands) is because it would require introducing a lot of new synthetic intrinsics and it was believed to be overall more costly.

Each FusedMultiplyAdd intrinsic has three forms, where op3 is always the node that can be optionally contained:

132 - op1 = (op1 * op3) + op2

213 - op1 = (op2 * op1) + op3

231 - op1 = (op2 * op3) + op1
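As a rough illustration of the three forms listed above, the sketch below models each one as a plain C++ function over scalars and shows how the same `a * b + c` can be computed through any of them by changing which value sits in which operand slot. The names `fma132`, `fma213`, `fma231`, `FmaResults`, `computeAllForms`, and `checkFormsAgree` are hypothetical and exist only for this example; they are not part of the JIT.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helpers (not JIT code): each returns the value that the
// corresponding form would write back to op1.
double fma132(double op1, double op2, double op3) { return std::fma(op1, op3, op2); }
double fma213(double op1, double op2, double op3) { return std::fma(op2, op1, op3); }
double fma231(double op1, double op2, double op3) { return std::fma(op2, op3, op1); }

struct FmaResults
{
    double via132;
    double via213;
    double via231;
};

// Computes a * b + c through each form by permuting the operand slots.
FmaResults computeAllForms(double a, double b, double c)
{
    return {
        fma132(a, c, b), // (op1 * op3) + op2 = (a * b) + c
        fma213(a, b, c), // (op2 * op1) + op3 = (b * a) + c
        fma231(c, a, b), // (op2 * op3) + op1 = (a * b) + c
    };
}

// Sanity check for finite inputs: all three forms agree once permuted.
void checkFormsAgree(double a, double b, double c)
{
    FmaResults r = computeAllForms(a, b, c);
    assert(r.via132 == r.via213 && r.via213 == r.via231);
}
```

The point of the sketch is only that picking a form amounts to picking an operand permutation, which is why the operand order seen at emit time can differ from the op1, op2, op3 order of the node.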

The managed API we expose is the 213 form and there are 10 different FMA intrinsics, so we'd need to expose 10 more for the 132 and 10 more for the 231 form. Then we'd need to repeat this for the Avx512 specific variants, giving us at least 50 new synthetic intrinsics in lowering just to cover FMA.

We might be able to avoid new synthetic intrinsics if we had a way to track which permutation was chosen, but free bits are fairly scarce right now, so I think we'd need to get clever about how we track that.

Can we not use genConsumeMultiOpOperands() here instead?
Edit: I assume that's because we wouldn't be consuming in the same order as the swapped operands?

Right, op1, op2, op3 (the order that genConsumeMultiOpOperands consumes in) is not the same as emitOp1, emitOp2, emitOp3, which as I understand it is the order that uses were built in by LSRA. We should consume in that order.
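A toy model may help show why the consumption order and the use numbers have to agree. The sketch below assumes, as a simplification, that the debug check only requires use numbers to increase as operands are consumed; the real JIT asserts may be stricter or different. `ToyNode`, `ToyConsumer`, `consumeInOrder`, and `fixUpUseNums` are hypothetical names for this example only.

```cpp
#include <cassert>

// Toy stand-in for an operand node; only the debug use number is modelled.
struct ToyNode
{
    unsigned useNum;
};

// Toy stand-in for the debug check (an assumption of this sketch):
// operands must be consumed in increasing use-number order.
struct ToyConsumer
{
    unsigned lastUseNum = 0;
    bool     ok         = true;

    void consume(const ToyNode& node)
    {
        if (node.useNum <= lastUseNum)
        {
            ok = false;
        }
        lastUseNum = node.useNum;
    }
};

// Consumes three nodes in the given order and reports whether the check held.
bool consumeInOrder(const ToyNode& a, const ToyNode& b, const ToyNode& c)
{
    ToyConsumer consumer;
    consumer.consume(a);
    consumer.consume(b);
    consumer.consume(c);
    return consumer.ok;
}

// Same shape as the fixup in the diff: read all three use numbers first,
// then write them to the emit-order nodes. Reading first matters because
// emitOp1..3 point at the same nodes as op1..3, just in a different order.
void fixUpUseNums(ToyNode* op1, ToyNode* op2, ToyNode* op3,
                  ToyNode* emitOp1, ToyNode* emitOp2, ToyNode* emitOp3)
{
    assert(op1 && op2 && op3 && emitOp1 && emitOp2 && emitOp3);
    unsigned useNum1 = op1->useNum;
    unsigned useNum2 = op2->useNum;
    unsigned useNum3 = op3->useNum;
    emitOp1->useNum  = useNum1;
    emitOp2->useNum  = useNum2;
    emitOp3->useNum  = useNum3;
}
```

In this model, with use numbers 1, 2, 3 assigned in op1, op2, op3 order, consuming in a permuted emit order trips the check until the use numbers are rewritten to follow the emit order.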

I don't have the context necessary to completely understand why we can't build and consume the operands in the op1, op2, op3 order, even if the instruction we end up emitting uses the registers in a different order.

@tannergooding - do you know?

That I don't. It's an area of the register allocator I'm not well versed in.

> That I don't. It's an area of the register allocator I'm not well versed in.

Hmm, I think it has to do with how we are consuming them in codegen, as opposed to the LSRA ordering.

> and consume the operands in the op1, op2, op3 order even if we end up emitting different instructions using the registers in different order

dotnet-issue-labeler bot added the area-CodeGen-coreclr label (CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI) May 31, 2024

dotnet-policy-service bot assigned jakobbotsch May 31, 2024

jakobbotsch mentioned this pull request May 31, 2024

SPMI Replay failing sporadically #102773

Closed

jakobbotsch marked this pull request as ready for review May 31, 2024 15:08

jakobbotsch requested review from tannergooding and kunalspathak May 31, 2024 15:11

tannergooding approved these changes May 31, 2024

tannergooding reviewed May 31, 2024

build-analysis bot mentioned this pull request May 31, 2024

Test failure: GC\Features\HeapExpansion\Finalizer\Finalizer.cmd #102706

Closed

jakobbotsch merged commit e9cd3f1 into dotnet:main May 31, 2024
112 of 114 checks passed

jakobbotsch deleted the fix-102773 branch May 31, 2024 18:59

github-actions bot locked and limited conversation to collaborators Jul 1, 2024
