Expose the FusedMultiplyAdd and MultiplyAddEstimate APIs on relevant vector and scalar types #102181

tannergooding · 2024-05-14T03:10:06Z

This resolves #98053 and makes progress towards #93513

…tor types

…e vector types

dotnet-issue-labeler · 2024-05-14T03:10:12Z

Note regarding the new-api-needs-documentation label:

This serves as a reminder for when your PR is modifying a ref *.cs file and adding/modifying public APIs, please make sure the API implementation in the src *.cs file is documented with triple slash comments, so the PR reviewers can sign off that change.

tannergooding · 2024-05-15T16:50:44Z

CC. @dotnet/jit-contrib for the JIT side.

This is one of the few remaining cases that should need JIT side work, most of the remaining APIs in #93513 will be implemented purely in managed code. The *Estimate APIs are special as they need to work in the face of R2R due to being intentionally non-deterministic across hardware and FusedMultiplyAdd represents a case that optimizes down to a single instruction and which is a core part of many already deeply complex algorithms, so the benefits of guaranteeing expansion make it good to hook up to the existing intrinsic support.

EgorBo · 2024-05-15T16:57:51Z

CC. @dotnet/jit-contrib for the JIT side.

This is one of the few remaining cases that should need JIT side work, most of the remaining APIs in #93513 will be implemented purely in managed code. The *Estimate APIs are special as they need to work in the face of R2R due to being intentionally non-deterministic across hardware and FusedMultiplyAdd represents a case that optimizes down to a single instruction and which is a core part of many already deeply complex algorithms, so the benefits of guaranteeing expansion make it good to hook up to the existing intrinsic support.

Should we rather introduce a new attribute to prevent methods from being pre-jitted (and inlined into pre-jitted code)?

EgorBo · 2024-05-15T16:59:51Z

src/coreclr/jit/gentree.cpp

+    // AdvSimd.FusedMultiplyAdd expects (addend, left, right), while the APIs take (left, right, addend)
+    // We expect op1 and op2 to have already been spilled
+
+    std::swap(op1, op3);


I am just curious - who's responsible to spill them?

Ah, nvm, I see

Typically it's the importer code, prior to calling this method. Latter phases that might call such an API (like morph) are responsible for ensuring the swap is safe in the face of potential side effects.

There's, unfortunately, not really a way for us to do such a swap safely in the API itself (at least that I know of) nor to know if the caller has actually done it. So the typical approach has been to do the swap and comment that callers should be doing the validation.

EgorBo · 2024-05-15T17:04:13Z

src/coreclr/jit/hwintrinsicarm64.cpp

+            impSpillSideEffect(true, verCurrentState.esStackDepth -
+                                         3 DEBUGARG("Spilling op1 side effects for MultiplyAddEstimate"));
+
+            impSpillSideEffect(true, verCurrentState.esStackDepth -


Isn't better/simpler to use impSpillSideEffects(true, CHECK_SPILL_ALL DEBUGARG("spilling side-effects")); ?

It's not something we've been doing in other scenarios.

AFAIR it comes down to not spilling values on the stack that aren't impacted. For example, we take 3 args, but the stack could have 4+ on it (spilling these is unnecessary since they are still processed in order with respect to our own op1/op2/op3; otherwise everyone who pops the stack would need to consider the need to spill such additional entries) and we don't need to ever spill the stack top (op3).

So we're doing this to ensure only the minimum number of items that need to be spilled are spilled.

tannergooding · 2024-05-15T17:11:11Z

Should we rather introduce a new attribute to prevent methods from being pre-jitted (and inlined into pre-jitted code)?

We technically have one already, it's BypassReadyToRun (but we don't define or use it anyways as it was a legacy attribute for Crossgen1). The PR where I fixed this for Math.ReciprocalSqrtEstimate it was suggested to do it this other way instead.

I'm also not sure the number of *Estimate APIs is at the point where it's beneficial to define and expose that attribute and there's various other quirks that may need to be considered/solved in that scenario was well, plus the general consideration that some of these APIs are known to be used in hot paths of complex algorithms where inlining can fail, even with the inlining boost that methods using intrinsics get.

-- I do want to get to a point, someday, where we can move a lot of complexity out of the JIT and into the BCL instead. Ideally just the platform specific intrinsics and core xplat APIs which were exposed instead of platform specific APIs (like Create) are handled in the JIT and things like Vector2/3/4 or Vector128.WidenLower and APIs like that are handled entirely in managed code instead. I just don't think we're at that point yet today and need more work in the inliner and other places to ensure that we can get the desired codegen.

tannergooding · 2024-05-15T17:20:29Z

CC. @stephentoub for the libraries side.

The one thing to note is that given how TensorPrimitives defines MultiplyAddEstimate and uses it in other APIs, the API was moved down to INumberBase<T> rather than being put on IFloatingPointIeee754<T>. This is still sensible for other types, it just always does the naive thing and isn't exposed as a member that's accessible from non-generic contexts.

tannergooding · 2024-05-15T17:35:48Z

-- I do want to get to a point, someday, where we can move a lot of complexity out of the JIT and into the BCL instead. Ideally just the platform specific intrinsics and core xplat APIs which were exposed instead of platform specific APIs (like Create) are handled in the JIT and things like Vector2/3/4 or Vector128.WidenLower and APIs like that are handled entirely in managed code instead. I just don't think we're at that point yet today and need more work in the inliner and other places to ensure that we can get the desired codegen.

I've logged #102275 to track prototyping this work and seeing if we can identify some of the places that are still preventing it from happening.

This seems like a good thing to try and do around the time .NET 9 RC1 snaps and we have a bit of a small break before .NET 10 work really gets going.

…vector and scalar types (dotnet#102181) * Expose FusedMultiplyAdd and MultiplyAddEstimate on the scalar and vector types * Adding tests covering FusedMultiplyAdd and MultiplyAddEstimate for the vector types * Ensure TensorPrimitives uses the xplat APIs on .NET 9+ * Apply formatting patch * Fix an accidental change to GenericVectorTests * Ensure Arm64 passes fma operands in the correct order * Apply formatting patch * Ensure all the Arm64 code paths are spilling and swapping operands * Apply formatting patch * Don't pop the stack value twice

tannergooding added 3 commits May 13, 2024 19:59

Expose FusedMultiplyAdd and MultiplyAddEstimate on the scalar and vec…

46caa78

…tor types

Adding tests covering FusedMultiplyAdd and MultiplyAddEstimate for th…

940862e

…e vector types

Ensure TensorPrimitives uses the xplat APIs on .NET 9+

d5b2ad9

dotnet-issue-labeler bot added area-System.Numerics new-api-needs-documentation labels May 14, 2024

dotnet-policy-service bot assigned tannergooding May 14, 2024

tannergooding added 2 commits May 13, 2024 22:57

Apply formatting patch

731258f

Fix an accidental change to GenericVectorTests

4010235

This was referenced May 14, 2024

[Android] No attached devices found dotnet/dnceng#2057

Open

Dead lettering tests #101524

Closed

tannergooding added 4 commits May 14, 2024 08:01

Ensure Arm64 passes fma operands in the correct order

8a60cff

Apply formatting patch

5e1e5eb

Ensure all the Arm64 code paths are spilling and swapping operands

85425c2

Apply formatting patch

0896be6

tannergooding marked this pull request as draft May 14, 2024 18:31

Don't pop the stack value twice

fdd2a83

tannergooding marked this pull request as ready for review May 15, 2024 16:47

EgorBo approved these changes May 15, 2024

View reviewed changes

EgorBo reviewed May 15, 2024

View reviewed changes

tannergooding merged commit b7727f5 into dotnet:main May 16, 2024
183 checks passed

tannergooding deleted the fix-93513 branch May 16, 2024 21:11

github-actions bot locked and limited conversation to collaborators Jun 16, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Expose the FusedMultiplyAdd and MultiplyAddEstimate APIs on relevant vector and scalar types #102181

Expose the FusedMultiplyAdd and MultiplyAddEstimate APIs on relevant vector and scalar types #102181

tannergooding commented May 14, 2024

dotnet-issue-labeler bot commented May 14, 2024

tannergooding commented May 15, 2024

EgorBo commented May 15, 2024

EgorBo May 15, 2024

EgorBo May 15, 2024

tannergooding May 15, 2024

EgorBo May 15, 2024

tannergooding May 15, 2024

tannergooding commented May 15, 2024

tannergooding commented May 15, 2024

tannergooding commented May 15, 2024

Expose the FusedMultiplyAdd and MultiplyAddEstimate APIs on relevant vector and scalar types #102181

Expose the FusedMultiplyAdd and MultiplyAddEstimate APIs on relevant vector and scalar types #102181

Conversation

tannergooding commented May 14, 2024

dotnet-issue-labeler bot commented May 14, 2024

tannergooding commented May 15, 2024

EgorBo commented May 15, 2024

EgorBo May 15, 2024

Choose a reason for hiding this comment

EgorBo May 15, 2024

Choose a reason for hiding this comment

tannergooding May 15, 2024

Choose a reason for hiding this comment

EgorBo May 15, 2024

Choose a reason for hiding this comment

tannergooding May 15, 2024

Choose a reason for hiding this comment

tannergooding commented May 15, 2024

tannergooding commented May 15, 2024

tannergooding commented May 15, 2024