Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Investigate what would be required to move some SIMD functionality out of the JIT and into managed code #102275

Closed
tannergooding opened this issue May 15, 2024 · 9 comments
Assignees
Milestone

Comments

@tannergooding
Copy link
Member

Historically a number of types (such as Vector2, Vector3, and Vector4) have had their functionality implemented as Intrinsic in the JIT. This was done for many reasons, but primarily for performance of these types that are known to be used in perf critical workloads since the JIT needed to generate specialized instructions that were otherwise not accessible.

Since then, we've had a broad range of support exposed around platform specific hardware intrinsic APIs and it is now possible to implement many of these other APIs purely in managed code. However, this has not been done due to concerns that it may regress perf. This concern is particularly relevant in complex algorithms where the inliner may decide that it is out of budget and no longer tries to optimize this code.

It would be beneficial to actually prototype the work of moving some of these implementations out of the JIT and purely into managed code. This should let us see what in the JIT may still be getting pessimized and work towards fixing it. The long term benefits of moving the complexity into managed are then substantial:

  • The JIT becomes simpler/smaller
  • Multiple runtimes no longer need to do the work to optimize these additional APIs, they only need to support the xplat APIs
  • Users can more readily debug the code being executed and contribute improvements themselves
    • Such fixes/improvements are also much lower risk
  • Pain points that end users may encounter become more readily identifiable to the team, allowing fixes to be provided
  • etc
Copy link
Contributor

Tagging subscribers to this area: @dotnet/area-system-numerics
See info in area-owners.md if you want to be subscribed.

@tannergooding
Copy link
Member Author

In an ideal world, we would be able to remove the entirety of simdashwintrinsic from the JIT and much of the handling around the various xplat APIs in hwintrinsiclist* (essentially any NamedIntrinsic which is invalid to represent as a node ID).

@tannergooding
Copy link
Member Author

CC. @EgorBo

@EgorBo
Copy link
Member

EgorBo commented May 15, 2024

I fully support this 🙂

@tannergooding tannergooding self-assigned this May 15, 2024
@tannergooding
Copy link
Member Author

I've assigned myself for the time being, but likely won't get to this until after we snap for .NET 9 RC1 and main opens for .NET 10

If someone else has some time to work on prototyping this sooner, please reach out and we can see about getting it assigned out instead.

@tannergooding
Copy link
Member Author

The prototype overall looked good, however I'm going to block the work on #11062 being resolved as the risk is too high otherwise. -- Not being intrinsic means inlining has to happen which means various constants can't be seen as constant until post import when various other optimizations (like forward sub, constant prop, etc) have happened. This really hurts a number of common existing code patterns and will regress user workloads.

There's some other aspects that likely should be investigated more as well, such as the overhead of the JIT needing to do more work (more inlining, more optimizations) to make things generate the same code and therefore likely reducing the throughput and increasing overhead of the JIT. But, I think those aspects do not need to be blocked on and can instead be handled incrementally.

@tannergooding
Copy link
Member Author

tannergooding commented May 17, 2024

As described in #11062 (comment), we have support for rewriting some nodes (GT_INTRINSIC) back into a call as part of rationalization (the final step before lowering). This works fine when all inputs and return types are primitive.

However, the support does not currently exist to do the more complex fixups and ABI handling that would be required to convert a GT_HWINTRINSIC (which may have TYP_SIMD, TYP_MASK, or even TYP_STRUCT inputs/outputs) back into a call, nor do we currently have a node capable of tracking all the required information (like the method handle) to make that transform possible.

@tannergooding
Copy link
Member Author

I took another look at allowing GT_HWINTRINSIC to carry the CORINFO_METHOD_HANDLE down to rationalization and tried to update things to handle it all.

The big things to note are that such signatures require us to re-resolve the CORINFO_SIG_INFO so we can get the CORINFO_CLASS_HANDLE for each argument so we can correctly NewCallArg::Struct(...) for TYP_SIMD operands and fgMorphArgs appears to do basically all the right fixups and such for input parameters

However, what's not handled is the ABI handling for TYP_SIMD returns which currently looks to be mostly isolated to impFixupCallStructReturn (which changes it from TYP_STRUCT to TYP_SIMD and such) but there's then more handling which seems to be scattered about import as well that does the fixup to insert WellKnownArg::RetBuffer, change the return type to actually be TYP_VOID, allocate a local to hold the return buffer, and replaces the user of the call with the the return buffer so that the value can actually be consumed

Refactoring the last stuff to not be import centric and to allow it to be shared with rationalization was a bit out of my depth; I tried hacking some bits together and got it mostly working, but its not something that would ever pass code review 😄

Also worth noting that for input args it wasn't entirely clear whether fgMorphArgs was doing the right thing with regards to shadow copying values, it looked like it might not have been and that was instead handled separately in import, so that may be another place that needs logic to be centralized so that rationalization (or even morph) could reuse it for these intrinsic calls that involve non primitive types

@tannergooding
Copy link
Member Author

tannergooding commented Aug 8, 2024

Closing this.

The investigation was done and it was decided that while non-core types like Vector2/3/4, Plane, Quaternion, and Matrix4x4 could be implemented in managed. We want to keep ABI primitive types like Vector<T> and Vector64/128/256/512<T> accelerated directly in the JIT to ensure everything gets the necessary perf and can be relied upon to build the non-core types.

.NET 9 has correspondingly moved the implementation of types like Vector2/3/4, etc entirely into managed.

@github-actions github-actions bot locked and limited conversation to collaborators Sep 8, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

2 participants