Move more of the xplat hwintrinsic API implementation into managed code #103150

Closed
wants to merge 12 commits

Conversation

tannergooding (Member)

This moves the entirety of Vector<T> and a number of core operations from Vector64/128/256/512 into managed code.

jkotas (Member) commented Jun 7, 2024

Is this the direction that we want to take with primitive Vector* operations? It does not feel right looking at the amount of IL that these operations expand into with this change.

}
else if (sizeof(T) == 8)
{
    if (Avx512F.VL.IsSupported)
Member

I suspect that this is not going to work well for NAOT. The IsSupported properties are not constants for NAOT, they are dynamic feature checks. It works ok today since these properties are used relatively rarely and protect large chunks of code. It is not going to work well when they are used very frequently.

Member Author

They're only dynamic feature checks for ISAs at the same encoding level as the target. That is:

- If you target the baseline (x86-64-v1), then SSE3-SSE4.2 become dynamic checks, while Avx+ is statically false (due to requiring VEX encoding).
- If you target x86-64-v2, there should be no dynamic checks.
- If you target x86-64-v3, there should be no dynamic checks (Avx512F+ requires EVEX encoding).
- If you target x86-64-v4, you would get dynamic checks for Avx512Vbmi.

You could also target individual ISAs, but that's not typically recommended. There are then also some one-off ISAs not in the normal hierarchy, such as Aes and Pclmulqdq, but those are exceptionally rare. The same would be true of ISAs that are not part of a target baseline but which fit within a supported encoding, such as AvxVnni.

So given the current code here, I expect that NAOT will do just fine. There may be some concern for other areas, but that's likely something we want a feature or attribute to help handle anyway, given that the same concern applies to user code that writes small helper methods doing per-architecture dispatch.

For S.P.Corelib in particular, it's also worth noting that the trimmer is going to remove the if (AdvSimd.IsSupported) and similar paths when targeting xarch, so in many cases we're going to get down to just the 4 constant size checks.
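
To make that concrete, here is a minimal sketch (a hypothetical helper, not the actual corelib code) of the dispatch pattern under discussion and how its branches fold depending on the target:

```csharp
using System.Runtime.Intrinsics;
using System.Runtime.Intrinsics.Arm;
using System.Runtime.Intrinsics.X86;

internal static class VectorHelpers
{
    // Hypothetical helper, not the corelib implementation. Under NAOT:
    //  * targeting x86-64-v1: Sse2.IsSupported folds to true, Sse41.IsSupported
    //    becomes a dynamic check, and the AdvSimd branch is trimmed away.
    //  * targeting x86-64-v2 (or higher): Sse41.IsSupported folds to true and
    //    the whole method collapses to a single instruction.
    //  * targeting arm64: the x86 branches are trimmed and AdvSimd folds to true.
    public static Vector128<int> Min(Vector128<int> left, Vector128<int> right)
    {
        if (Sse41.IsSupported)
        {
            return Sse41.Min(left, right);
        }
        else if (Sse2.IsSupported)
        {
            // No pminsd before SSE4.1: select per element via compare + and/andnot/or.
            Vector128<int> mask = Sse2.CompareGreaterThan(right, left);
            return Sse2.Or(Sse2.And(mask, left), Sse2.AndNot(mask, right));
        }
        else if (AdvSimd.IsSupported)
        {
            return AdvSimd.Min(left, right);
        }

        return Vector128.Min(left, right); // software fallback
    }
}
```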

Member

I agree that we are lucky that this works given other constraints for typical configurations. It does not work well in general.

Member Author

It does not work well in general.

Agreed. I think this is an area we're going to need to think about more in depth for NAOT when it comes time.

In many cases, the dynamic checks aren't worthwhile unless you have some form of code cloning or are executing more complex logic. Meanwhile, the pattern we tend to encourage for users is to write small helpers that isolate the functionality instead, as a way of improving readability and maintainability.

Perhaps that could be some bool IsSupportedDynamically { get; } (or IsSupportedOpportunistically?) that allows the developer to be explicit that they would like a dynamic check instead of an exact check.
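
As a rough illustration of that idea (the member names and shape below are purely hypothetical, not a proposed API):

```csharp
// Purely hypothetical shape for illustration; no such members exist today.
public abstract class SomeIsa
{
    // Today's behavior: treated as a compile-time constant under NAOT,
    // based on the compilation target baseline.
    public static bool IsSupported => false;

    // The idea above: always emitted as a runtime feature check, even when the
    // ISA is above the target baseline, so the developer opts into the dynamic
    // check (and its cost) explicitly.
    public static bool IsSupportedDynamically => false;
}
```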

tannergooding (Member Author)

Is this the direction that we want to take with primitive Vector* operations? It does not feel right looking at the amount of IL that these operations expand into with this change.

There have been requests for something along these lines from both the JIT and Mono teams for a while now.

The general consideration is that we currently have to duplicate the xplat acceleration across at least 4 different backends (RyuJIT, MonoLLVM, MonoJIT, and MonoInterpreter). This is error prone and it's fairly trivial for support and/or optimizations to not be done consistently.

Supporting this in the JIT tends to be quite a bit more complex and requires very special consideration of where/how the IR is created. We end up having to write significantly more code in the runtimes, since it requires manually creating the IR and documenting the rough C# equivalent it corresponds to. There have been multiple occasions where some subtle ordering rule was missed and nodes were accidentally reordered, leading to downstream bugs.

Having it in the JIT also ends up confusing downstream developers, since to most C# developers it is a bit of a black box: they cannot see what code is getting generated, cannot easily contribute fixes or improvements, etc.

All in all, having the runtime support be centered around the platform-specific intrinsics that must be handled independently, and the implementation of the xplat surface area be in managed code, helps things from multiple different angles. The biggest downside is that it requires more inlining to happen to get things right, and that's one of the reasons I had pushed back against us moving it into managed for so long. However, with where RyuJIT is today and with the recent improvements to handle ConstantExpected scenarios across inlining boundaries, we're at a point where it is feasible, hence why I've put up the first couple of PRs to experiment in this space.
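
As an example of the ConstantExpected point, here is a rough sketch (simplified, not the actual corelib code) of a managed wrapper over an imm8-taking instruction:

```csharp
using System.Diagnostics.CodeAnalysis;
using System.Runtime.CompilerServices;
using System.Runtime.Intrinsics;
using System.Runtime.Intrinsics.X86;

internal static class ShuffleHelpers
{
    // Simplified sketch, not the corelib implementation. [ConstantExpected]
    // lets the constant-ness of 'control' flow across the inlining boundary,
    // so the Sse path can still encode it as an imm8.
    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    public static Vector128<float> Shuffle(Vector128<float> value, [ConstantExpected] byte control)
    {
        if (Sse.IsSupported)
        {
            return Sse.Shuffle(value, value, control);
        }

        // Software fallback: expand the imm8 control into per-element indices.
        Vector128<int> indices = Vector128.Create(
            control & 3, (control >> 2) & 3, (control >> 4) & 3, (control >> 6) & 3);
        return Vector128.Shuffle(value, indices);
    }
}
```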

CC @EgorBo, @jakobbotsch, who may have additional thoughts on this from the RyuJIT side.

jkotas (Member) commented Jun 7, 2024

The general consideration is that we currently have to duplicate the xplat acceleration across at least 4 different backends (RyuJIT, MonoLLVM, MonoJIT, and MonoInterpreter). This is error prone and it's fairly trivial for support and/or optimizations to not be done consistently.

We have a problem with duplicating every other feature and optimization over different backends; this duplication is one of many. The real fix to this one (and other duplications) would be to switch over to RyuJIT for Mono targets.

tannergooding (Member Author)

The real fix to this one (and other duplications) would be to switch over to RyuJIT for Mono targets.

That's fair, but potentially many years out.

I think it is ultimately a tradeoff between the work the JIT has to do for common xplat API usage (in terms of inlining and other optimizations) and the ease with which the code can be versioned, maintained, and known to be correct (particularly when comparing the accelerated vs. the non-accelerated path, which shows up for direct vs. indirect invocation).

I'm fine with keeping this in the JIT if we think that is overall better, even with the other considerations. If we decided on that, I'd also opt for reverting #102301 and maintaining the status quo for all APIs (although there are a couple of managed-side fixes made to the fallback path that I'd need to extract out and keep).

jkotas (Member) commented Jun 7, 2024

I'd also opt for reverting #102301 and maintaining the status quo for all APIs

#102301 looks fine to me. The method implementations that #102301 replaced the intrinsic expansions with are typically trivial forwarders and they are less essential APIs. It would be useful to delete the Mono intrinsic expansions that match #102301 to prove that this is a viable path to reduce duplication between backends.

tannergooding (Member Author)

#102301 looks fine to me. The method implementations that #102301 replaced the intrinsic expansions with are typically trivial forwarders and they are less essential APIs. It would be useful to delete the Mono intrinsic expansions that match #102301 to prove that this is a viable path to reduce duplication between backends.

So then would you say that removing the JIT support for Vector2/3/4, Quaternion, and Plane is fine, but you'd keep the JIT implementation for Vector64/128/256/512<T>? What about Vector<T>?

Or are you saying you'd only remove the JIT support for places that are trivially forwarded? Those ones don't really save much in the JIT, because they share an execution path. That is, if we have to intrinsify op_Addition in the JIT, then intrinsifying Add makes sense as well because it's the same code path, so we avoid the inlining and improve throughput by just making the same switch case handle both.
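
For reference, the "trivial forwarder" shape being discussed looks roughly like this (a simplified sketch, not the exact corelib source):

```csharp
using System.Runtime.Intrinsics;

internal static class Vector128Forwarders
{
    // Simplified sketch of the forwarder shape being discussed; the real
    // methods carry additional attributes and documentation.
    public static Vector128<T> Add<T>(Vector128<T> left, Vector128<T> right)
        where T : struct
    {
        // Forwards to op_Addition, so both names surface the same intrinsic
        // code path in the JIT (or the forwarder simply inlines away).
        return left + right;
    }
}
```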

jkotas (Member) commented Jun 7, 2024

you'd keep the JIT implementation for Vector64/128/256/512<T>? What about Vector<T>?

I would keep all Vector64/128/256/512<T> and Vector<T> in the JIT. I see these as the most frequently used basic building blocks, and operations on them need to be cheap; they should not be eating various JIT budgets unnecessarily.

tannergooding (Member Author) commented Jun 7, 2024

I would keep all Vector64/128/256/512<T> and Vector<T> in the JIT.

Will close this and bring back a little bit of the handling from #102301 then; at least the As and get_Zero methods, since those are very common and no-ops. I could also bring back the named alternatives like BitwiseAnd, but those are less common than the operators and so less problematic, since they aren't as much "building blocks".

I'll focus on just getting rid of the remaining JIT logic for Vector2/3 instead, since those are not core helpers and can just be AsVector128(), op, AsVector2/3(), like Vector4/Quaternion/Plane were simplified down to.
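
For example (a sketch of that pattern; the actual changes may differ in detail):

```csharp
using System.Numerics;
using System.Runtime.Intrinsics;

internal static class Vector3Helpers
{
    // Sketch of the "AsVector128(), op, AsVector3()" pattern described above:
    // the Vector3 operation is written in terms of the Vector128 building
    // blocks instead of having its own dedicated JIT expansion.
    public static Vector3 Add(Vector3 left, Vector3 right)
    {
        // AsVector128 widens to Vector128<float> with the upper element zeroed,
        // which is harmless for an element-wise add.
        return (left.AsVector128() + right.AsVector128()).AsVector3();
    }
}
```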

That will allow us to still remove some logic while keeping the basic building blocks efficiently handled in the JIT.

Thanks for the input!

jkotas (Member) commented Jun 7, 2024

delete the Mono intrinsic expansions that match #102301

Is anybody planning to look into that? (just curious)

tannergooding (Member Author)

Is anybody planning to look into that? (just curious)

It's on the backlog of things I'd like to look at. Testing and validating Mono tends to be a bit more work, so I had planned on doing the RyuJIT work first, leaving Mono intact, and then taking a look at Mono independently.

github-actions bot locked and limited conversation to collaborators on Jul 8, 2024