-
Notifications
You must be signed in to change notification settings - Fork 4.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Move more of the xplat hwintrinsic API implementation into managed code #103150
Conversation
…nt for Vector64/128/256/512 into managed
…r64/128/256/512 into managed
Is this the direction that we want to take with primitive |
} | ||
else if (sizeof(T) == 8) | ||
{ | ||
if (Avx512F.VL.IsSupported) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suspect that this is not going to work well for NAOT. The IsSupported properties are not constants for NAOT, they are dynamic feature checks. It works ok today since these properties are used relatively rarely and protect large chunks of code. It is not going to work well when they are used very frequently.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
They're only dynamic feature checks for the same encoding levels.
That is, if you target the baseline (x86-64-v1
) then SSE3-SSE4.2
become dynamic checks while Avx+
is static false (due to requiring VEX encoding).
If you target x86-64-v2
then there should be no dynamic checks.
If you target x86-64-v3
then there should be no dynamic checks (Avx512F+
requires EVEX encoding).
If you target x86-64-v4
then you would get dynamic checks for Avx512Vbmi
You could also target individual ISAs, but that's not typically recommended. There are then also some one off ISAs not in the normal hierarchy, such as Aes
and Pclmulqdq
, but those are exceptionally rare. The same would be true of ISAs that are not part of a target baseline but which support an encoding, such as AvxVnni
.
So given the current code here, I expect that NAOT will do just fine. There may be some concern for other areas, but that's likely something we want a feature or attribute to help handle anyways given the same concern applies to user code that is writing small helper methods that do per architecture dispatch.
For S.P.Corelib in particular, it's also worth noting that the trimmer is going to remove the if (AdvSimd.IsSupported)
and similar paths when targeting xarch, so in many cases we're going to get down to just the 4 constant size checks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that we are lucky that this works given other constraints for typical configurations. It does not work well in general.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It does not work well in general.
Agreed. I think this is an area we're going to need to think about more in depth for NAOT when it comes time.
In many cases, the dynamic checks aren't worthwhile unless you have some form of code cloning or are executing more complex logic. While the pattern we tend to encourage for users is to write small helpers that isolate the functionality instead (as a way of improving readability and maintainability).
Perhaps that could be some bool IsSupportedDynamically { get; }
(or IsSupportedOpportunistically
?) that allows the developer to be explicit they would like a dynamic check instead of an exact check.
There's been requests for something along the lines of this from both the JIT and Mono teams for a while now. The general consideration is that we currently have to duplicate the xplat acceleration across at least 4 different backends (RyuJIT, MonoLLVM, MonoJIT, and MonoInterpreter). This is error prone and it's fairly trivial for support and/or optimizations to not be done consistently. The support for this tends to be quite a bit more complex and requires very special consideration of where/how the IR is created. We end up having to write significantly more code to support this in the runtimes as it requires manually creating the IR and documenting the rough equivalent C# it corresponds to. There have been multiple occasions where some subtle ordering rule is missed and nodes are accidentally reordered leading to downstream bugs. It being in the JIT also ends up confusing downstream developers as to most C# developers it is a bit of a black box where they cannot see what code is getting generated, cannot easily contribute fixes or improvements, etc. All in all, having the runtime support be centered around the platform specific intrinsics that must be handled independently and the implementation of the xplat surface area be in managed helps things from multiply different angles. The biggest downside is that it requires more inlining to happen to get things right and that's one of the reasons I had pushed back against us moving it into managed for so long. However, with where RyuJIT is today and with the recent improvements to handle CC. @EgorBo, @jakobbotsch who may have additional thoughts on this from the RyuJIT side. |
We have a problem with duplicating every other feature and optimization over different backends, this duplication is one of many. The real fix to this one (and other duplications) would be to switch over to RyuJIT for Mono targets. |
That's fair, but potentially many years out. I think it is ultimately a tradeoff between the work the JIT has to do for common xplat API usage (in terms of inlining and other optimizations) in comparison to the ease at which the code can be versioned, maintained, and is known to be correct (particularly when comparing the accelerated vs the non-accelerated path, which shows up for direct vs indirect invocation). I'm fine with keeping this in the JIT if we think that is overall better, even with the other considerations. If we decided on that, I'd also opt for reverting #102301 and maintaining the status quo for all APIs (although there's a couple managed side fixes made to the fallback path that I'd need to extract out and keep). |
#102301 looks fine to me. The method implementations that #102301 replaced the intrinsic expansions with are typically trivial forwarders and they are less essential APIs. It would be useful to delete the Mono intrinsic expansions that match #102301 to prove that this is viable path to reduce duplication between backend. |
So then would you say that removing the JIT support for Or are you saying you'd only remove the JIT support for places that are trivially deferred? Those ones don't really save much in the JIT, because they share an execution path. That is, if we have to intrinsify |
I would keep all Vector64/128/256/512 and Vector in the JIT. I see these as the most frequently used basic building blocks. I think the most frequently used basic building blocks and operations on them need to be cheap and they should not be eating various JIT budgets unnecessarily. |
Will close this and bring back a little bit of the handling from #102301 then. -- At least the I'll focus on just getting rid of the remaining JIT logic for That will allow us to still remove some logic while keeping the basic building blocks efficiently handled in the JIT. Thanks for the input! |
Anybody plans to look into that? (just curious) |
It's on the backlog of things I'd like to look at. Testing and validating Mono tends to be a bit more work so I had planned on doing the RyuJIT work first, leaving Mono intact, and then take a look at Mono independently. |
This moves the entirety of Vector and a number of core operations from Vector64/128/256/512 into managed