-
Notifications
You must be signed in to change notification settings - Fork 4.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Updating Unix to save/restore Avx512 state #83784
Conversation
7cb1fb6
to
5f41f8d
Compare
5f41f8d
to
aa6e6b2
Compare
8dd3e06
to
cf91641
Compare
This updates the pal It tries to keep the AVX512 state "pay for play", effectively only costing an additional branch on hardware without support. I opted to not "exactly" mirror the Win32 API surface. That is, I don't expose or use things like |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thank you!
Marked I've not been able to repro locally yet and so I'm trying to root cause it in CI at the moment. |
b50270e
to
7ad3f68
Compare
6b52fac
to
9c841a0
Compare
@janvorli, this could use one more small review pass if you could. I've consolidated the changes since your last review into one additional commit: 9c841a0 The summary is that most of the changes are small fixes and probably don't need "extra scrutiny". However, the logic required to support OSX in particular had to change quite a bit. In particular the changes are:
I attempted to ensure the relevant areas have explanatory comments/links and that the overall logic is shared where feasible. |
f13f332
to
ac62da6
Compare
// | ||
// See https://github.com/apple/darwin-xnu/blob/main/osfmk/i386/fpu.c#L174 | ||
|
||
// TODO-AVX512: Enabling this for OSX requires ensuring threads explicitly trigger |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To me, it sounds like that it is not worth the troubles to enable AVX512 for OSX x64.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's certainly possible. I think we do need some comment tracking/explaining the general reason why its not supported however so the only other change I could see is removing the commented out code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated to remove the commented out code. I still left the general TODO, but its likely not something we'll actively focus on supporting given the general move of OSX to Arm64.
OSX may have similar requirements for SVE, however, so if the work eventually happens, it should still be possible to enable in the future.
Finally all green, 🎉 OSX was a lot more problematic than expected 😅 |
@@ -682,8 +755,34 @@ void CONTEXTToNativeContext(CONST CONTEXT *lpContext, native_context_t *native) | |||
#if defined(HOST_AMD64) && defined(XSTATE_SUPPORTED) | |||
if ((lpContext->ContextFlags & CONTEXT_XSTATE) == CONTEXT_XSTATE) | |||
{ | |||
_ASSERTE(FPREG_HasYmmRegisters(native)); | |||
memcpy_s(FPREG_Xstate_Ymmh(native), sizeof(M128A) * 16, lpContext->VectorRegister, sizeof(M128A) * 16); | |||
if (FPREG_HasYmmRegisters(native)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What would be the case when XSTATE is enabled and YMM registers were not present?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From the current implementation perspective, it shouldn't happen.
From the hardware and correctness perspective, xstate
carries a lot of information and the YMM
area is considered part of the optional extended state. It's safer and overall better to be consistent in our checks, IMO (particularly when the check is cheap compared to the overall thread suspend/restore logic).
I think if we want this to be performant, then we'd be better off focusing on transitioning the various logic to use the "native" suspend/restore mechanism for a given platform directly.
Logged #83983 to track the additional MacOS save/restore support. It's possible the reason this was still failing is due to a bad interaction with our own hardware exception handler logic so we'll do some minimal additional investigation to see if that is the case and if AVX-512 can be enabled for MacOS. @janvorli, anything else needed here or is this good to be merged now? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thank you!
Thanks! |
This resolves #81846