Updating Unix to save/restore Avx512 state #83784

Merged: 17 commits, Mar 27, 2023

Member

@tannergooding tannergooding commented Mar 22, 2023

This resolves #81846

@tannergooding
Member Author

CC. @jkotas, @janvorli

This updates the PAL CONTEXT struct to track the necessary additional fields and to populate/restore them as appropriate.

It tries to keep the AVX512 state "pay for play", effectively only costing an additional branch on hardware without support.

I opted not to mirror the Win32 API surface exactly. That is, I don't expose or use things like LocateXStateFeature on Unix. Instead, I appended these fields directly to the end of CONTEXT (rather than having them exist "implicitly") and consume the relevant fields directly where required.
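As a rough illustration of the "pay for play" layout described above, the following C++ sketch appends the extended state to the end of a context structure and guards the AVX-512 copy behind a single feature check. `SketchContext`, `M128`, `xstate_features`, and `copy_avx512_state` are hypothetical names invented for this example and are not the PAL's actual definitions. The feature bits follow the XCR0 component numbering (bit 5 opmask, bit 6 ZMM_Hi256, bit 7 Hi16_ZMM).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for a 128-bit register slot; not the real M128A.
struct alignas(16) M128 {
    std::uint64_t lo;
    std::uint64_t hi;
};

// XCR0 state-component bits for the AVX-512 pieces.
constexpr std::uint64_t kFeatureAvx512 =
    (1ull << 5) | (1ull << 6) | (1ull << 7);

// Hypothetical context: the extended state is appended at the end so that
// the offsets of the earlier fields stay unchanged.
struct alignas(16) SketchContext {
    std::uint32_t context_flags;
    std::uint64_t xstate_features;  // which extended components are valid
    M128 xmm[16];                   // legacy SSE registers
    M128 ymm_high[16];              // upper 128 bits of YMM0-15
    std::uint64_t kmask[8];         // opmask registers k0-k7
    M128 zmm_high[16][2];           // upper 256 bits of ZMM0-15
    M128 zmm16_31[16][4];           // full ZMM16-31
};

// The struct keeps 16-byte alignment even though it holds 512-bit state.
static_assert(alignof(SketchContext) == 16, "context stays 16-byte aligned");
static_assert(offsetof(SketchContext, zmm_high) % 16 == 0, "aligned field");
static_assert(offsetof(SketchContext, zmm16_31) % 16 == 0, "aligned field");

// Copies the AVX-512 state only when the source marks it as present.
// Without AVX-512 the cost is the single branch below.
inline bool copy_avx512_state(SketchContext& dst, const SketchContext& src) {
    if ((src.xstate_features & kFeatureAvx512) != kFeatureAvx512) {
        return false;
    }
    std::memcpy(dst.kmask, src.kmask, sizeof(src.kmask));
    std::memcpy(dst.zmm_high, src.zmm_high, sizeof(src.zmm_high));
    std::memcpy(dst.zmm16_31, src.zmm16_31, sizeof(src.zmm16_31));
    dst.xstate_features |= kFeatureAvx512;
    return true;
}
```

Storing the wider registers as arrays of 16-byte slots is one way to hold 256-bit and 512-bit state without raising the alignment of the whole structure.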

src/coreclr/pal/src/include/pal/context.h (outdated, resolved)
src/coreclr/pal/src/include/pal/context.h (outdated, resolved)
Member

@janvorli janvorli left a comment

LGTM, thank you!

@tannergooding
Member Author

Marked NO-MERGE while I try to figure out the OSX "invalid instruction" failure.

I've not been able to repro locally yet and so I'm trying to root cause it in CI at the moment.

@tannergooding
Member Author

tannergooding commented Mar 24, 2023

@janvorli, this could use one more small review pass if you could.

I've consolidated the changes since your last review into one additional commit: 9c841a0

The summary is that most of the changes are small fixes and probably don't need "extra scrutiny". However, the logic required to support OSX in particular had to change quite a bit.

In particular the changes are:

  • The built-in __cpuid on Unix does not explicitly clear ECX, whereas the MSVC intrinsic does. A couple of call sites had to be updated to account for this and to use __cpuidex instead so the behavior is consistent on Unix.
  • The alignment of the Zmm0H and Zmm16 fields needed to be changed, as there are several spots relying on CONTEXT to have 16-byte alignment. Refactoring this to support 32-byte or 64-byte alignment was going to be non-trivial.
  • macOS enables AVX-512 differently from everyone else, so we had to change the context restore logic as well as the general querying logic to account for this. It is covered in more detail under https://github.com/apple/darwin-xnu/blob/main/osfmk/i386/fpu.c#L174
    • Ended up commenting this bit out and changing OSX to return constant false. There is still some edge case where AVX512 enablement isn't working correctly, and there is a general consideration that managed threads likely need to explicitly opt in to ensure user code doesn't fault on first use.
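The ECX point in the first bullet can be sketched as a small wrapper that always passes an explicit subleaf, so the result does not depend on whatever happened to be in ECX. This is only an illustration: `cpuid_ex` and `cpu_reports_avx512f` are hypothetical names, the GCC/Clang path uses the `__cpuid_count` macro from `<cpuid.h>`, and the non-x86 fallback simply returns zeros. It reports what the CPU advertises (CPUID leaf 7, subleaf 0, EBX bit 16 for AVX512F) and does not check whether the OS has enabled the state.

```cpp
#include <cassert>

#if defined(_MSC_VER)
#include <intrin.h>
#elif defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>
#endif

// Hypothetical wrapper: queries CPUID with an explicit subleaf on every
// compiler, so ECX is never left holding a stale value.
inline void cpuid_ex(int regs[4], int leaf, int subleaf) {
    assert(regs != nullptr);
#if defined(_MSC_VER) && (defined(_M_X64) || defined(_M_IX86))
    __cpuidex(regs, leaf, subleaf);
#elif defined(__x86_64__) || defined(__i386__)
    unsigned int a = 0, b = 0, c = 0, d = 0;
    __cpuid_count(leaf, subleaf, a, b, c, d);
    regs[0] = static_cast<int>(a);
    regs[1] = static_cast<int>(b);
    regs[2] = static_cast<int>(c);
    regs[3] = static_cast<int>(d);
#else
    // Not an x86 target: report nothing.
    (void)leaf;
    (void)subleaf;
    regs[0] = regs[1] = regs[2] = regs[3] = 0;
#endif
}

// True when the CPU advertises AVX512F. OS enablement is a separate check.
inline bool cpu_reports_avx512f() {
    int regs[4];
    cpuid_ex(regs, 0, 0);
    if (regs[0] < 7) {
        return false;  // leaf 7 is not available
    }
    cpuid_ex(regs, 7, 0);
    return (regs[1] & (1 << 16)) != 0;
}
```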

I attempted to ensure the relevant areas have explanatory comments/links and that the overall logic is shared where feasible.

@tannergooding tannergooding removed the NO-MERGE The PR is not ready for merge yet (see discussion for detailed reasons) label Mar 24, 2023
//
// See https://github.com/apple/darwin-xnu/blob/main/osfmk/i386/fpu.c#L174

// TODO-AVX512: Enabling this for OSX requires ensuring threads explicitly trigger
Member

To me, it sounds like it is not worth the trouble to enable AVX512 for OSX x64.

Member Author

That's certainly possible. However, I think we do need some comment tracking/explaining the general reason why it's not supported, so the only other change I could see is removing the commented-out code.

Member Author

@tannergooding tannergooding Mar 25, 2023

Updated to remove the commented-out code. I still left the general TODO, but it's likely not something we'll actively focus on supporting given the general move of OSX to Arm64.

OSX may have similar requirements for SVE, however, so if the work eventually happens, it should still be possible to enable in the future.

@tannergooding
Member Author

Finally all green, 🎉 OSX was a lot more problematic than expected 😅

@BruceForstall BruceForstall added the avx512 Related to the AVX-512 architecture label Mar 27, 2023
@@ -682,8 +755,34 @@ void CONTEXTToNativeContext(CONST CONTEXT *lpContext, native_context_t *native)
 #if defined(HOST_AMD64) && defined(XSTATE_SUPPORTED)
     if ((lpContext->ContextFlags & CONTEXT_XSTATE) == CONTEXT_XSTATE)
     {
-        _ASSERTE(FPREG_HasYmmRegisters(native));
-        memcpy_s(FPREG_Xstate_Ymmh(native), sizeof(M128A) * 16, lpContext->VectorRegister, sizeof(M128A) * 16);
+        if (FPREG_HasYmmRegisters(native))
Member

What would be the case when XSTATE is enabled and YMM registers were not present?

Member Author

From the current implementation perspective, it shouldn't happen.

From the hardware and correctness perspective, xstate carries a lot of information and the YMM area is considered part of the optional extended state. It's safer and overall better to be consistent in our checks, IMO (particularly when the check is cheap compared to the overall thread suspend/restore logic).

I think if we want this to be performant, then we'd be better off focusing on transitioning the various logic to use the "native" suspend/restore mechanism for a given platform directly.
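To make the "check rather than assert" point concrete, here is a minimal sketch that verifies the YMM component is present in a native save area before writing to it. It assumes the standard (non-compacted) XSAVE layout, where the XSTATE_BV bitmap sits at byte 512 and the AVX component occupies 256 bytes at offset 576. `has_ymm_registers` and `restore_ymm_high` are hypothetical helpers and are not how the PAL's FPREG_HasYmmRegisters is implemented.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Standard-format XSAVE layout constants.
constexpr std::size_t kXsaveHeaderOffset = 512;  // XSTATE_BV lives here
constexpr std::size_t kYmmHighOffset = 576;      // AVX component offset
constexpr std::size_t kYmmHighSize = 256;        // 16 registers x 16 bytes
constexpr std::uint64_t kXstateAvx = 1ull << 2;  // AVX state-component bit

// Hypothetical check: the buffer is large enough and marks AVX as present.
inline bool has_ymm_registers(const unsigned char* area, std::size_t size) {
    if (area == nullptr || size < kYmmHighOffset + kYmmHighSize) {
        return false;
    }
    std::uint64_t xstate_bv = 0;
    std::memcpy(&xstate_bv, area + kXsaveHeaderOffset, sizeof(xstate_bv));
    return (xstate_bv & kXstateAvx) != 0;
}

// Writes the upper YMM halves only when the component is present, instead
// of asserting that it must be. Returns whether anything was written.
inline bool restore_ymm_high(unsigned char* area, std::size_t size,
                             const unsigned char* ymm_high) {
    if (!has_ymm_registers(area, size)) {
        return false;  // cheap branch; the native state is left untouched
    }
    std::memcpy(area + kYmmHighOffset, ymm_high, kYmmHighSize);
    return true;
}
```

The extra branch is small next to the cost of suspending and resuming a thread, which is the trade-off argued for above.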

@tannergooding
Member Author

Logged #83983 to track the additional MacOS save/restore support.

It's possible this was still failing because of a bad interaction with our own hardware exception handler logic, so we'll do some minimal additional investigation to see if that is the case and if AVX-512 can be enabled for MacOS.

@janvorli, anything else needed here or is this good to be merged now?

Member

@janvorli janvorli left a comment

LGTM, thank you!

@tannergooding
Member Author

Thanks!

@tannergooding tannergooding merged commit d612a50 into dotnet:main Mar 27, 2023
@tannergooding tannergooding deleted the evex-unix branch March 27, 2023 17:42
@ghost ghost locked as resolved and limited conversation to collaborators Apr 26, 2023
Labels
area-PAL-coreclr avx512 Related to the AVX-512 architecture
Development

Successfully merging this pull request may close these issues.

Update Linux/Mac Context State for AVX512
6 participants