-
Notifications
You must be signed in to change notification settings - Fork 4.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ARM64-SVE: Add SVE registers to pal context #103801
Conversation
src/coreclr/pal/inc/pal.h
Outdated
// | ||
// Sve Registers | ||
// | ||
//TODO-SVE: How does this structure handle variable sized Z/P/FFR registers? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
AIUI, this should match the same structure in Windows. I don't have the any documentation, so I've made a guess at what the fields should be for SVE, and I expect that it's wrong
For convenience I've only used a vector length 128bits. I'd be surprised if windows supports a full 2048bit vector length without doing anything special.
(Offsets below marked with a ?
I'll fix once the structure is correct)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Windows in general don't store extended context parts in the CONTEXT data structure itself. There is a flag CONTEXT_XSTATE that indicates presence of extra data attached to the CONTEXT. There are APIs InitializeContext and InitializeContext2 that allows setting up a context for the extended state. It can be also used to get the size of memory needed for the extended context. The InitializeContext2 is a new one that allows to select only a subset of the extended state using the XStateCompactionMask argument.
We have done this differently for AVX512 for the sake of simplicity - we have included the extra registers in the CONTEXT structure itself. I think it would be better to move that to the way Windows handle that so that we don't waste time initializing and copying extra fields at places where we don't care about the extended state or when the current CPU doesn't support them. That would also allow to size the storage for the Z/P registers dynamically based on the current CPU.
Having said that though, for this PR, we can follow the suite and do the same thing we did for intel avx512 and migrate both to the better model later. Based on what @kunalspathak told me, starting with 128 bits of space for the registers should be sufficient for now.
I would add them to the very end of the CONTEXT after the debug registers so that the layout of the part that's common with Windows is the same.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, for the OS, extended context like SVE and AVX etc are stored in a variable-sized buffer separate from the CONTEXT. The CONTEXT_EX structure immediately follows the CONTEXT structure, and contains pointers to the variable-sized XSTATE buffer. On x64, the XSTATE buffer is in the exact format that is supported by the hardware via the XSAVE and XRSTOR instructions. On ARM64, there are no XSAVE/XRSTOR instructions, but the XSTATE buffer is laid out in a similar fashion to x64 (including Header->Mask, Header->CompationMask etc), to allow for max code sharing with x64.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The OS kernel does support all SVE vector lengths, up to 2048-bit SVE, though there is the caveat that HyperV only supports 128-bit SVE. So, when running on hardware that supports SVE larger than 128-bits, if HyperV is enabled you'll only see 128-bit SVE, but if HyperV is off then you'll be able to take advantage of the full SVE width supported by the CPU. And to my understanding, there is likely hardware in the future that supports larger SVE lengths than 128-bit, though I don't know any specific on timelines.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the comments. Updated with the following:
- Added context2.S changes
- Removed store/restore of Z registers (as we only support 128bits for now, which fully overlap the V registers)
- Added a XStateFeaturesMask to the Arm64 context so that we can tell whether to use SVE or not.
I'm currently unsure where else SVE state might need saving/restoring
Things I'm unsure about:
|
Build failures on Windows, but I expected that as that all still needs doing. Marking as ready as I could do with comments, especially on the Windows side. @dotnet/arm64-contrib @kunalspathak @tannergooding |
Change-Id: I2f6bc39068d9fed3f45b548089b144884607d97b
@@ -718,6 +720,41 @@ void CONTEXTToNativeContext(CONST CONTEXT *lpContext, native_context_t *native) | |||
*(NEON128*) &fp->vregs[i] = lpContext->V[i]; | |||
} | |||
} | |||
|
|||
if (sve) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similar to x64, we should copy the state only if the contextFlags has the CONTEXT_XSTATE flag set. The passed in contextFlags list parts of the state that are valid that the caller is interested in.
It seems it would make sense to move this to the end of the function next to where we extract xstate for amd64 and put it under the same if ((contextFlags & CONTEXT_XSTATE) == CONTEXT_XSTATE).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similar to x64, we should copy the state only if the contextFlags has the CONTEXT_XSTATE flag set.
There is one remaining test failure I've just debugged to being due to this. Will fix it up.
@@ -104,6 +105,32 @@ LOCAL_LABEL(Done_CONTEXT_INTEGER): | |||
sub x0, x0, CONTEXT_FLOAT_CONTROL_OFFSET + CONTEXT_NEON_OFFSET | |||
|
|||
LOCAL_LABEL(Done_CONTEXT_FLOATING_POINT): | |||
ldr x1, [x0, CONTEXT_XSTATEFEATURESMASK_OFFSET] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It should check the CONTEXT_XSTATE in CONTEXT_ContextFlags first and check the features mask only if the CONTEXT_XSTATE is set.
@@ -154,6 +184,31 @@ LOCAL_LABEL(Restore_CONTEXT_FLOATING_POINT): | |||
// since we potentially clobber x0 below, we'll bank it in x16 | |||
mov x16, x0 | |||
|
|||
ldr w17, [x16, CONTEXT_XSTATEFEATURESMASK_OFFSET] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It should check the CONTEXT_XSTATE in CONTEXT_ContextFlags first and check the features mask only if the CONTEXT_XSTATE is set.
@@ -60,6 +60,90 @@ using asm_sigcontext::_xstate; | |||
bool Xstate_IsAvx512Supported(); | |||
#endif // XSTATE_SUPPORTED || (HOST_AMD64 && HAVE_MACH_EXCEPTIONS) | |||
|
|||
#if defined(HOST_64BIT) && defined(HOST_ARM64) && !defined(TARGET_FREEBSD) && !defined(TARGET_OSX) | |||
#if !defined(SVE_MAGIC) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this define not present when building in our CI?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this define not present when building in our CI?
Yes, they are missing in the CI. If I remove any of them the build falls over. I think this is when cross compiling. Eithe way, it must be an old Linux being used because these defines have been present in Linux since about 2017.
Besides the few comments, it looks good. |
Fixed up so that XSTATE is set and checked as suggested. This PR enables |
Running all priority 1 tests in checked on SVE Linux....
That single failure I get on latest head, so I'm not worried about it. |
I see a lot of TODOs about SVE size being hardcoded to 128 bit. What is going to be the experience when somebody runs .NET 9 binary on a machine with 256 bit SVE? It is important that it just works, without crashing, buffer overruns, etc. |
Running the entire testsuite on 256bit, all the tests pass with and without my latest fix. That's because there is no SVE state in the kernel, so that structure that comes back from the OS has no SVE state (sve.size is 16, it's just the header with no data). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @a74nh for your contribution. LGTM.
/ba-g failure is #103550 |
It got broken by dotnet#103801 due to a host vs. target arch typo. This showed up in the VMR since we use arm64 macOS build agents there.
It got uncovered by dotnet#103801. This showed up in the VMR since we use arm64 macOS build agents there.
It got uncovered by #103801. This showed up in the VMR since we use arm64 macOS build agents there.
Adds Linux support for SVE state on signals.
Testing:
I forced a sigill (by making one of the hwintrinsic API calls generate a bad instruction). I checked the SVE registers when the signal occurred. I stepped through and made sure the
lpContext
is correctly filled.