Improve the handling of SVE state as part of threadsuspend #105059

Merged
merged 4 commits into from
Jul 22, 2024
Merged
Show file tree
Hide file tree
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
30 changes: 21 additions & 9 deletions src/coreclr/nativeaot/Runtime/windows/PalRedhawkMinWin.cpp
@@ -478,7 +478,7 @@ REDHAWK_PALEXPORT CONTEXT* PalAllocateCompleteOSContext(_Out_ uint8_t** contextB
{
CONTEXT* pOSContext = NULL;

#if (defined(TARGET_X86) || defined(TARGET_AMD64))
#if defined(TARGET_X86) || defined(TARGET_AMD64) || defined(TARGET_ARM64)
DWORD context = CONTEXT_COMPLETE;

if (pfnInitializeContext2 == NULL)
@@ -503,10 +503,17 @@ REDHAWK_PALEXPORT CONTEXT* PalAllocateCompleteOSContext(_Out_ uint8_t** contextB
}
#endif //TARGET_X86

// Determine if the processor supports AVX or AVX512 so we could
// retrieve extended registers
#if defined(TARGET_X86) || defined(TARGET_AMD64)
const DWORD64 xStateFeatureMask = XSTATE_MASK_AVX | XSTATE_MASK_AVX512;
const ULONG64 xStateCompactionMask = XSTATE_MASK_LEGACY | XSTATE_MASK_MPX | xStateFeatureMask;
#elif defined(TARGET_ARM64)
const DWORD64 xStateFeatureMask = XSTATE_MASK_ARM64_SVE;
const ULONG64 xStateCompactionMask = XSTATE_MASK_LEGACY | xStateFeatureMask;
#endif

// Determine if the processor supports extended features so we could retrieve those registers
DWORD64 FeatureMask = GetEnabledXStateFeatures();
if ((FeatureMask & (XSTATE_MASK_AVX | XSTATE_MASK_AVX512)) != 0)
if ((FeatureMask & xStateFeatureMask) != 0)
{
context = context | CONTEXT_XSTATE;
}
@@ -517,7 +524,6 @@ REDHAWK_PALEXPORT CONTEXT* PalAllocateCompleteOSContext(_Out_ uint8_t** contextB

// Retrieve contextSize by passing NULL for Buffer
DWORD contextSize = 0;
ULONG64 xStateCompactionMask = XSTATE_MASK_LEGACY | XSTATE_MASK_AVX | XSTATE_MASK_MPX | XSTATE_MASK_AVX512;
// The initialize call should fail but return contextSize
BOOL success = pfnInitializeContext2 ?
pfnInitializeContext2(NULL, context, NULL, &contextSize, xStateCompactionMask) :
@@ -565,15 +571,21 @@ REDHAWK_PALEXPORT _Success_(return) bool REDHAWK_PALAPI PalGetCompleteThreadCont
{
_ASSERTE((pCtx->ContextFlags & CONTEXT_COMPLETE) == CONTEXT_COMPLETE);

#if defined(TARGET_X86) || defined(TARGET_AMD64)
// Make sure that AVX feature mask is set, if supported. This should not normally fail.
// This should not normally fail.
// The system silently ignores any feature specified in the FeatureMask which is not enabled on the processor.
#if defined(TARGET_X86) || defined(TARGET_AMD64)
if (!SetXStateFeaturesMask(pCtx, XSTATE_MASK_AVX | XSTATE_MASK_AVX512))
{
_ASSERTE(!"Could not apply XSTATE_MASK_AVX | XSTATE_MASK_AVX512");
return FALSE;
}
#endif //defined(TARGET_X86) || defined(TARGET_AMD64)
#elif defined(TARGET_ARM64)
if (!SetXStateFeaturesMask(pCtx, XSTATE_MASK_ARM64_SVE))
{
_ASSERTE(!"Could not apply XSTATE_MASK_ARM64_SVE");
return FALSE;
}
#endif

return GetThreadContext(hThread, pCtx);
}
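
For readers skimming the Windows changes, the two functions above follow a query-size / initialize / capture pattern. Below is a minimal standalone sketch of that pattern, not the runtime's exact code: it assumes an SDK and OS new enough to declare InitializeContext2 and XSTATE_MASK_ARM64_SVE (the runtime instead resolves InitializeContext2 dynamically, as the pfnInitializeContext2 check above shows), the helper names are illustrative, and error handling plus the runtime's full flag set are trimmed.

```cpp
#include <windows.h>
#include <stdlib.h>

// Allocate a CONTEXT large enough to hold the SVE XState component.
CONTEXT* AllocateExtendedContext(BYTE** buffer)
{
    DWORD flags = CONTEXT_FULL;
    const DWORD64 featureMask = XSTATE_MASK_ARM64_SVE;
    const ULONG64 compactionMask = XSTATE_MASK_LEGACY | featureMask;

    // Only request XState if the OS reports the feature as enabled.
    if ((GetEnabledXStateFeatures() & featureMask) != 0)
        flags |= CONTEXT_XSTATE;

    // The first call intentionally fails but reports the required size.
    DWORD size = 0;
    InitializeContext2(NULL, flags, NULL, &size, compactionMask);

    // NB: a real implementation should also ensure the buffer is suitably aligned.
    *buffer = (BYTE*)malloc(size);
    CONTEXT* ctx = NULL;
    if (*buffer == NULL || !InitializeContext2(*buffer, flags, &ctx, &size, compactionMask))
        return NULL;
    return ctx;
}

// Capture another thread's context, including SVE state when present.
bool CaptureThreadContext(HANDLE hThread, CONTEXT* ctx)
{
    // Features not enabled on the processor are silently ignored by the system.
    if ((ctx->ContextFlags & CONTEXT_XSTATE) == CONTEXT_XSTATE)
        SetXStateFeaturesMask(ctx, XSTATE_MASK_ARM64_SVE);
    return GetThreadContext(hThread, ctx) != FALSE;
}
```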
@@ -902,7 +914,7 @@ REDHAWK_PALEXPORT HANDLE PalLoadLibrary(const char* moduleName)
return 0;
}
moduleNameWide[len] = '\0';

HANDLE result = LoadLibraryExW(moduleNameWide, NULL, LOAD_WITH_ALTERED_SEARCH_PATH);
delete[] moduleNameWide;
return result;
5 changes: 2 additions & 3 deletions src/coreclr/pal/inc/pal.h
@@ -1846,9 +1846,8 @@ typedef struct _IMAGE_ARM_RUNTIME_FUNCTION_ENTRY {

#define CONTEXT_XSTATE (CONTEXT_ARM64 | 0x40L)

#define XSTATE_SVE (0)

#define XSTATE_MASK_SVE (UI64(1) << (XSTATE_SVE))
#define XSTATE_ARM64_SVE (2)

Member Author:
I think there might need to be an additional set of changes in vm/arm64/asmconstants.h: there is currently a SIZEOF__CONTEXT constant that is meant to represent the size of the CONTEXT struct on Win32 vs UNIX, and I believe it is currently incorrect for Unix (or may need additional handling to ensure it is large enough to hold the additional SVE state, whose size depends on the vector length).

Member Author:
That shouldn't be related to the Windows handling being added, however, so I've left it to a future PR.

Member:
I am not sure how it could be incorrect when there is an assert in asmconstants.h that it matches sizeof(T_CONTEXT).

Member Author:
The consideration is that SVE registers don't have a fixed width, so the current logic asserting the size statically is already wrong (it assumes SVE registers are exactly 128 bits, when they can be up to 2048 bits).

Member:
That's true, but the constant is meant to represent the size of the CONTEXT structure, and that's fixed for now, so it is correct. I believe that when we add support for longer SVE registers, we should finally move to the Windows way of a separate context and extended context, so SIZEOF__CONTEXT would still be constant - and it would actually match the Windows one after that.
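
For context on the size concern raised here: the architectural SVE state scales with the vector length VL, which is why a statically sized context cannot cover every implementation. The arithmetic below is illustrative only, assuming the architectural register file of 32 Z registers of VL bytes plus 16 predicate registers and FFR of VL/8 bytes each; it is not the Windows XSTATE or PAL CONTEXT layout.

```cpp
#include <cstddef>
#include <cstdio>

// Rough per-thread SVE state size for a given vector length, in bytes.
static size_t SveStateBytes(size_t vlBytes)
{
    return 32 * vlBytes        // Z0-Z31
         + 16 * (vlBytes / 8)  // P0-P15
         + (vlBytes / 8);      // FFR
}

int main()
{
    std::printf("VL = 16  (128-bit):  %zu bytes\n", SveStateBytes(16));   // 546
    std::printf("VL = 256 (2048-bit): %zu bytes\n", SveStateBytes(256));  // 8736
    return 0;
}
```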

#define XSTATE_MASK_ARM64_SVE (UI64(1) << (XSTATE_ARM64_SVE))

//
// This flag is set by the unwinder if it has unwound to a call
6 changes: 2 additions & 4 deletions src/coreclr/pal/src/arch/arm64/asmconstants.h
@@ -21,10 +21,8 @@
#define CONTEXT_XSTATE_BIT (6)
#define CONTEXT_XSTATE (1 << CONTEXT_XSTATE_BIT)

#define XSTATE_SVE_BIT (0)

#define XSTATE_MASK_SVE (UI64(1) << (XSTATE_SVE))

#define XSTATE_ARM64_SVE_BIT (2)

Member:
Why is this change needed?

Member Author:
The general CONTEXT struct as currently used by the PAL is supposed to mirror the Win32 layout/defines so they can translate 1-to-1.

The Win32 naming convention here is then XSTATE_ARM64_SVE, and it uses bit 2, not bit 1.

Member:
> The Win32 naming convention here is then XSTATE_ARM64_SVE, and it uses bit 2, not bit 1.

So the wrong bit was set for Linux in #103801 as well?

Member Author:
Right.

I don't think it would technically cause problems, because we're just emulating the CONTEXT struct and it doesn't actually have to match the Win32 struct 1-to-1 (since it's only going to be other PAL APIs consuming it).

But it's overall better/simpler to ensure they match, so we don't end up with potential conflicts or other issues.

The best option here would be to just consume the native context_t struct from Unix directly (and not do any of this shimming), but that's a more involved PR.

Member:
> it doesn't actually have to match the Win32 struct 1-to-1

In fact, it does not match the Win32 struct 1-to-1: #103801 (comment)

> The best option here would be to just consume the native context_t struct from Unix directly (and not do any of this shimming), but that's a more involved PR.

+1. The Windows CONTEXT structure is very entrenched throughout the CoreCLR VM.

#define XSTATE_MASK_ARM64_SVE (UI64(1) << (XSTATE_ARM64_SVE_BIT))

#define CONTEXT_ContextFlags 0
#define CONTEXT_Cpsr CONTEXT_ContextFlags+4
4 changes: 2 additions & 2 deletions src/coreclr/pal/src/arch/arm64/context2.S
@@ -114,7 +114,7 @@ LOCAL_LABEL(Done_CONTEXT_FLOATING_POINT):
b.ne LOCAL_LABEL(Done_CONTEXT_SVE)

ldr x1, [x0, CONTEXT_XSTATEFEATURESMASK_OFFSET]
tbz x1, #XSTATE_SVE_BIT, LOCAL_LABEL(Done_CONTEXT_SVE)
tbz x1, #XSTATE_ARM64_SVE_BIT, LOCAL_LABEL(Done_CONTEXT_SVE)

add x0, x0, CONTEXT_SVE_OFFSET
str p0, [x0, CONTEXT_P0_VL, MUL VL]
@@ -195,7 +195,7 @@ LOCAL_LABEL(Restore_CONTEXT_FLOATING_POINT):
tbz w17, #CONTEXT_XSTATE_BIT, LOCAL_LABEL(No_Restore_CONTEXT_SVE)

ldr w17, [x16, CONTEXT_XSTATEFEATURESMASK_OFFSET]
tbz w17, #XSTATE_SVE_BIT, LOCAL_LABEL(No_Restore_CONTEXT_SVE)
tbz w17, #XSTATE_ARM64_SVE_BIT, LOCAL_LABEL(No_Restore_CONTEXT_SVE)

add x16, x16, CONTEXT_SVE_OFFSET
ldr p0, [x16, CONTEXT_FFR_VL, MUL VL]
4 changes: 2 additions & 2 deletions src/coreclr/pal/src/thread/context.cpp
@@ -817,7 +817,7 @@ void CONTEXTToNativeContext(CONST CONTEXT *lpContext, native_context_t *native)
//TODO-SVE: This only handles vector lengths of 128bits.
if (CONTEXT_GetSveLengthFromOS() == 16)
{
_ASSERT((lpContext->XStateFeaturesMask & XSTATE_MASK_SVE) == XSTATE_MASK_SVE);
_ASSERT((lpContext->XStateFeaturesMask & XSTATE_MASK_ARM64_SVE) == XSTATE_MASK_ARM64_SVE);

uint16_t vq = sve_vq_from_vl(lpContext->Vl);

@@ -1169,7 +1169,7 @@ void CONTEXTFromNativeContext(const native_context_t *native, LPCONTEXT lpContex

uint16_t vq = sve_vq_from_vl(sve->vl);

lpContext->XStateFeaturesMask |= XSTATE_MASK_SVE;
lpContext->XStateFeaturesMask |= XSTATE_MASK_ARM64_SVE;

//Note: Size of ffr register is SVE_SIG_FFR_SIZE(vq) bytes.
lpContext->Ffr = *(WORD*) (((uint8_t*)sve) + SVE_SIG_FFR_OFFSET(vq));
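
A side note on the Unix path this renaming touches: the copy offsets in CONTEXTFromNativeContext scale with vq, the vector length expressed in quadwords. The sketch below (arm64 Linux only; it assumes the kernel UAPI header <asm/sigcontext.h>, which provides sve_vq_from_vl and the SVE_SIG_* offset macros used in this file) simply prints how those offsets move with vq.

```cpp
#include <asm/sigcontext.h>
#include <cstdio>

int main()
{
    const unsigned vl = 16;                  // vector length in bytes (128-bit SVE)
    const unsigned vq = sve_vq_from_vl(vl);  // quadwords per vector: vl / 16

    // Offsets are relative to the start of the sve_context record in the
    // signal frame and grow with vq.
    std::printf("vq         = %u\n", vq);
    std::printf("Z0 offset  = %u\n", (unsigned)SVE_SIG_ZREG_OFFSET(vq, 0));
    std::printf("P0 offset  = %u\n", (unsigned)SVE_SIG_PREG_OFFSET(vq, 0));
    std::printf("FFR offset = %u\n", (unsigned)SVE_SIG_FFR_OFFSET(vq));
    return 0;
}
```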
57 changes: 40 additions & 17 deletions src/coreclr/vm/threadsuspend.cpp
@@ -26,7 +26,17 @@ ThreadSuspend::SUSPEND_REASON ThreadSuspend::m_suspendReason;

#if defined(TARGET_WINDOWS)
void* ThreadSuspend::g_returnAddressHijackTarget = NULL;
#endif
#endif // TARGET_WINDOWS

// Mirror the XSTATE_ARM64_SVE flags from winnt.h

#ifndef XSTATE_ARM64_SVE
#define XSTATE_ARM64_SVE (2)
#endif // XSTATE_ARM64_SVE

#ifndef XSTATE_MASK_ARM64_SVE
#define XSTATE_MASK_ARM64_SVE (1ui64 << (XSTATE_ARM64_SVE))
#endif // XSTATE_MASK_ARM64_SVE

// If you add any thread redirection function, make sure the debugger can 1) recognize the redirection
// function, and 2) retrieve the original CONTEXT. See code:Debugger.InitializeHijackFunctionAddress and
@@ -1956,20 +1966,26 @@ CONTEXT* AllocateOSContextHelper(BYTE** contextBuffer)
{
CONTEXT* pOSContext = NULL;

#if !defined(TARGET_UNIX) && (defined(TARGET_X86) || defined(TARGET_AMD64))
#if !defined(TARGET_UNIX) && (defined(TARGET_X86) || defined(TARGET_AMD64) || defined(TARGET_ARM64))
DWORD context = CONTEXT_COMPLETE;

// Determine if the processor supports AVX so we could
// retrieve extended registers
#if defined(TARGET_X86) || defined(TARGET_AMD64)
const DWORD64 xStateFeatureMask = XSTATE_MASK_AVX | XSTATE_MASK_AVX512;
const ULONG64 xStateCompactionMask = XSTATE_MASK_LEGACY | XSTATE_MASK_MPX | xStateFeatureMask;
#elif defined(TARGET_ARM64)
const DWORD64 xStateFeatureMask = XSTATE_MASK_ARM64_SVE;
const ULONG64 xStateCompactionMask = XSTATE_MASK_LEGACY | xStateFeatureMask;
#endif

// Determine if the processor supports extended features so we could retrieve those registers
DWORD64 FeatureMask = GetEnabledXStateFeatures();
if ((FeatureMask & (XSTATE_MASK_AVX | XSTATE_MASK_AVX512)) != 0)
if ((FeatureMask & xStateFeatureMask) != 0)
{
context = context | CONTEXT_XSTATE;
}

// Retrieve contextSize by passing NULL for Buffer
DWORD contextSize = 0;
ULONG64 xStateCompactionMask = XSTATE_MASK_LEGACY | XSTATE_MASK_AVX | XSTATE_MASK_MPX | XSTATE_MASK_AVX512;
// The initialize call should fail but return contextSize
BOOL success = g_pfnInitializeContext2 ?
g_pfnInitializeContext2(NULL, context, NULL, &contextSize, xStateCompactionMask) :
@@ -2005,7 +2021,6 @@ CONTEXT* AllocateOSContextHelper(BYTE** contextBuffer)
}

*contextBuffer = buffer;

#else
pOSContext = new (nothrow) CONTEXT;
pOSContext->ContextFlags = CONTEXT_COMPLETE;
@@ -2896,17 +2911,17 @@ BOOL Thread::RedirectThreadAtHandledJITCase(PFN_REDIRECTTARGET pTgt)

// Always get complete context, pCtx->ContextFlags are set during Initialization

#if defined(TARGET_X86) || defined(TARGET_AMD64)
// Scenarios like GC stress may indirectly disable XState features in the pCtx
// depending on the state at the time of GC stress interrupt.
//
// Make sure that AVX feature mask is set, if supported.
//
// This should not normally fail.
// The system silently ignores any feature specified in the FeatureMask
// which is not enabled on the processor.
SetXStateFeaturesMask(pCtx, (XSTATE_MASK_AVX | XSTATE_MASK_AVX512));
#endif //defined(TARGET_X86) || defined(TARGET_AMD64)
#if defined(TARGET_X86) || defined(TARGET_AMD64)
SetXStateFeaturesMask(pCtx, XSTATE_MASK_AVX | XSTATE_MASK_AVX512);
#elif defined(TARGET_ARM64)
SetXStateFeaturesMask(pCtx, XSTATE_MASK_ARM64_SVE);
#endif

// Make sure we specify CONTEXT_EXCEPTION_REQUEST to detect "trap frame reporting".
pCtx->ContextFlags |= CONTEXT_EXCEPTION_REQUEST;
@@ -3026,22 +3041,30 @@ BOOL Thread::RedirectCurrentThreadAtHandledJITCase(PFN_REDIRECTTARGET pTgt, CONT
// Get and save the thread's context
BOOL success = TRUE;

#if defined(TARGET_X86) || defined(TARGET_AMD64)
#if defined(TARGET_X86) || defined(TARGET_AMD64) || defined(TARGET_ARM64)
// This method is called for GC stress interrupts in managed code.
// The current context may have various XState features, depending on what is used/dirty,
// but only AVX feature may contain live data. (that could change with new features in JIT)
// but only some features may contain live data. (that could change with new features in JIT)
// Besides pCtx may not have space to store other features.
// So we will mask out everything but AVX.
// So we will mask out everything but those we are known to use.
DWORD64 srcFeatures = 0;
success = GetXStateFeaturesMask(pCurrentThreadCtx, &srcFeatures);

_ASSERTE(success);
if (!success)
return FALSE;

// Get may return 0 if no XState is set, which Set would not accept.
if (srcFeatures != 0)
{
success = SetXStateFeaturesMask(pCurrentThreadCtx, srcFeatures & (XSTATE_MASK_AVX | XSTATE_MASK_AVX512));
#if defined(TARGET_X86) || defined(TARGET_AMD64)
const DWORD64 xStateFeatureMask = XSTATE_MASK_AVX | XSTATE_MASK_AVX512;
#elif defined(TARGET_ARM64)
const DWORD64 xStateFeatureMask = XSTATE_MASK_ARM64_SVE;
#endif

success = SetXStateFeaturesMask(pCurrentThreadCtx, srcFeatures & xStateFeatureMask);

_ASSERTE(success);
if (!success)
return FALSE;
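
The hunk above reads the thread's current XState feature mask and clamps it to the features the redirect context can actually hold, since SetXStateFeaturesMask does not accept an empty mask and the buffer may lack space for other components. A minimal sketch of that clamp follows; the helper name and keepMask parameter are illustrative, not part of the runtime.

```cpp
#include <windows.h>

// Clamp a captured context's XState features to keepMask before copying it
// into a buffer that only has room for those features.
static bool ClampXStateFeatures(CONTEXT* ctx, DWORD64 keepMask)
{
    DWORD64 features = 0;
    if (!GetXStateFeaturesMask(ctx, &features))
        return false;

    // A zero mask means no extended state is in use; SetXStateFeaturesMask
    // does not accept 0, so there is nothing to clamp.
    if (features == 0)
        return true;

    return SetXStateFeaturesMask(ctx, features & keepMask) != FALSE;
}

// Usage: ClampXStateFeatures(pCtx, XSTATE_MASK_ARM64_SVE) on arm64, or
//        ClampXStateFeatures(pCtx, XSTATE_MASK_AVX | XSTATE_MASK_AVX512) on x86/x64.
```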
@@ -5850,7 +5873,7 @@ void Thread::ApcActivationCallback(ULONG_PTR Parameter)

#if defined(TARGET_ARM64)
// Windows incorrectly set the CONTEXT_UNWOUND_TO_CALL in the flags of the context it passes to us.
// That results in incorrect compensation of PC at some places and sometimes incorrect unwinding
// and GC holes due to that.
pContext->ContextFlags &= ~CONTEXT_UNWOUND_TO_CALL;
#endif // TARGET_ARM64