Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update the CPUID and XSAVE logics for APX #104637

Merged
merged 18 commits into from
Nov 18, 2024

Conversation

Ruihan-Yin
Copy link
Contributor

@Ruihan-Yin Ruihan-Yin commented Jul 9, 2024

Previously #103019

Overview on the changes:

  1. Infrastructural changes on XArchIntrinsicConstants: Compress all the Avx512 related flags into 1 - Avx512f+bw+cd+dq+vl to Avx512, this saves more space in XArchIntrinsicConstants so that we can hold more x86 ISAs, like here, APX.
  1. CPUID check updates for APX: CR4[XSAVE](existing) -> XCR0[APX_F] -> CPUID(7,1).EDX[APX_F] - the current status is that due to the missing macro definition for APX on the OS level, the second check will fail anyways, and it may break the build on CI (to be verified).
  • Edit: made some workaround for this part in the code.
  1. APX introduced 16 EGPRs, which are XSTATE enabled, this PR provides the corresponding changes for XSAVE behaviors.

@dotnet-issue-labeler dotnet-issue-labeler bot added the needs-area-label An area label is needed to ensure this gets routed to the appropriate area owners label Jul 9, 2024
@dotnet-policy-service dotnet-policy-service bot added the community-contribution Indicates that the PR has been added by a community member label Jul 9, 2024
@Ruihan-Yin
Copy link
Contributor Author

Hi @tannergooding,

I reopened the PR for APX CPUID updates here, I will resolve the conflict on guid, let CI start soon and fix fails popping up.

I understand this PR is targeting on the next release cycle, say .Net 10, so when your schedule allows, I wonder if you could review this PR for the first round, thanks!

@Ruihan-Yin
Copy link
Contributor Author

resolved the conflict.

@am11 am11 added area-PAL-coreclr and removed needs-area-label An area label is needed to ensure this gets routed to the appropriate area owners labels Jul 19, 2024
@Ruihan-Yin Ruihan-Yin changed the title [NO REVIEW] Update the CPUID and XSAVE logics for APX [NO MERGE] Update the CPUID and XSAVE logics for APX Jul 19, 2024
@Ruihan-Yin
Copy link
Contributor Author

The REX2 enabling PR (#106557) is there, that PR is based on this one, since now main is accepting .net 10 changes, I will rebase the changes to latest main and continue the works.

@Ruihan-Yin
Copy link
Contributor Author

resolved conflicts with main

@Ruihan-Yin Ruihan-Yin changed the title [NO MERGE] Update the CPUID and XSAVE logics for APX Update the CPUID and XSAVE logics for APX Aug 28, 2024
@Ruihan-Yin
Copy link
Contributor Author

@tannergooding This PR is ready for review.

Build failures are related to the changes in https://github.com/dotnet/runtime/blob/main/src/coreclr/pal/src/arch/amd64/context2.S

the native compiler seems to not recognize EGPRs, e.g. r16 as a legal operand. I'm not sure how we can resolve this or if this part is needed for now. And for the changes to accommodate the XSTATE changes for EGPRs, I would be very willing to taking some suggestions from the high level, there might be some changes missing or not reasonable at all. it will be much appreciated if some advice can be shared.

@Ruihan-Yin Ruihan-Yin marked this pull request as ready for review August 29, 2024 20:33
Copy link
Member

@janvorli janvorli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We will also need to update the CONTEXT& CONTEXT::operator=(const CONTEXT& ctx) to properly copy the new registers if the XStateFeaturesMask has a mask for this feature. And also make sure that when it is not set, we copy only the necessary part of the context.

@@ -183,6 +183,29 @@ LOCAL_LABEL(Done_Restore_CONTEXT_FLOATING_POINT):
kmovq k6, qword ptr [rdi + (CONTEXT_KMask0 + 6 * 8)]
kmovq k7, qword ptr [rdi + (CONTEXT_KMask0 + 7 * 8)]

// TODO-xarch-apx: the definition of XSTATE mask value for APX is now missing on the OS level,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is Unix only code, so the XSTATE mask is not coming from the OS headers, but rather pal.h for C/C++ code and src/coreclr/pal/src/arch/amd64/asmconstants.h for asm. This comment can be removed.

// TODO-xarch-apx: the definition of XSTATE mask value for APX is now missing on the OS level,
// we are currently using bare value to hack it through the build process, and test the implementation through CI.
// those changes will be removed when we have the OS support for APX.
test BYTE PTR [rdi + CONTEXT_XStateFeaturesMask], 524288
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you please add symbolic definition for the mask to src/coreclr/pal/src/arch/amd64/asmconstants.h next to the other XSTATE_xxx definitions?

src/coreclr/vm/threadsuspend.cpp Show resolved Hide resolved
@Ruihan-Yin
Copy link
Contributor Author

linux and arm related build failures originate from the ASM code that uses EGPRs in the MOV instructions, while now the compiler (presumably LLVM) does not accept EGPRs as operand. This might be able to be fixed by updating the dependent version of LLVM.

Did you fix this issue?

We're using clang 18.1.8 to build (in azure Linux containers): dotnet/dotnet-buildtools-prereqs-docker#1207). What version has the APX features required?

We've communicated with our LLVM folks and were told LLVM-19 should have APX support, but eventually we will need LLVM-20 to accommodate the Avx10.2 codegen changes.

Do you have any preference how we can make this through? we are okay with upgrading to LLVM-19 this time and LLVM-20 later, or upgrade once altogether. (LLVM-20 release should be out early next year.)

@@ -91,7 +93,8 @@
#define CONTEXT_KMask0 CONTEXT_Ymm0H+(16*16)
#define CONTEXT_Zmm0H CONTEXT_KMask0+(8*8)
#define CONTEXT_Zmm16 CONTEXT_Zmm0H+(32*16)
#define CONTEXT_Size CONTEXT_Zmm16+(64*16)
#define CONTEXT_Egpr CONTEXT_Zmm16+(16*8)
#define CONTEXT_Size CONTEXT_Egpr+(64*16)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't look right. It seems it should be

#define CONTEXT_Egpr CONTEXT_Zmm16+(64*16)
#define CONTEXT_Size CONTEXT_Egpr+(16*8)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like this comment is still pending.

I'd agree with the janvorli's assessment here. Each new define is previousDefine+(SizeOfPreviousDefine) and _Zmm16 should have a size of 64*16, while _Egpr should have a size of 8*16 (typically SizeOfPreviousDefine is built up as SizePerItem*NumberOfItems, so 64*16 because each register is 64-bytes and there are 16 of them, similarly 8*16 because each register is 8-bytes and there are 16 of them).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for pointing out and the explanation!

sorry for the delay, we had a holiday yesterday.

@@ -22,6 +22,8 @@

; DO NOT CHANGE R2R NUMERIC VALUES OF THE EXISTING SETS. Changing R2R numeric values definitions would be R2R format breaking change.

; The ISA definiitons should also be mapped to `hwintrinsicIsaRangeArray` in hwintrinsic.cpp.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A nit:

Suggested change
; The ISA definiitons should also be mapped to `hwintrinsicIsaRangeArray` in hwintrinsic.cpp.
; The ISA definitions should also be mapped to `hwintrinsicIsaRangeArray` in hwintrinsic.cpp.

@@ -125,6 +155,19 @@ static uint32_t avx512StateSupport()
return ((_xgetbv(0) & 0xE6) == 0x0E6) ? 1 : 0;
}

#ifndef XSTATE_MASK_APX
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was already defined in this file at line 75

Copy link
Contributor Author

@Ruihan-Yin Ruihan-Yin Nov 8, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Line 75 is the definition under UNIX (Avx512 seems to adopt the design that have these mask only explicitly defined under UNIX, so I followed it for APX). And under windows, I suppose the definition should be from winnt.h, I'm not quite familiar with how PAL works with each part. If so, we will need this definition here until OS supports APX and defines these masks in winnt?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I've missed the fact that one definition was for Windows and the other for Unix. So we can keep it as is until the symbol makes it to the windows SDK and then remove the Windows defines.

@@ -313,6 +313,16 @@ typedef int __ptrace_request;
ASSIGN_CONTROL_REGS \
ASSIGN_INTEGER_REGS \

#if defined(HOST_AMD64) && defined(XSTATE_SUPPORTED)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was already defined in pal.h, it should be visible here.

@@ -809,6 +872,18 @@ void CONTEXTToNativeContext(CONST CONTEXT *lpContext, native_context_t *native)
dest = FPREG_Xstate_Hi16Zmm(native, &size);
_ASSERT(size == (sizeof(M512) * 16));
memcpy_s(dest, sizeof(M512) * 16, &lpContext->Zmm16, sizeof(M512) * 16);

#ifndef TARGET_OSX
// TODO-xarch-apx: I suppose OSX will not support APX.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This might be dependent on whether or not Rosetta gets support, correct?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My personal educated guess is that Rosetta will never get support for APX (like it never gets support for AVX). The goal of Rosetta is to run apps developed for intel based macOS devices. Since those are not being created anymore, emulating APX doesn't seem to make sense.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it never gets support for AVX

I've seen some unofficial reports that they added it in macOS 15.

@@ -238,8 +238,8 @@ ASMCONSTANTS_C_ASSERT(OFFSETOF__VASigCookie__pNDirectILStub

#if defined(UNIX_AMD64_ABI) && !defined(HOST_WINDOWS)
// Expression is too complicated, is currently:
// (8*6 + 4*2 + 2*6 + 4 + 8*6 + 8*16 + 8 + /*XMM_SAVE_AREA32*/(2*2 + 1*2 + 2 + 4 + 2*2 + 4 + 2*2 + 4*2 + 16*8 + 16*16 + 1*96) + 26*16 + 8 + 8*5 + /*XSTATE*/ + 8 + 8 + /*XSTATE_AVX*/ 16*16 + /*XSTATE_AVX512_KMASK*/ 8*8 + /*XSTATE_AVX512_ZMM_H*/ 32*16 + /*XSTATE_AVX512_ZMM*/ 64*16)
#define SIZEOF__CONTEXT (3104)
// (8*6 + 4*2 + 2*6 + 4 + 8*6 + 8*16 + 8 + /*XMM_SAVE_AREA32*/(2*2 + 1*2 + 2 + 4 + 2*2 + 4 + 2*2 + 4*2 + 16*8 + 16*16 + 1*96) + 26*16 + 8 + 8*5 + /*XSTATE*/ + 8 + 8 + /*XSTATE_AVX*/ 16*16 + /*XSTATE_AVX512_KMASK*/ 8*8 + /*XSTATE_AVX512_ZMM_H*/ 32*16 + /*XSTATE_AVX512_ZMM*/ 64*16 + /*XSTATE_APX*/ 8*16)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment is getting fairly long. I wonder if we can separate it by line instead to make it more readable, something like this (assuming I broke it up correctly):

// 8*6 + 4*2 + 2*6 + 4 + 8*6 + 8*16 + 8 +
// /*XMM_SAVE_AREA32*/ (2*2 + 1*2 + 2 + 4 + 2*2 + 4 + 2*2 + 4*2 + 16*8 + 16*16 + 1*96) + 26*16 + 8 + 8*5 +
// /*XSTATE*/ 8 + 8 +
// /*XSTATE_AVX*/ 16*16 +
// /*XSTATE_AVX512_KMASK*/ 8*8 +
// /*XSTATE_AVX512_ZMM_H*/ 32*16 +
// /*XSTATE_AVX512_ZMM*/ 64*16 +
// /*XSTATE_APX*/ 8*16

@tannergooding
Copy link
Member

Changes look overall good/correct to me. There's a couple pending cleanup requests from janvorli that should be resolved before sign-off happens

@Ruihan-Yin
Copy link
Contributor Author

CI fails look known or unrelated.

Copy link
Member

@janvorli janvorli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thank you!

@janvorli
Copy link
Member

Note: we may get some perf regressions in the performance microbenchmarks due to the larger context struct size.

@tannergooding
Copy link
Member

Updated the branch to retrigger CI and see if the unrelated Mono WASM issues are resolved.

If not, I'll get issues opened and get this merged once CI finishes again.

@DeepakRajendrakumaran
Copy link
Contributor

Updated the branch to retrigger CI and see if the unrelated Mono WASM issues are resolved.

If not, I'll get issues opened and get this merged once CI finishes again.

Thanks Tanner. Looking forward to having this in. Would make running superpmi for APX way easier :)

Comment on lines +189 to +191
// TODO-XArch-APX:
// we are using raw hex code here to emit EGPRs-related changes,
// we will need to come back and re-write this part when assembler supports EGPRs.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there some way to do an #if to only use the raw hex on assemblers without support today?

Copy link
Contributor Author

@Ruihan-Yin Ruihan-Yin Nov 13, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not aware of, do we have sth like LLVM version identifier defined in our code base?

I think this chunk is meant to be converted to MOV instructions once we have LLVM with official APX supports, and the code may not have actual use cases until we have machines with APX.

@tannergooding, could you share some thoughts on this?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think its fine as is, I don't think its worth supporting code that we can't reliably test and maintain in CI at this point in time.

Rather, lets just have the one path and update when the toolchain can finally be updated to LLVM 19/20 or later

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with this too.

@JulieLeeMSFT
Copy link
Member

@BruceForstall, please review this PR. It’s ready to merge.

@tannergooding
Copy link
Member

I have issues opened for the unrelated chrome-DebuggerTests failures. One of the failures requires the tests to be disabled first, however, so I'm waiting on #109871 to be merged before this goes in.

@tannergooding
Copy link
Member

/ba-g Unrelated failure in chrome-DebuggerTests that can't be handled via known build failure due to crashing with DeadLetter

@tannergooding tannergooding merged commit e1aa9cf into dotnet:main Nov 18, 2024
161 of 164 checks passed
@Ruihan-Yin
Copy link
Contributor Author

Thanks all for the help and reviews!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
apx Related to the Intel Advanced Performance Extensions (APX) area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI community-contribution Indicates that the PR has been added by a community member
Projects
None yet
Development

Successfully merging this pull request may close these issues.