[JIT] [APX] Enable additional General Purpose Registers. #108799
base: main
Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
CC @jakobbotsch and @tannergooding for code review.
@DeepakRajendrakumaran What is the status of this PR? It's marked as ready, but the description says it's built on top of #108796, which is not marked as ready.
Thanks for pointing that out. It has some dependencies on other PRs, specifically the Rex2 encoding PR. Considering that, do you have a suggestion on how to mark this for now?
Force-pushed from eb47ede to ec3388f.
Force-pushed from a3e8331 to 6bbccb4.
Now that the CPUID changes have merged, I ran superpmi TP and I have a problem. I ran the scripts Kunal shared a while back to debug why this is happening. The following is for libraries:
I tried to further make sure the Rex2 changes are not causing the TP regression. We can safely conclude that the TP regression comes from eGPR enablement.
With/without the Rex2 changes (without the register allocation changes):
- Overall (+0.08% to +0.23%)
- MinOpts (+0.28% to +0.48%)
- FullOpts (+0.08% to +0.14%)
With Rex2 as base and the eGPR changes as diff:
- Overall (+3.60% to +4.65%)
- MinOpts (+6.09% to +8.79%)
- FullOpts (+3.47% to +4.29%)
Force-pushed from 5466739 to a9d71a2.
Force-pushed from b93e387 to 9905646.
Went through the first pass; I still need to evaluate where the TP regression is coming from. However, I still see some asmdiffs... can you please fix them?
src/coreclr/jit/lsra.cpp
Outdated
@@ -12534,6 +12555,9 @@ void LinearScan::verifyResolutionMove(GenTree* resolutionMove, LsraLocation curr
LinearScan::RegisterSelection::RegisterSelection(LinearScan* linearScan)
{
    this->linearScan = linearScan;
#if defined(TARGET_AMD64)
    rbmAllInt = linearScan->compiler->get_RBM_ALLINT();
Any reason why we need it here instead of in the LinearScan ctor (which you are already doing)?
Removed
src/coreclr/jit/emit.h
Outdated
@@ -742,6 +743,7 @@ class emitter
    // The instrDescCGCA struct's member keeping the GC-ness of the first return register is _idcSecondRetRegGCType.
    GCtype _idGCref : 2; // GCref operand? (value is a "GCtype")

#if !defined(TARGET_AMD64)
Any reason for this change?
Alignment: having _idReg1/_idReg2 here, with their increased size, caused padding and increased the size even more.
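The following sketch is a generic illustration of the padding effect, not the real instrDesc layout. It shows how member order alone can change a struct's size. The struct and field names are hypothetical. The asserted sizes assume a common x64 ABI where std::uint16_t is 2-byte aligned.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical layouts for illustration only; not the emitter's instrDesc.
// Narrow fields interleaved with a wider one: the compiler inserts padding.
struct InterleavedFields
{
    std::uint8_t  reg1;  // offset 0, followed by 1 byte of padding
    std::uint16_t flags; // offset 2 (needs 2-byte alignment)
    std::uint8_t  reg2;  // offset 4, followed by 1 byte of tail padding
};

// Same members with the wider field first: the two bytes pack together.
struct GroupedFields
{
    std::uint16_t flags; // offset 0
    std::uint8_t  reg1;  // offset 2
    std::uint8_t  reg2;  // offset 3
};

// Both hold on common x64 ABIs where alignof(std::uint16_t) == 2.
static_assert(sizeof(GroupedFields) == 4, "grouped layout packs tightly");
static_assert(sizeof(InterleavedFields) == 6, "interleaved layout pays for padding");
```

Widening a field, as happened with _idReg1/_idReg2, can shift every following member. That can add padding on top of the extra bits themselves.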
src/coreclr/jit/regMaskTPOps.cpp
Outdated
@@ -62,7 +62,12 @@ bool regMaskTP::IsRegNumInMask(regNumber reg, var_types type) const
//
void regMaskTP::AddGprRegs(SingleTypeRegSet gprRegs)
{
    // RBM_ALLINT is not known at compile time on TARGET_AMD64 since it's dependent on APX support.
#if defined(TARGET_AMD64)
    assert((gprRegs == RBM_NONE) || ((gprRegs & RBM_ALLINT_STATIC_ALL) != RBM_NONE));
For non-APX machines, GPRs will still be 0-15, and with this assert we will allow a float register to get set, right?
Not really. On both APX and non-APX machines, bits 0-23 will be GPRs (bits 16-23 being the eGPRs) and bits 24-55 will be SIMD registers. We just make sure that bits 16-23 are not used on non-APX machines.
Re: "We just make sure that 16-23 are not used for non APX machines". How are we making sure? It would be worth adding some asserts.
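Here is a minimal sketch of such an assert. It assumes the bit layout described above: bits 0-15 are legacy GPRs, bits 16-23 are eGPRs, and bits 24-55 are SIMD registers. The mask constants and helper names are hypothetical stand-ins, not the JIT's RBM_* definitions or the regMaskTP API.

```cpp
#include <cassert>
#include <cstdint>

using RegMask = std::uint64_t;

// Hypothetical masks following the layout described in the thread.
constexpr RegMask MASK_LOW_GPR = 0x000000000000FFFFull; // bits 0-15: legacy GPRs
constexpr RegMask MASK_EGPR    = 0x0000000000FF0000ull; // bits 16-23: R16-R23
constexpr RegMask MASK_SIMD    = 0x00FFFFFFFF000000ull; // bits 24-55: SIMD registers

static_assert((MASK_LOW_GPR & MASK_EGPR) == 0, "GPR ranges must not overlap");
static_assert(((MASK_LOW_GPR | MASK_EGPR) & MASK_SIMD) == 0, "GPR and SIMD ranges must not overlap");

// GPRs that may be handed out on this machine.
constexpr RegMask availableGprs(bool apxSupported)
{
    return apxSupported ? (MASK_LOW_GPR | MASK_EGPR) : MASK_LOW_GPR;
}

// True if every set bit is a GPR usable on this machine
// (rejects SIMD bits always, and eGPR bits when APX is unavailable).
constexpr bool isValidGprSet(RegMask gprs, bool apxSupported)
{
    return (gprs & ~availableGprs(apxSupported)) == 0;
}

// Debug-only check in the spirit of the requested asserts.
inline void assertValidGprSet(RegMask gprs, bool apxSupported)
{
    assert(isValidGprSet(gprs, apxSupported) && "GPR mask contains registers unavailable on this machine");
}
```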
// RBM_ALLINT is not known at compile time on TARGET_AMD64 since it's dependent on APX support. Deprecated????
#if defined(TARGET_AMD64)
sprintf_s(regmask, cchRegMask, REG_MASK_INT_FMT, (mask & RBM_ALLINT_STATIC_ALL).getLow());
Why do we need RBM_ALLINT_STATIC_ALL here? It should just use RBM_ALLINT, which should return the right mask depending on whether high int registers are available or not.
Re: "RBM_ALLINT and it should return the right mask depending on if high int registers are available or not". I'm not sure we can do that here. RBM_ALLINT calls get_RBM_ALLINT(). One way to make it work would be to make this method part of the compiler class?
Thinking about it... RBM_ALLINT_STATIC_ALL should be the one we use, and we can have it for both x86 and x64 for consistency. Alternatively, if you decide to add rbmAllInt on x86, we can just use RBM_ALLINT here.
I have left the *STATIC_ALL version in all asserts where relevant:
https://github.com/dotnet/runtime/pull/108799/files#r1898770314
https://github.com/dotnet/runtime/pull/108799/files#r1898771126
// RBM_ALLINT is not known at compile time on TARGET_AMD64 since it's dependent on APX support. These are used by GC
// exclusively
#if defined(TARGET_AMD64)
printf(REG_MASK_INT_FMT, (mask & RBM_ALLINT_STATIC_ALL).getLow());
Likewise here... can we just use RBM_ALLINT?
@@ -3136,4 +3347,51 @@ inline SingleTypeRegSet LinearScan::BuildEvexIncompatibleMask(GenTree* tree)
#endif
}

inline bool LinearScan::DoesThisUseGPR(GenTree* op)
Can you add method docs for this method and the one below?
Method docs please.
Added
src/coreclr/jit/lsraxarch.cpp
Outdated
    return false;
}

inline SingleTypeRegSet LinearScan::BuildApxIncompatibleGPRMask(GenTree* tree, bool forceLowGpr)
What is the goal of this method?
src/coreclr/jit/lsraxarch.cpp
Outdated
SingleTypeRegSet op1Candidates = candidates;
SingleTypeRegSet op2Candidates = candidates;
int srcCount = 0;
// SingleTypeRegSet op1Candidates = candidates;
There are a lot of such comments in this file. Can you please delete them?
// We are dealing exclusively with HWIntrinsics here
return (op->AsHWIntrinsic()->OperIsBroadcastScalar() ||
        (op->AsHWIntrinsic()->OperIsMemoryLoad() && DoesThisUseGPR(op->AsHWIntrinsic()->Op(1))));
Do we only care whether Op(1) uses a GPR, and not any other operand?
Yes. For xarch, Op(1) is the memory address for nodes satisfying GenTreeHWIntrinsic::OperIsMemoryLoad, with the exception of 4 intrinsics (those 4 will not use this). And a GPR is likely to be used only for memory addressing in these cases.
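The rule can be modeled on a toy IR: an intrinsic needs a GPR if it broadcasts from a scalar, or if it is a memory load whose address operand lives in a GPR. Node, its fields, and usesGpr() are hypothetical simplifications. They stand in for GenTree, OperIsBroadcastScalar(), OperIsMemoryLoad(), and DoesThisUseGPR(). The real DoesThisUseGPR() logic is not shown in this thread.

```cpp
#include <cassert>
#include <vector>

// Hypothetical, simplified IR node; not GenTree.
struct Node
{
    bool isBroadcastScalar = false; // stands in for OperIsBroadcastScalar()
    bool isMemoryLoad      = false; // stands in for OperIsMemoryLoad()
    bool producesInt       = false; // does this value end up in a GPR?
    std::vector<const Node*> ops;   // ops[0] plays the role of Op(1), the address
};

// Hypothetical stand-in for DoesThisUseGPR(op).
bool usesGpr(const Node& op)
{
    return op.producesInt;
}

// Same shape as the check above: broadcast-from-scalar, or a memory load
// whose address operand (the first operand) is in a GPR.
bool hwIntrinsicNeedsGpr(const Node& node)
{
    return node.isBroadcastScalar ||
           (node.isMemoryLoad && !node.ops.empty() && usesGpr(*node.ops[0]));
}
```

Only the address operand is inspected. The other operands of these loads are vector values, so they cannot be what forces a GPR.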
src/coreclr/jit/lsraxarch.cpp
Outdated
else
{
    // ToDo-APX : imul currently doesn't have rex2 support. So, cannot use R16-R31.
    dstCandidates = BuildApxIncompatibleGPRMask(tree, true);
Calls to BuildApxIncompatibleGPRMask for many nodes seem expensive. Wondering if we can do something like:
- At the top, just set SingleTypeRegSet incompatibleGprMask = compiler->canUseApxEncoding() ? lowGPRRegs() : RBM_NONE;
- Places where you are passing forceLowGpr = true can instead just use incompatibleGprMask.
- Places where you are not forcing lowGpr can just use DoesThisUseGPR(tree) ? incompatibleGprMask : RBM_NONE
Also, it might be worth caching the value of lowGPRRegs(), because currently it is evaluated every time as (availableIntRegs & RBM_LOWINT.GetIntRegSet()), and I see lowGprRegs() is used in a lot of places.
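Here is a sketch of the suggested precomputation over plain 64-bit masks. It assumes 0 means "no constraint", as RBM_NONE does for candidates in this discussion. The struct and member names are hypothetical. In the JIT, the two constructor arguments would come from compiler->canUseApxEncoding() and lowGPRRegs().

```cpp
#include <cassert>
#include <cstdint>

using RegMask = std::uint64_t;
constexpr RegMask NO_CONSTRAINT = 0;         // plays the role of RBM_NONE
constexpr RegMask LOW_GPRS      = 0xFFFFull; // hypothetical: R0..R15

// Computed once per method, then reused for every node.
struct GprMaskCache
{
    RegMask incompatibleGprMask;

    GprMaskCache(bool canUseApx, RegMask lowGprRegs)
        : incompatibleGprMask(canUseApx ? lowGprRegs : NO_CONSTRAINT)
    {
    }

    // Nodes that must always avoid eGPRs (the forceLowGpr = true call sites).
    RegMask forcedLowGpr() const
    {
        return incompatibleGprMask;
    }

    // Nodes that avoid eGPRs only if they actually use a GPR.
    RegMask forNode(bool nodeUsesGpr) const
    {
        return nodeUsesGpr ? incompatibleGprMask : NO_CONSTRAINT;
    }
};
```

The per-node cost drops to a single conditional select. The APX check and the low-GPR mask are evaluated only once.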
It seems from your latest change that there are still asmdiffs coming up. I think there are places in vs. what CI is showing; what does it show for you locally?
That was happening because my environment was not set up correctly. I can now see the same TP diffs that are shown in CI.
This reduced it by around 0.8%. For comparison, without that change: https://github.com/dotnet/runtime/pull/111004/checks?check_run_id=34998081643
Just a note that the TP regression we see here will impact not only non-APX machines but also AMD machines, which do not have the APX feature. While working on this, we should also consider how to reduce the impact on AMD, or avoid it entirely.
Force-pushed from d6826bb to 1a4014a.
@kunalspathak @tannergooding I have made the required changes. Can you guys please review this now?
Force-pushed from 1a4014a to 673141a.
src/coreclr/jit/codegencommon.cpp
Outdated
@@ -5725,7 +5732,11 @@ void CodeGen::genFnProlog()

if (initRegs)
{
#ifdef TARGET_AMD64
    for (regNumber reg = REG_INT_FIRST; reg <= REG_INT_LAST_APX_AWARE; reg = REG_NEXT(reg))
Can we also have get_REG_LAST_INT on x86 and not have this #ifdef-else? For x86, it will just be set to the last int register.
This is a common code path for all targets, including arm. I can expose get_REG_INT_LAST() for all targets and just return REG_INT_LAST for everything other than AMD64.
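Here is a sketch of that shape. Every target gets the same accessor, so the prolog loop needs no #ifdef. It uses a hypothetical SKETCH_TARGET_AMD64 macro in place of the real TARGET_AMD64, plus a simplified register enum. Only the name get_REG_INT_LAST() comes from this thread; the rest is illustrative.

```cpp
#include <cassert>

// Hypothetical stand-in for the real target macro.
#ifndef SKETCH_TARGET_AMD64
#define SKETCH_TARGET_AMD64 1
#endif

// Simplified register numbering: 16 legacy GPRs, then eGPRs R16..R23.
enum regNumber : int
{
    REG_R0  = 0,
    REG_R15 = 15,
    REG_R23 = 23,
};

constexpr regNumber REG_INT_FIRST = REG_R0;
constexpr regNumber REG_INT_LAST  = REG_R15;

struct CompilerSketch
{
    bool apxAvailable;

    // Same accessor on every target; only AMD64 consults APX availability.
    regNumber get_REG_INT_LAST() const
    {
#if SKETCH_TARGET_AMD64
        return apxAvailable ? REG_R23 : REG_INT_LAST;
#else
        return REG_INT_LAST;
#endif
    }
};

// Example call site shaped like the prolog loop: counts the int registers visited.
int countIntRegs(const CompilerSketch& comp)
{
    int count = 0;
    for (int reg = REG_INT_FIRST; reg <= comp.get_REG_INT_LAST(); reg++)
    {
        count++;
    }
    return count;
}
```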
Would you like me to make the proposed change?
Have changed this.
src/coreclr/jit/codegenxarch.cpp
Outdated
#if defined(TARGET_AMD64)
// TODO-Xarch-apx : Revert. Excluding eGPR so that it's not used for non REX2 supported movs. Revisit this one.
// Might not be necessary.
regNumber tmpReg = internalRegisters.GetSingle(tree, RBM_ALLINT_INIT);
Why can't we just use RBM_ALL_INT here? It will expand to get_RBM_ALLINT(), which should give the right set of registers.
Since x86 doesn't have one, I don't think the TP impact will be that much, and it will make the code a little cleaner.
I need to test this out. The intent was not to use eGPR here, but I might be able to work around it.
Removed this. It is not needed
//
void regMaskTP::AddGprRegs(SingleTypeRegSet gprRegs)
void regMaskTP::AddGprRegs(SingleTypeRegSet gprRegs, regMaskTP availableIntRegs)
All the callers are passing RBM_ALLINT to this method, so perhaps we do not need it and can just use the existing code.
The reason is that RBM_ALLINT uses get_RBM_ALLINT(), and that's not available at compile time or from regMaskTP.
Putting this here because for some reason I cannot respond to the above comment.
rbmAllInt can be accessed by methods of classes that have an rbmAllInt member. There is not really a way for global methods to access it. That's why I'm using RBM_ALLINT_STATIC_ALL.
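The constraint can be shown in miniature. A macro that expands to a member call compiles only inside a class that has that member. A free function therefore has to fall back to a compile-time superset mask. CompilerLike, RBM_ALLINT_SKETCH, and RBM_ALLINT_STATIC_ALL_SKETCH are hypothetical simplifications of get_RBM_ALLINT(), RBM_ALLINT, and RBM_ALLINT_STATIC_ALL.

```cpp
#include <cassert>
#include <cstdint>

using RegMask = std::uint64_t;

// Superset of every GPR that could exist (legacy + eGPR), known at compile time.
constexpr RegMask RBM_ALLINT_STATIC_ALL_SKETCH = 0xFFFFFFull; // bits 0-23

// Only compiles where an accessible get_RBM_ALLINT() member is in scope.
#define RBM_ALLINT_SKETCH get_RBM_ALLINT()

class CompilerLike
{
    RegMask rbmAllInt; // decided at runtime from APX availability

public:
    explicit CompilerLike(bool apxSupported)
        : rbmAllInt(apxSupported ? 0xFFFFFFull : 0xFFFFull)
    {
    }

    RegMask get_RBM_ALLINT() const
    {
        return rbmAllInt;
    }

    // Member function: the macro expands to this->get_RBM_ALLINT().
    RegMask intPart(RegMask mask) const
    {
        return mask & RBM_ALLINT_SKETCH;
    }
};

// Free function (like the regMaskTP helpers): there is no object to call
// get_RBM_ALLINT() on, so the static superset is used instead.
RegMask intPartStatic(RegMask mask)
{
    return mask & RBM_ALLINT_STATIC_ALL_SKETCH;
}
```

The static superset is exact enough for asserts and dumping. On a non-APX machine, bits 16-23 are simply never set.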
src/coreclr/jit/emitxarch.cpp
Outdated
regNumber AbsRegNumber(regNumber reg)
{
    assert(reg < REG_STK);
    if ((reg >= XMMBASE) && (reg < KBASE))
We can eliminate one condition by writing it as:
if (reg >= KBASE)
{
return (regNumber)(reg - KBASE);
}
else if (reg >= XMMBASE)
{
return (regNumber)(reg - XMMBASE);
}
else
{
return reg;
}
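Both forms are shown side by side here so the reordering can be checked for identical behavior. Two things are assumptions for this sketch. First, the numeric values of XMMBASE, KBASE, and REG_STK. Second, the shape of the original two-condition function beyond the line shown above. Only the ordering XMMBASE < KBASE < REG_STK matters.

```cpp
#include <cassert>

// Assumed numbering for the sketch: GPRs 0-23, XMM 24-55, K 56-63.
enum regNumber : int
{
    XMMBASE = 24,
    KBASE   = 56,
    REG_STK = 64,
};

// Assumed shape of the original version (range check on XMM first).
regNumber AbsRegNumberOriginal(regNumber reg)
{
    assert(reg < REG_STK);
    if ((reg >= XMMBASE) && (reg < KBASE))
    {
        return (regNumber)(reg - XMMBASE);
    }
    else if (reg >= KBASE)
    {
        return (regNumber)(reg - KBASE);
    }
    return reg;
}

// Suggested version: test the highest base first, so each branch needs one compare.
regNumber AbsRegNumber(regNumber reg)
{
    assert(reg < REG_STK);
    if (reg >= KBASE)
    {
        return (regNumber)(reg - KBASE);
    }
    else if (reg >= XMMBASE)
    {
        return (regNumber)(reg - XMMBASE);
    }
    else
    {
        return reg;
    }
}
```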
Done
@@ -5910,7 +5935,12 @@ void emitter::emitIns_R(instruction ins, emitAttr attr, regNumber reg)
noway_assert(emitVerifyEncodable(ins, size, reg));

UNATIVE_OFFSET sz;
instrDesc* id = emitNewInstrSmall(attr);
instrDesc* id = emitNewInstrSmall(attr);
This change can be reverted, I suppose?
I did not add this. For some reason, jit format fails in CI and the patch it creates has this.
@@ -37,6 +37,7 @@
DOTNET_EnableSSE41;
DOTNET_EnableSSE42;
DOTNET_EnableSSSE3;
DOTNET_EnableAPX;
This is supposed to get used below in one of the pipelines. Is it intentionally not used at this point? If so, maybe just remove it and add it back when we enable the pipeline to test it?
I was not sure whether this was needed. I added it since we provide a flag to turn off other similar features. I can remove it.
//
// Return Value:
//    updated register mask.
inline SingleTypeRegSet LinearScan::BuildApxIncompatibleGPRMask(GenTree* tree,
Did you give any thought to #108799 (comment)?
I missed that originally. Will update.
I had to make a small change here from the original implementation. It now takes in the current candidates and masks away the eGPRs from them when necessary.
Functionally, do you think it makes that much difference now? The method LinearScan::BuildApxIncompatibleGPRMask itself is inlined.
These are the relevant cases I can think of:
- Non-TARGET_AMD64 machines: it directly returns the candidates. This shouldn't have any effect here.
- TARGET_AMD64 machines with APX not supported: it returns the candidates after the first check. I might be able to avoid doing this check every time by caching compiler->canUseApxEncoding().
- TARGET_AMD64 machines with APX supported: here we determine what to do depending on whether buildNode already has some determined candidates at this point (we have some cases where ecx is the only candidate, for example, and I don't want to return all low GPRs):
if (forceLowGpr || DoesThisUseGPR(tree))
{
    if (candidates == RBM_NONE)
    {
        return lowGPRRegs();
    }
    else
    {
        return (candidates & lowGPRRegs());
    }
}
So, are you okay with caching just compiler->canUseApxEncoding()?
Hopefully I haven't missed something else very obvious.
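The three cases can be condensed into one function over plain 64-bit masks. This is a hedged sketch. The boolean parameters stand in for four things: TARGET_AMD64 (a compile-time check in the real code), the cached compiler->canUseApxEncoding(), forceLowGpr, and DoesThisUseGPR(tree). LOW_GPRS stands in for lowGPRRegs(), and the function name is hypothetical.

```cpp
#include <cassert>
#include <cstdint>

using RegMask = std::uint64_t;
constexpr RegMask NO_CONSTRAINT = 0;         // plays the role of RBM_NONE
constexpr RegMask LOW_GPRS      = 0xFFFFull; // hypothetical: R0..R15

// Returns the candidate set with eGPRs masked away when required.
RegMask buildApxIncompatibleGprMask(RegMask candidates,
                                    bool    isAmd64,
                                    bool    canUseApx, // cached once per method
                                    bool    forceLowGpr,
                                    bool    nodeUsesGpr)
{
    // Case 1: not an AMD64 target, nothing to do.
    if (!isAmd64)
    {
        return candidates;
    }
    // Case 2: AMD64 without APX; eGPRs are never handed out anyway.
    if (!canUseApx)
    {
        return candidates;
    }
    // Case 3: APX available; restrict to low GPRs only when needed, while
    // preserving an existing narrower constraint (e.g. "ECX only").
    if (forceLowGpr || nodeUsesGpr)
    {
        return (candidates == NO_CONSTRAINT) ? LOW_GPRS : (candidates & LOW_GPRS);
    }
    return candidates;
}
```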
Might as well cache lowGPRegs() too. Yeah, overall I don't think it will make much difference, given that it is inlined.
Cached compiler->canUseApxEncoding(). I have left lowGPRegs() as is for now.
Re: "lowGPRegs() I have left as is for now". It shouldn't be too hard to also cache it in the LinearScan class, right? It is used many times while building intervals, so it might save a little bit of TP.
I tried it, but unfortunately it breaks some functionality. getLowGprRegs() (I updated the function name to match the other get functions) uses availableIntRegs. There are some cases where availableIntRegs is modified, and this breaks the code. For example:
runtime/src/coreclr/jit/lsra.cpp
Lines 2624 to 2632 in 4e01649
if ((removeMask != RBM_NONE) && ((availableIntRegs & removeMask) != 0))
{
    // We know that we're already in "read mode" for availableIntRegs. However,
    // we need to remove these registers, so subsequent users (like callers
    // to allRegs()) get the right thing. The RemoveRegistersFromMasks() code
    // fixes up everything that already took a dependency on the value that was
    // previously read, so this completes the picture.
    availableIntRegs.OverrideAssign(availableIntRegs & ~removeMask);
}
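Here is the hazard in miniature. A value derived from availableIntRegs goes stale if it is cached before registers are removed. Caching after the removal, right before interval building, stays consistent. AllocatorSketch and its members are hypothetical, and LOW_INT stands in for RBM_LOWINT.

```cpp
#include <cassert>
#include <cstdint>

using RegMask = std::uint64_t;
constexpr RegMask LOW_INT = 0xFFFFull; // hypothetical stand-in for RBM_LOWINT

struct AllocatorSketch
{
    RegMask availableIntRegs = 0xFFFFFFull; // R0..R23
    RegMask cachedLowGpr     = 0;

    // Always reflects the current availableIntRegs.
    RegMask getLowGprRegs() const
    {
        return availableIntRegs & LOW_INT;
    }

    void cacheLowGpr()
    {
        cachedLowGpr = getLowGprRegs();
    }

    // Mirrors the removeMask step above: registers are taken away after setup.
    void removeRegisters(RegMask removeMask)
    {
        availableIntRegs &= ~removeMask;
    }
};
```

The cache is only safe if it is filled after the last point where availableIntRegs can change.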
Hmm, I see that getLowGprRegs() is mainly used while building intervals, and the code above is executed right before we start building.
src/coreclr/jit/lsraxarch.cpp
Outdated
@@ -2960,7 +3114,17 @@ int LinearScan::BuildIndir(GenTreeIndir* indirTree)
else
#endif
{
    srcCount += BuildOperandUses(source);
    GenTree* data = indirTree->Data();
    if (data->isContained() && (data->OperIs(GT_BSWAP, GT_BSWAP16) /* || data->OperIsHWIntrinsic()*/) &&
Remove the comment?
Removed
src/coreclr/jit/jitconfigvalues.h
Outdated
@@ -425,6 +425,7 @@ RELEASE_CONFIG_INTEGER(EnableSSE3_4, "EnableSSE3_4",
RELEASE_CONFIG_INTEGER(EnableSSE41, "EnableSSE41", 1) // Allows SSE4.1+ hardware intrinsics to be disabled
RELEASE_CONFIG_INTEGER(EnableSSE42, "EnableSSE42", 1) // Allows SSE4.2+ hardware intrinsics to be disabled
RELEASE_CONFIG_INTEGER(EnableSSSE3, "EnableSSSE3", 1) // Allows SSSE3+ hardware intrinsics to be disabled
RELEASE_CONFIG_INTEGER(EnableAPX, "EnableAPX", 1) // Allows APX+ features to be disabled
Does APX automatically light up on an APX-capable machine?
I think we need to configure APX as disabled by default, and require a user to "opt-in" to APX support by enabling a configuration parameter. At least until we are able to test thoroughly on actual hardware. In particular, I don't expect we will "get there" (i.e., have hardware and do enough testing on it) to enable APX by default for .NET 10.
As I understand it, this flag is mostly for testing purposes, for 'turning off' features. Since we do not currently have any testing pipelines, I can go ahead and remove it as per Kunal's comment here: #108799 (comment)
I think it's fine to have these configs. But I wonder if the current default should be 0, not 1?
That's a good question. This config by itself doesn't turn APX on in the 'normal workflow'. It has 2 purposes:
Use 1: It's a knob to turn APX on when using altjit unless otherwise specified (link). Our current design is to turn all available features ON with altjit.
Use 2: A knob to turn APX OFF even on machines supporting APX (link).
So having the default be 1 doesn't affect the functionality on non-APX machines in any way. Having it be 0 would mean manually setting it to 1 to run altjit tests. But considering this is a feature in development and altjit is used for testing, it makes sense to have the default as 0 if we keep it.
I think it'd be good to get @jkotas weigh in here.
APX is a lot more involved than AVX10.1 was and is a lot of "net new" handling, so the risk is higher. However, I imagine Intel (@DeepakRajendrakumaran to confirm) will be doing local testing (full test suite, stress modes, some important libraries, etc) on actual hardware as was done for other ISAs in the past, which will help build confidence.
We can notably always change the value here in a patch as well, whether from 0->1 or 1->0, and we'd fix any bugs as we normally do otherwise. So it really just comes down to the default experience we want for devs who buy APX-capable CPUs on launch day.
We obviously do not have actual hardware yet, but we will be doing extensive testing once we have it.
For now, I have done the following with APX ON:
- Ran all tests under the JIT subtree with src\tests\build using sde with APX ON.
- Ran superpmi asmdiffs with APX ON to make sure there are no decode failures or asserts, and to look at perf scores (this is added here).
Just to make sure I understand: will we have some CI jobs that test the APX code paths whenever they are touched? Something via the altjit route?
Re: "I think it'd be good to get @jkotas weigh in here." I agree with what @BruceForstall said above.
For altjit, I do not have a strong opinion. I think it would be more intuitive to have it disabled by default so that altjit behavior matches the typical configuration.
I'm good with that. Will make the change
@@ -43,7 +43,7 @@ inline static bool isHighGPReg(regNumber reg)
#ifdef TARGET_AMD64
    // TODO-apx: the definition here is incorrect, we will need to revisit this after we extend the register definition.
    // for now, we can simply use REX2 as REX.
    return ((reg >= REG_R8) && (reg <= REG_R15));
    return ((reg >= REG_R16) && (reg <= REG_R23));
I haven't looked at all the users, but I'm surprised this wouldn't cause diffs. "High GPR" already has a meaning for x64 versus x86. Maybe there should be a separate isApxEgpr()?
This was a method added by Ruihan specifically for APX/Rex2/eGPR (3410c76#diff-782ed843d790c8cf94cba03c0d408a37c64cdba7832dbb5a560f76979355bdd2R41-R51). He initially just used REG_R8 and REG_R15, since we didn't have REG_R16 and above. This code would have been inactive until now.
This comes into play only if/when the register allocator selects an eGPR and the emitter has to encode REX2 for it.
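REX2 is needed because an eGPR (register number 16-31) requires a fifth register-number bit. REX2 carries that bit alongside the REX-style fourth bit. The sketch below builds the REX2 payload byte for a register in the ModRM.rm/base slot. It follows our reading of the Intel APX specification: prefix byte 0xD5, then payload bits M0 R4 X4 B4 W R3 X3 B3 from bit 7 down to bit 0. The function names are hypothetical, and this is not the emitter's code.

```cpp
#include <cassert>
#include <cstdint>

constexpr std::uint8_t REX2_PREFIX = 0xD5; // first byte of a REX2 prefix

// True for the APX extended GPRs R16-R31 (this PR enables only R16-R23).
constexpr bool isApxEgpr(unsigned regNum)
{
    return regNum >= 16 && regNum <= 31;
}

// Payload byte for an operand encoded in ModRM.rm / SIB.base.
// Bit 0 (B3) carries register bit 3, bit 4 (B4) carries register bit 4,
// and bit 3 is W (64-bit operand size). R/X bits and M0 are left at 0.
constexpr std::uint8_t rex2PayloadForRm(unsigned regNum, bool w)
{
    return static_cast<std::uint8_t>(((regNum >> 3) & 1u)            // B3
                                     | (w ? 0x08u : 0u)              // W
                                     | (((regNum >> 4) & 1u) << 4)); // B4
}
```

Registers 8-15 only need the fourth bit, which plain REX already provides. That is why the old R8-R15 range check was effectively "use REX2 as REX".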
src/coreclr/jit/compiler.h
Outdated
@@ -10002,6 +10002,11 @@ class Compiler
//
bool canUseApxEncoding() const
{
    if (JitConfig.EnableAPX() == 0)
This is not the correct way to implement a config switch for the non-altjit case.
It should be done here instead, similar to how other existing instruction sets are handled:
runtime/src/coreclr/vm/codeman.cpp
Line 1441 in 6c73c19
CPUCompileFlags.Set(InstructionSet_APX);
Thank you. I have made the change
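Here is a sketch of the requested pattern. Hardware support and the EnableAPX setting are combined once, where the CPU compile flags are built, so the JIT only queries the instruction set. computeCompileFlags, jitCanUseApx, and the ISA_* bits are hypothetical. The real code would use CPUCompileFlags and InstructionSet_APX in codeman.cpp, as referenced above.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical feature bits for the sketch.
enum : std::uint32_t
{
    ISA_AVX2 = 1u << 0,
    ISA_APX  = 1u << 1,
};

// VM side: hardware support and the user's setting are combined in one place.
std::uint32_t computeCompileFlags(bool hwHasAvx2, bool hwHasApx, std::uint32_t enableApxConfig)
{
    std::uint32_t flags = 0;
    if (hwHasAvx2)
    {
        flags |= ISA_AVX2;
    }
    // APX is reported only if the CPU supports it AND the knob allows it.
    if (hwHasApx && (enableApxConfig != 0))
    {
        flags |= ISA_APX;
    }
    return flags;
}

// JIT side: no config lookup, just the instruction-set query.
bool jitCanUseApx(std::uint32_t compileFlags)
{
    return (compileFlags & ISA_APX) != 0;
}
```

With the knob defaulting to 0, an APX-capable machine reports APX only after an explicit opt-in.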
Force-pushed from 673141a to 2ee0617.
src/coreclr/inc/clrconfigvalues.h
Outdated
@@ -732,6 +732,7 @@ RETAIL_CONFIG_DWORD_INFO(EXTERNAL_EnableSSE41, W("EnableSSE41")
RETAIL_CONFIG_DWORD_INFO(EXTERNAL_EnableSSE42, W("EnableSSE42"), 1, "Allows SSE4.2+ hardware intrinsics to be disabled")
RETAIL_CONFIG_DWORD_INFO(EXTERNAL_EnableSSSE3, W("EnableSSSE3"), 1, "Allows SSSE3+ hardware intrinsics to be disabled")
RETAIL_CONFIG_DWORD_INFO(EXTERNAL_EnableX86Serialize, W("EnableX86Serialize"), 1, "Allows X86Serialize+ hardware intrinsics to be disabled")
RETAIL_CONFIG_DWORD_INFO(EXTERNAL_EnableAPX, W("EnableAPX"), 1, "Allows APX+ features to be disabled")
RETAIL_CONFIG_DWORD_INFO(EXTERNAL_EnableAPX, W("EnableAPX"), 1, "Allows APX+ features to be disabled")
RETAIL_CONFIG_DWORD_INFO(EXTERNAL_EnableAPX, W("EnableAPX"), 0, "Allows APX+ features to be disabled")
The APX+ features should be disabled by default in the shipping runtime.
Missed this originally. Has been changed to 0
Force-pushed from 2ee0617 to 58e9112.
There are still some leftover diffs in arm64, and I really expect them to be zero. I am guessing they are happening because of using
Force-pushed from 99438b6 to dc3a1e8.
Looks like there are no asm diffs now. There is still some small x64 TP regression, but I assume that is inevitable and expected. FYI @DeepakRajendrakumaran, there are merge conflicts.
What this PR does
- Currently we are adding just 8 new registers so that the total register count does not exceed 64. This is based on the conversation on this PR and the following conclusion: link
- An LSRA_LIMIT_EXT_GPR_SET register stress mode to force eGPR register usage when possible.
- Some minor changes to turn on Rex2 encoding with eGPR.
- Temporary changes to mask away eGPR for currently unsupported instructions, primarily ones requiring eEVEX, plus imul and movszx. (These will be removed once we have support for them, but are essential while we do not have eEVEX support.)
- Minor flags to get altjit to work.
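The 64-register limit is simple arithmetic over a single 64-bit mask. Assume the x64 register file with AVX-512: 16 legacy GPRs, 32 SIMD registers, and 8 opmask registers. That leaves room for exactly 8 eGPRs, while all 16 APX eGPRs would need 72 bits. The constant names are illustrative, not the JIT's.

```cpp
#include <cassert>

// Register counts for x64 with AVX-512 (illustrative constants).
constexpr int LEGACY_GPRS = 16; // RAX..R15
constexpr int SIMD_REGS   = 32; // XMM0..XMM31
constexpr int MASK_REGS   = 8;  // K0..K7
constexpr int MASK_BITS   = 64; // width of a single 64-bit register mask

constexpr int totalRegs(int egprs)
{
    return LEGACY_GPRS + egprs + SIMD_REGS + MASK_REGS;
}

// 8 eGPRs (R16..R23) fill the mask exactly; all 16 (R16..R31) would not fit.
static_assert(totalRegs(8) == MASK_BITS, "8 eGPRs fit in a 64-bit mask");
static_assert(totalRegs(16) > MASK_BITS, "16 eGPRs would overflow it");

// Largest eGPR count that still fits in the mask.
constexpr int maxEgprsThatFit()
{
    return MASK_BITS - (LEGACY_GPRS + SIMD_REGS + MASK_REGS);
}
```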
Testing
With APX disabled, for TP/asmdiff: link
With APX enabled, ASMDIFF:
Code size increases due to Rex2, but PerfScore improves. Note: this is with just a subset of x64 instructions having access to eGPR (those requiring eEVEX will be given access to eGPR as part of upcoming changes) and with just 8 eGPRs enabled.