[NativeAOT] Enable software writewatch and card bundles on Windows. #77934
/azp run runtime-extra-platforms
Azure Pipelines successfully started running 1 pipeline(s).
@VSadov to make GitHub close the issue you have to separate it with a space like
@teo-tsirpanis Thanks! I was wondering why it works sometimes
Lots of failures in Frozen dictionaries tests. Are they supposed to fail on NativeAOT? @stephentoub
the failures are #78046
@@ -307,7 +306,7 @@ LOCAL_LABEL(RhpByRefAssignRef_CheckCardTable):
     shr     rcx, 0x0B
     mov     r10, [C_VAR(g_card_table)]
     cmp     byte ptr [rcx + r10], 0x0FF
-    je      LOCAL_LABEL(RhpByRefAssignRef_NotInHeap)
+    je      LOCAL_LABEL(RhpByRefAssignRef_NoBarrierRequired)
CoreCLR versions of these write barriers use a pattern like this:
test byte ptr [rcx + r10], al
je SetCardTableBit_ByRefWriteBarrier
REPRET
SetCardTableBit_ByRefWriteBarrier:
This makes the hot path of the write barrier fit into fewer cache lines, and the jump is better predicted statically (forward conditional branches are predicted not taken by default).
(It does not look easy to retrofit this pattern into the native AOT write barriers. You can ignore this feedback.)
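As a rough illustration of what the card-table check in the hunk above does, here is a minimal C sketch. The shift of 11 and the 0xFF marker come from the quoted assembly; the table size, the use of a byte offset instead of a real address, and the names `card_table` and `mark_card` are assumptions made only for this example.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define CARD_SHIFT 11     /* matches `shr rcx, 0x0B` in the quoted hunk */
#define CARD_DIRTY 0xFFu  /* matches the `0x0FF` compare */

/* Hypothetical stand-in for g_card_table: one byte per 2 KiB of a toy heap. */
static uint8_t card_table[64];

/* Marks the card covering `offset` (a byte offset into the toy heap).
   Returns true if a store was performed, false if the card was already dirty. */
static bool mark_card(uintptr_t offset)
{
    size_t card = offset >> CARD_SHIFT;
    if (card_table[card] == CARD_DIRTY)
        return false;              /* common case: compare only, no store */
    card_table[card] = CARD_DIRTY; /* rare case: dirty the card */
    return true;
}
```

Checking before writing keeps the common case read-only, which is presumably why both barrier variants discussed here compare first and only then store.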
// other registers are preserved
//
.macro UPDATE_GC_SHADOW destReg, refReg

// If g_GCShadow is 0, don't perform the check.
PREPARE_EXTERNAL_VAR_INDIRECT g_GCShadow, X9
cbz x9, 1f
PREPARE_EXTERNAL_VAR_INDIRECT g_GCShadow, X12
CoreCLR write barriers seem to use a more efficient pattern to access the statics on arm64. Do you happen to know how much it saves?
Are you referring to the version that patches constants in the code?
I think accessing inline constants is a bit more compact, but on arm64 it is not as nice as on x64, since you can't just load an immediate value, and the CPU is likely trained to recognize the indirect load pattern.
Otherwise, these values change extremely rarely, so they would be in every cache. In terms of cache misses the cost is similar; the inline version just skips an indirection.
We could implement the scheme with patching inline constants, and it would likely be an improvement, but probably too small to be urgent.
LGTM. Thank you!
/azp run runtime-extra-platforms
Azure Pipelines successfully started running 1 pipeline(s).
The only NativeAOT failure is #76501 (which is known and fixed already)
Thanks!!
Fixes: #75110
This is mostly adding support for writewatch and card bundles in the Windows GC barriers.
Also did some cleanups to
(tailcall chain of Ref->Checked->Unchecked, not doing stlr when the write is not to the heap, etc.)
(RyuJIT allows some freedom in scratch register use, but it is better to be similar here)