Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[NativeAOT] Enable software writewatch and card bundles on Windows. #77934

Merged
merged 4 commits into from
Nov 10, 2022

Conversation

VSadov
Copy link
Member

@VSadov VSadov commented Nov 5, 2022

Fixes: #75110

This is mostly adding support for writewatch and card bundles in the Windows GC barriers.

Also did some cleanups to

  • make Unix and Windows write barriers in NativeAOT as similar as possible.
  • change ARM64 barriers to a more compact pattern, as used in CoreClr barriers.
    (tailcall chain of Ref->Checked->Unchecked, not doing stlr when write is not to the heap, etc..)
  • trash the same registers in ARM64 barriers as in CoreClr
    (RiuJIT allows some freedom in scratch register use, but it is better to be similar here)

@VSadov
Copy link
Member Author

VSadov commented Nov 5, 2022

/azp run runtime-extra-platforms

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@teo-tsirpanis
Copy link
Contributor

@VSadov to make GitHub close the issue you have to separate it with a space like Fixes #75110.

@VSadov
Copy link
Member Author

VSadov commented Nov 5, 2022

@teo-tsirpanis Thanks! I was wondering why it works sometimes

@VSadov
Copy link
Member Author

VSadov commented Nov 5, 2022

Lots of failures in Frozen dictionaries tests. Are they supposed to fail on NativeAOT? @stephentoub

@VSadov
Copy link
Member Author

VSadov commented Nov 8, 2022

the failures are #78046

@VSadov VSadov marked this pull request as ready for review November 8, 2022 18:54
@VSadov VSadov requested a review from jkotas November 8, 2022 18:54
@@ -307,7 +306,7 @@ LOCAL_LABEL(RhpByRefAssignRef_CheckCardTable):
shr rcx, 0x0B
mov r10, [C_VAR(g_card_table)]
cmp byte ptr [rcx + r10], 0x0FF
je LOCAL_LABEL(RhpByRefAssignRef_NotInHeap)
je LOCAL_LABEL(RhpByRefAssignRef_NoBarrierRequired)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CoreCLR versions of these write-barriers use pattern like this:

        test    byte ptr [rcx + r10], al
        je      SetCardTableBit_ByRefWriteBarrier
        REPRET

    SetCardTableBit_ByRefWriteBarrier:

It makes the hot path of the write barrier fit into fewer cache lines and it makes the jump better statically predicted (forward conditional branches are predicted not taken by default).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(It does not look easy to retrofit this pattern into the native AOT write barriers. You can ignore this feedback.)

// other registers are preserved
//
.macro UPDATE_GC_SHADOW destReg, refReg

// If g_GCShadow is 0, don't perform the check.
PREPARE_EXTERNAL_VAR_INDIRECT g_GCShadow, X9
cbz x9, 1f
PREPARE_EXTERNAL_VAR_INDIRECT g_GCShadow, X12
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CoreCLR write barriers seem to be using more efficient pattern to access the statics on arm64. Do you happen to know how much does it save?

Copy link
Member Author

@VSadov VSadov Nov 9, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you referring to the version that patches constants in the code?
I think accessing inline constants is a bit more compact, but on arm64 it is not as nice as on x64 as you can't just load an immediate value, and CPU is likely trained to recognize the indirect load pattern.
Otherwise, these are values that change extremely rarely, so they would be in every cache. In terms of cache misses it is similar cost, but skips an indirection.

We could implement the scheme with patching inline constants and it is likely to be an improvement, but likely too small to consider it an urgency.

Copy link
Member

@jkotas jkotas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thank you!

@VSadov
Copy link
Member Author

VSadov commented Nov 10, 2022

/azp run runtime-extra-platforms

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@VSadov
Copy link
Member Author

VSadov commented Nov 10, 2022

The only NativeAOT failure is #76501 (which is known and fixed already)

@VSadov
Copy link
Member Author

VSadov commented Nov 10, 2022

Thanks!!

@VSadov VSadov merged commit cd7e871 into dotnet:main Nov 10, 2022
@VSadov VSadov deleted the winsoft branch November 10, 2022 17:41
@ghost ghost locked as resolved and limited conversation to collaborators Dec 10, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[NativeAOT] Enable card bundles and software writewatch on Windows
3 participants