Inlined GC Polls for call to methods with SuppressGCTransitionAttribute #39111
For this benchmark:
the results are
Assembly diff for the benchmark:

```diff
 G_M34627_IG01:
        55                   push     rbp
        4883EC20             sub      rsp, 32
        488D6C2420           lea      rbp, [rsp+20H]
 G_M34627_IG02:
-       E801672A5F           call     CORINFO_HELP_POLL_GC
+       48B800C6641AF87F0000 mov      rax, 0x7FF81A64C600
+       833800               cmp      dword ptr [rax], 0
+       750C                 jne      SHORT G_M34627_IG05
 G_M34627_IG03:
        E8BE7296FF           call     GCPolls.Program:GetTickCount():int
        90                   nop
 G_M34627_IG04:
        488D6500             lea      rsp, [rbp]
        5D                   pop      rbp
        C3                   ret
+G_M34627_IG05:
+       E806682A5F           call     CORINFO_HELP_POLL_GC
+       EBED                 jmp      SHORT G_M34627_IG03
-; Total bytes of code 27, prolog size 10, PerfScore 8.70, (MethodHash=118078bc) for method GCPolls.Program:GetTickCountManaged():int:this
+; Total bytes of code 44, prolog size 10, PerfScore 12.65, (MethodHash=118078bc) for method GCPolls.Program:GetTickCountManaged():int:this
```
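The added `+` instructions replace an unconditional helper call with a fast-path check: load the address of a runtime flag, compare it with zero, and only branch to an out-of-line block that calls `CORINFO_HELP_POLL_GC` when the flag is set. The following is a minimal C++ model of that control flow, not the JIT's generated code; `g_trapFlag`, `PollGCHelper`, and `NativeGetTickCount` are hypothetical stand-ins for the runtime flag at the embedded address, the poll helper, and the native target.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical stand-in for the runtime flag whose address the JIT embeds
// as a constant (0x7FF81A64C600 in the diff). Non-zero means a GC is pending.
std::atomic<int> g_trapFlag{0};

// Counts slow-path executions; stands in for CORINFO_HELP_POLL_GC.
int g_pollHelperCalls = 0;
void PollGCHelper() { ++g_pollHelperCalls; }

// Hypothetical stand-in for the native GetTickCount target.
int NativeGetTickCount() { return 1234; }

int GetTickCountManaged() {
    // Fast path (IG02): mov rax, <flag address>; cmp dword ptr [rax], 0; jne IG05
    if (g_trapFlag.load(std::memory_order_relaxed) != 0) {
        // Slow path (IG05): call the poll helper, then jump back to IG03.
        PollGCHelper();
    }
    // IG03: the call to the [SuppressGCTransition] method itself.
    return NativeGetTickCount();
}
```

The trade-off visible in the diff is code size (27 to 44 bytes) in exchange for keeping the helper call off the path taken when no GC is pending.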
I prototyped switching `Math.Pow` to a QCall with `[SuppressGCTransition]` and verified that for this example we insert one inlined GC poll and the GC pause stays under 1 ms.
x64 PMI framework diffs:
@briansull @AndyAyersMS PTAL
Just tried your branch locally:

```csharp
using System.Runtime.InteropServices;

namespace ConsoleApp212
{
    class Program
    {
        [DllImport("kernel32.dll")]
        [SuppressGCTransition]
        public static extern int GetCurrentProcessId();

        [DllImport("kernel32.dll")]
        [SuppressGCTransition]
        public static extern int GetCurrentThreadId();

        static int Foo(bool c1, bool c2)
        {
            if (c1)
                return GetCurrentProcessId();
            if (c2)
                return GetCurrentThreadId();
            return 42;
        }
    }
}
```

Asm:

```asm
; Method ConsoleApp212.Program:Foo(bool,bool):int
G_M46591_IG01:
       55                   push     rbp
       4883EC20             sub      rsp, 32
       488D6C2420           lea      rbp, [rsp+20H]
       ;; bbWeight=1    PerfScore 1.75
G_M46591_IG02:
       84C9                 test     cl, cl
       741B                 je       SHORT G_M46591_IG06
       ;; bbWeight=1    PerfScore 1.25
G_M46591_IG03:
       48B810760AB2FD7F0000 mov      rax, 0x7FFDB20A7610
       833800               cmp      dword ptr [rax], 0
       7536                 jne      SHORT G_M46591_IG11
       ;; bbWeight=0.50 PerfScore 1.63
G_M46591_IG04:
       E8BAEBFFFF           call     ConsoleApp212.Program:GetCurrentProcessId():int
       90                   nop
       ;; bbWeight=0.50 PerfScore 0.63
G_M46591_IG05:
       488D6500             lea      rsp, [rbp]
       5D                   pop      rbp
       C3                   ret
       ;; bbWeight=0.50 PerfScore 1.00
G_M46591_IG06:
       84D2                 test     dl, dl
       741B                 je       SHORT G_M46591_IG09
       48B810760AB2FD7F0000 mov      rax, 0x7FFDB20A7610
       833800               cmp      dword ptr [rax], 0
       751E                 jne      SHORT G_M46591_IG12
       ;; bbWeight=0.50 PerfScore 2.25
G_M46591_IG07:
       E8A7EBFFFF           call     ConsoleApp212.Program:GetCurrentThreadId():int
       90                   nop
       ;; bbWeight=0.50 PerfScore 0.63
G_M46591_IG08:
       488D6500             lea      rsp, [rbp]
       5D                   pop      rbp
       C3                   ret
       ;; bbWeight=0.50 PerfScore 1.00
G_M46591_IG09:
       B82A000000           mov      eax, 42
       ;; bbWeight=0.50 PerfScore 0.13
G_M46591_IG10:
       488D6500             lea      rsp, [rbp]
       5D                   pop      rbp
       C3                   ret
       ;; bbWeight=0.50 PerfScore 1.00
G_M46591_IG11:
       E838ED925F           call     CORINFO_HELP_POLL_GC
       EBC3                 jmp      SHORT G_M46591_IG04
       ;; bbWeight=0    PerfScore 0.00
G_M46591_IG12:
       E831ED925F           call     CORINFO_HELP_POLL_GC
       EBDB                 jmp      SHORT G_M46591_IG07
       ;; bbWeight=0    PerfScore 0.00
; Total bytes of code: 97
```

Is it possible to re-use
I don't think it's possible. The basic blocks have different `jmp` targets.
Oops, didn't notice that, thanks.
A minor optimization 🙂

```csharp
static int Foo()
{
    if (GetCurrentProcessId() == -1) // inserts a GC poll
        return GetCurrentThreadId(); // also inserts a GC poll, but its predecessor already has one
    return -1;
}
```

(If you can rely on the `BBF_HAS_SUPPRESSGC_CALL` flag.)
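The suggestion amounts to a small CFG check: a block that calls a `[SuppressGCTransition]` method could skip its own poll when every path into it has just polled. Below is a minimal sketch of that idea, not necessarily what the PR implements. It assumes a toy CFG in which each block records whether it contains such a call (mirroring the idea of `BBF_HAS_SUPPRESSGC_CALL`) and, to stay conservative, only skips a poll when the block has exactly one predecessor that comes earlier in layout order and itself polls. `Block` and `PlacePolls` are hypothetical names, not JIT APIs.

```cpp
#include <cassert>
#include <vector>

// Toy basic block for illustration only.
struct Block {
    bool hasSuppressGCCall = false;  // contains a call to a [SuppressGCTransition] method
    std::vector<int> preds;          // indices of predecessor blocks
    bool hasPoll = false;            // output: a GC poll was placed in this block
};

// Places at most one poll per block that needs one. A block is skipped only when
// its single predecessor is an earlier block that already polls: every path into
// the block then passes that poll first, including every iteration of any loop.
void PlacePolls(std::vector<Block>& blocks) {
    for (int i = 0; i < static_cast<int>(blocks.size()); ++i) {
        Block& b = blocks[i];
        if (!b.hasSuppressGCCall) {
            continue;
        }
        bool predPolled = b.preds.size() == 1 && b.preds[0] < i && blocks[b.preds[0]].hasPoll;
        b.hasPoll = !predPolled;
    }
}
```

For the example above, block 0 holds the `GetCurrentProcessId()` test, block 1 the `GetCurrentThreadId()` return, and block 2 the `return -1`; only block 0 receives a poll. A self-loop still gets a poll because its only predecessor is itself.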
Looks good overall. Left a few suggestions for you to consider.
@erozenfeld I added some tests for the JIT here when I originally added
@AaronRobinsonMSFT Do you have any particular suggestions for additional tests?
@erozenfeld Not specifically. The tests I wrote were added because the behavior of properties in loops and in predicates was causing issues with my implementation. Judging from your changes, it seems that inserted polls are cleaner and less affected by specific code patterns. I defer to you and the JIT team to decide whether any additional tests are warranted.
* Emit inlined GC polls for methods with `SuppressGCTransitionAttribute` when possible and when optimizing.
* Emit only one GC poll per basic block.
* Move insertion of GC polls to a new phase, `fgInsertGCPolls`, that runs after most optimizations so that we don't insert unnecessary GC polls.
* In a subsequent PR, I plan to delete the `fgCreateGCPolls` phase, which was previously used to insert GC polls for platforms that don't support hijacking. We currently don't support such platforms.
* Fix `fgCreateGCPoll` to be able to insert inlined GC polls for `BBJ_NONE` and `BBJ_THROW` basic blocks.
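The first three points can be pictured as a single late pass over the blocks. The following is a minimal C++ model under stated assumptions: `ToyBlock`, `PollType`, and `InsertGCPolls` are hypothetical names, each block only records how many `[SuppressGCTransition]` calls survived earlier optimizations, and the inline-versus-call choice is reduced to an `optimizing` flag. It is not the JIT's `fgInsertGCPolls`.

```cpp
#include <cassert>
#include <vector>

enum class PollType { None, Call, Inline };

// Toy block: how many calls to [SuppressGCTransition] methods remain after optimization.
struct ToyBlock {
    int suppressGCCalls = 0;
    PollType poll = PollType::None;
};

// Runs after the optimizations, so calls that were removed never get a poll,
// and emits at most one poll per block regardless of how many such calls it has.
int InsertGCPolls(std::vector<ToyBlock>& blocks, bool optimizing) {
    int pollsInserted = 0;
    for (ToyBlock& b : blocks) {
        if (b.suppressGCCalls == 0) {
            continue;
        }
        // Inline the flag check only when optimizing; otherwise call the helper.
        b.poll = optimizing ? PollType::Inline : PollType::Call;
        ++pollsInserted;
    }
    return pollsInserted;
}
```

Running the pass late is what makes "one poll per block" cheap to enforce: the pass sees the final set of calls in each block rather than polls that earlier phases would have to clean up.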
```cpp
#endif // DEBUG

    // we don't want to split the single return block
    pollType = GCPOLL_CALL;
```
Curious about this case. We will not inline the poll here. Is this common?
We can't ever have a call to a method with `SuppressGCTransitionAttribute` in `genReturnBB`, so we shouldn't see this case. I copied these lines from `fgCreateGCPolls`, which was used for the GC polls for platforms that don't support hijacking. I'd like to leave these lines in just in case we reuse this method for inserting polls for something else in the future.
Makes sense. Thanks!
```diff
@@ -9249,7 +9249,7 @@ void Compiler::optRemoveRedundantZeroInits()
         Statement* next = stmt->GetNextStmt();
         for (GenTree* tree = stmt->GetTreeList(); tree != nullptr; tree = tree->gtNext)
         {
-            if (((tree->gtFlags & GTF_CALL) != 0) && (!tree->IsCall() || !tree->AsCall()->IsSuppressGCTransition()))
+            if (((tree->gtFlags & GTF_CALL) != 0))
             {
                 hasGCSafePoint = true;
```
You have made this optimization more conservative. The local `hasGCSafePoint` should probably now be renamed `hasCall`. There is also a method, `Compiler::IsGcSafePoint`, that can be used to determine whether a call is a GC safe point. It is currently only used by `fgMorph` to set the `BBF_GC_SAFE_POINT` flag on a `BasicBlock`.
This is intentional. This code needs to know whether the node is a potential GC-safe point. Since we now insert GC polls after this optimization, we have to treat calls to methods with `SuppressGCTransitionAttribute` as potential GC-safe points. `Compiler::IsGcSafePoint` returns true if the node is definitely a GC-safe point and false if the node may or may not be one, so it shouldn't be used here.
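The reply distinguishes a "definitely a safe point" query from a "may be a safe point" query. A minimal C++ model of that distinction follows; `Node`, `IsDefinitelyGcSafePoint`, `MayBeGcSafePoint`, and `MayContainGcSafePoint` are hypothetical names, and the rule that a `[SuppressGCTransition]` call is not by itself a definite safe point is assumed from the reply rather than taken from `Compiler::IsGcSafePoint`'s source.

```cpp
#include <cassert>
#include <vector>

// Toy IR node with just enough information to classify calls.
struct Node {
    bool isCall = false;
    bool suppressGCTransition = false;  // call to a [SuppressGCTransition] method
};

// Definite query: a normal call is a safe point, but a SuppressGCTransition call
// is not by itself; it only gets a poll if a later phase inserts one.
bool IsDefinitelyGcSafePoint(const Node& n) {
    return n.isCall && !n.suppressGCTransition;
}

// Conservative query: any call might end up as a safe point, because polls for
// SuppressGCTransition calls are inserted after this optimization runs.
bool MayBeGcSafePoint(const Node& n) {
    return n.isCall;
}

// Scan in the spirit of the changed loop: any call counts.
bool MayContainGcSafePoint(const std::vector<Node>& nodes) {
    for (const Node& n : nodes) {
        if (MayBeGcSafePoint(n)) {
            return true;
        }
    }
    return false;
}
```

Using the definite query in the scan would report "no safe point" for a statement containing only a `[SuppressGCTransition]` call, even though a poll may later be placed in that block; the conservative query avoids that.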
@AndyAyersMS I addressed your feedback and also pushed a fix for x86 tail calls via helpers (in a separate commit). Can you please take another look?
LGTM.