-
Notifications
You must be signed in to change notification settings - Fork 4.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
JIT: some small improvements to tail duplication #37038
Conversation
1. Allow duplicating when predecessor is BBJ_NONE 2. Require that predecessor and successor reference the same local 3. Make sure that local is not address exposed 4. Check up to two statements in predecessor for local reference 5. Require successor to compare local to constant, or local to local Changes inspired by dotnet#36649 but don't actually improve CQ for that case (yet).
@dotnet/jit-contrib PTAL Jit diffs
|
Will add some notes on regressions/improvements in a bit. |
Regression in Regressions in
Tried disabling relops generally in the predecessor block screening, but that didn't pan out. Regression in |
randomly chosen improvement: ;; Assembly listing for method SyntaxNodeCache:CanBeCached(GreenNode,GreenNode):bool
;;
;; before
;;
G_M18879_IG02:
test rcx, rcx
je SHORT G_M18879_IG04
G_M18879_IG03:
call GreenNode:get_IsCacheable():bool:this
movzx rcx, al
jmp SHORT G_M18879_IG05
G_M18879_IG04:
mov ecx, 1
G_M18879_IG05:
test ecx, ecx
je SHORT G_M18879_IG10
G_M18879_IG06:
test rsi, rsi
je SHORT G_M18879_IG08
mov rcx, rs
;;
;; after
;;
G_M18879_IG02:
test rcx, rcx
je SHORT G_M18879_IG04
G_M18879_IG03:
call GreenNode:get_IsCacheable():bool:this
movzx rcx, al
test ecx, ecx
je SHORT G_M18879_IG08
G_M18879_IG04:
test rsi, rsi
je SHORT G_M18879_IG06
mov rcx, rsi |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, but a few additional comments would be nice.
|
||
Statement* const prevStmt = stmt->GetPrevStmt(); | ||
|
||
if (prevStmt == lastStmt) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be worth a comment here that GetPrevStmt()
of the fist stmt returns the last stmt. I had forgotten and was initially confused.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, this confuses me at times too. I suppose it saves a pointer in BasicBlock
but it makes reverse iteration through statements a bit clunky.
Seems like we could have GetPrevStmt()
return null for the first statement, and then have BasicBlock::LastStatement()
call some other API or have friend access to the underlying field.
// returns: | ||
// true if this block seems like a good candidate for duplication | ||
// true if this is a good candidate, false otherwise | ||
// if true, lclNum is set to lcl to scan for in predecessor block |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This could use a little elaboration, e.g. explaining that good tail call duplication is predicted when we have an assignment to 'lcl' in the predecessor block that can provide information enabling optimization of a compare involving 'lcl' in the successor block, or something to that effect. There's a partial explanation in the above function, but it only tells half the story.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added some comments, see if this is better.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Excellent thanks!
For future reference, if we compare enabling vs disabling this optimization entirely (including the changes above, this is the net impact of FX
So roughly speaking, this impacts about 3% of all methods (as enumerated by PMI) and ratio of improvements to regressions is about 5.4. |
Also, add morph to post-phase whitelist because it seems odd not to dump the IR after morph.
Changes inspired by #36649 but don't actually improve CQ for that case (yet).