-
Notifications
You must be signed in to change notification settings - Fork 4.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
JIT: optimize redundant branches by looking through phis #76283
Conversation
In some cases the value of a block's branch predicate is correlated with the predecessor of the block. Often this correlation is hinted at by the presence of phis in the predicate's tree and/or phi VNs in in the predicate's VN graph. For each predecessor of a block, we evaluate the predicate value number using the values brought to the block by that predecessor. If we find correlations, we use them to drive the existing jump threading optimization. Also, if we end up partially disambiguating such that there is just one remaining predecessor, update the value number of the predicate to reflect the values that flow in from that predecessor. Fixes dotnet#75944. Contributes to dotnet#48115.
Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch Issue DetailsIn some cases the value of a block's branch predicate is correlated with the predecessor of the block. Often this correlation is hinted at by the presence of phis in the predicate's tree and/or phi VNs in in the predicate's VN graph. For each predecessor of a block, we evaluate the predicate value number using the values brought to the block by that predecessor. If we find correlations, we use them to drive the existing jump threading optimization. Also, if we end up partially disambiguating such that there is just one remaining predecessor, update the value number of the predicate to reflect the values that flow in from that predecessor. Fixes #75987.
|
@EgorBo PTAL Expecting a fairly large number of diffs. Looked at some of the bigger regressions and they seem to be changes in LSRA. |
/azp run Fuzzlyn |
Azure Pipelines successfully started running 1 pipeline(s). |
Are failures related? Diffs are nice indeed 🚀 |
I think so. Debugging now. |
Fixed that but looks like it has uncovered other problems. |
Oops, the fix was wrong, |
different SSA def numbers, despite the fact that we don't allow overlapping SSA lifetimes. For example there may be a Phi in another block that gets copied to some other local and then reaches the current block via that local. So make sure that when we search local PHIs we also match the ssa def number to ensure we're looking at the right PHIs.
The issue was that both relop operands had PhiDef VNs for V04, but only one of them matched the local PHI; the other came from a "nonlocal" PHI (via a copy to another local). So when scanning for the right local PHI we need to match both the local num and the SSA def num we get from the PhiDef VN. |
The condition for the bug is (seemingly) pretty rare, so I don't expect the proper fix to perturb the diffs much. Code for ; Assembly listing for method X:M1(System.Object)
; Emitting BLENDED_CODE for X64 CPU with AVX - Windows
; optimized code
; rsp based frame
; partially interruptible
; No PGO data
; 0 inlinees with PGO data; 1 single block inlinees; 0 inlinees without PGO data
; Final local variable assignments
;
;* V00 arg0 [V00,T00] ( 0, 0 ) ref -> zero-ref class-hnd
; V01 OutArgs [V01 ] ( 1, 1 ) lclBlk (32) [rsp+00H] "OutgoingArgSpace"
;* V02 tmp1 [V02,T01] ( 0, 0 ) ref -> zero-ref
;* V03 tmp2 [V03 ] ( 0, 0 ) ref -> zero-ref class-hnd exact single-def "NewObj constructor temp"
;* V04 tmp3 [V04,T02] ( 0, 0 ) ref -> zero-ref class-hnd exact single-def "NewObj constructor temp"
;
; Lcl frame size = 40
G_M9243_IG01: ;; offset=0000H
4883EC28 sub rsp, 40
;; size=4 bbWeight=1 PerfScore 0.25
G_M9243_IG02: ;; offset=0004H
4883C428 add rsp, 40
C3 ret
;; size=5 bbWeight=1 PerfScore 1.25 Getting to this point required all of the tricks here and in #76207, in particular the fall through pred conversion / VN update / retry if block is still BBJ_COND. As seen below....
|
@AndyAyersMS Does it improve codegen for this: static bool Test(object obj) => obj is int; In current Main seems to be emitting: ; Method Proga:Test(System.Object):bool
G_M55190_IG01:
G_M55190_IG02:
4885C9 test rcx, rcx
7411 je SHORT G_M55190_IG05
G_M55190_IG03:
48B8E0B8A064FE7F0000 mov rax, 0x7FFE64A0B8E0 ; System.Int32
483901 cmp qword ptr [rcx], rax
7402 je SHORT G_M55190_IG05
G_M55190_IG04:
33C9 xor rcx, rcx
G_M55190_IG05:
33C0 xor eax, eax
4885C9 test rcx, rcx
0F95C0 setne al
G_M55190_IG06:
C3 ret
; Total bytes of code: 31 |
No, that is the "side effect" case which we still can't handle. |
You might recall handling this requires some code duplication and probably an SSA update. Likely the next thing to try and fix. |
@AndyAyersMS Ah I see. By the way, when we have a PHI node and one of the preds is another PHI - does this PR handle that or give up? |
We give up since both those values arrive via the same pred edge. That is, given something like |
@EgorBo the bug is fixed. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Glad that the diffs remain 🙂
ValueNum phiArgVN = phiArgNode->GetVN(VNK_Liberal); | ||
|
||
// We sometimes see cases where phi args do not have VNs. | ||
// (I recall seeing this before... track down why) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
track down why
It's because we don't number PHI_ARG
s which depend on a cycle (i. e. represent defs from blocks we haven't numbered yet).
In some cases the value of a block's branch predicate is correlated with the predecessor of the block. Often this correlation is hinted at by the presence of phis in the predicate's tree and/or phi VNs in in the predicate's VN graph.
For each predecessor of a block, we evaluate the predicate value number using the values brought to the block by that predecessor. If we find correlations, we use them to drive the existing jump threading optimization.
Also, if we end up partially disambiguating such that there is just one remaining predecessor, update the value number of the predicate to reflect the values that flow in from that predecessor.
Fixes #75987.
Contributes to #48115.
Diffs