JIT: Constant is not propagated through a struct #87072
Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch

Minimal repro for a CQ issue @stephentoub noticed in #87067: a constant doesn't successfully propagate through a struct:

```csharp
struct Awaitable
{
    int Opts;

    Awaitable(bool value)
    {
        Opts = value ? 1 : 2;

        //if (value)
        //    Opts = 1;
        //else
        //    Opts = 2;
    }

    public static int Test() => new Awaitable(false).Opts;
}
```

Codegen for `Test`:

```asm
; Assembly listing for method Awaitable:Test():int
       push     rax
       xor      eax, eax
       mov      dword ptr [rsp], eax
       mov      dword ptr [rsp], 2
       mov      eax, dword ptr [rsp]
       add      rsp, 8
       ret
; Total bytes of code 21
```

Now replace that ternary operation with `if-else` (uncomment it and remove the ternary):

```asm
; Method Awaitable:Test():int
       mov      eax, 2
       ret
; Total bytes of code: 6
```

JitDump diff: https://www.diffchecker.com/ld33PGnr (left: `if-else`, right: ternary).

@dotnet/jit-contrib @jakobbotsch
Assigning Jakob to triage it (as a struct promotion owner).
If the successor block is not a join, this might be easier to enable. The complicated case is one where a block's successor has other predecessors, since all predecessors must produce identical stacks of temps so that any one predecessor's exit state represents all predecessors' exit state. The rough idea would be something like this: in …

Also, it seems like we are always cloning the exit state trees when importing a block. Seems like we could optimize this to use the trees directly for one successor and only clone for the others (this happens in …).
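To make the join distinction concrete, here is a minimal sketch in C++ of the spill decision under an assumed, simplified model of the importer's exit stack. `EntryKind`, `StackEntry`, `mustSpill`, and `spillCount` are hypothetical names, not JIT APIs. The sketch encodes the rule discussed in this thread: constants and local addresses need not be spilled when the successor has a unique predecessor, while a join still needs every predecessor to agree on one shared set of temps.

```cpp
#include <cassert>
#include <vector>

// Hypothetical, simplified kinds of IL stack entries left at a block's exit.
enum class EntryKind { Constant, LocalAddress, Other };

struct StackEntry {
    EntryKind kind;
    int payload; // constant value or local number; unused for Other
};

// Assumed rule: constants and local addresses are side-effect-free and cheap
// to duplicate, so a successor with a single predecessor could consume them
// directly. A join (succPredCount > 1) still needs every predecessor to store
// into the same shared spill temps, and other trees are conservatively spilled.
bool mustSpill(const StackEntry& entry, int succPredCount) {
    bool invariant = entry.kind == EntryKind::Constant ||
                     entry.kind == EntryKind::LocalAddress;
    if (!invariant) {
        return true;
    }
    return succPredCount > 1;
}

// Number of spill temps a block exit would need under the rule above.
int spillCount(const std::vector<StackEntry>& exitStack, int succPredCount) {
    int count = 0;
    for (const StackEntry& entry : exitStack) {
        if (mustSpill(entry, succPredCount)) {
            ++count;
        }
    }
    return count;
}
```

Spilling arbitrary trees even for a unique successor is a deliberately conservative assumption here: side effects and evaluation order make passing them along harder than passing invariant leaves.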
In this case we manage to fold the join away due to inlining, so it seems conceivable to handle this particular case in that way. However, I think in most natural cases produced by Roslyn (assuming this pattern is mainly produced for ternaries), we are going to see both a split and a join. E.g.:

```csharp
[MethodImpl(MethodImplOptions.NoInlining)]
private static void Foo(bool b)
{
    S s;
    s.Opts = b ? 1 : 2;
}

public struct S
{
    public int Opts;
}
```

also ends up with the unfortunate address exposure, and in this case there is both a split and a join. Allowing the propagation when there is a unique predecessor would change the IR from:

```
***** BB01
STMT00001 ( ??? ... 0x003 )
N002 ( 7, 6) [000005] DA--------- ▌ STORE_LCL_VAR byref V03 tmp1 // store of IL stack entries for clique BB01 -> BB02, BB03
N001 ( 3, 3) [000000] ----------- └──▌ LCL_ADDR byref V01 loc0 [+0]
▌ int V01.Program+S:Opts (offs=0x00) -> V06 tmp4
***** BB01
STMT00000 ( 0x000[E-] ... 0x003 )
N005 ( 7, 8) [000004] ----------- ▌ JTRUE void
N004 ( 5, 6) [000003] J------N--- └──▌ NE int
N002 ( 3, 4) [000024] ----------- ├──▌ CAST int <- bool <- int
N001 ( 2, 2) [000001] ----------- │ └──▌ LCL_VAR int V00 arg0
N003 ( 1, 1) [000002] ----------- └──▌ CNS_INT int 0
------------ BB02 [005..008) -> BB04 (always), preds={BB01} succs={BB04}
***** BB02
STMT00006 ( ??? ... 0x006 )
N002 ( 7, 5) [000020] DA--------- ▌ STORE_LCL_VAR byref V04 tmp2 // store 1 of IL stack entries for clique BB02, BB03 -> BB04
N001 ( 3, 2) [000007] ----------- └──▌ LCL_VAR byref V03 tmp1
***** BB02
STMT00007 ( ??? ... ??? )
N002 ( 5, 4) [000022] DA--------- ▌ STORE_LCL_VAR int V05 tmp3 // store 2 of IL stack entries for clique BB02, BB03 -> BB04
N001 ( 1, 1) [000019] ----------- └──▌ CNS_INT int 2
------------ BB03 [008..009), preds={BB01} succs={BB04}
***** BB03
STMT00002 ( ??? ... 0x008 )
N002 ( 7, 5) [000010] DA--------- ▌ STORE_LCL_VAR byref V04 tmp2 // store 1 of IL stack entries for clique BB02, BB03 -> BB04
N001 ( 3, 2) [000008] ----------- └──▌ LCL_VAR byref V03 tmp1
***** BB03
STMT00003 ( ??? ... ??? )
N002 ( 5, 4) [000012] DA--------- ▌ STORE_LCL_VAR int V05 tmp3 // store 2 of IL stack entries for clique BB02, BB03 -> BB04
N001 ( 1, 1) [000009] ----------- └──▌ CNS_INT int 1
------------ BB04 [009..00F) (return), preds={BB02,BB03} succs={}
***** BB04
STMT00004 ( ??? ... 0x009 )
N003 ( 10, 7) [000017] -A-XG------ ▌ STOREIND int
N001 ( 3, 2) [000014] ----------- ├──▌ LCL_VAR byref V04 tmp2
N002 ( 3, 2) [000015] ----------- └──▌ LCL_VAR int V05 tmp3
```

to

```
***** BB01
STMT00000 ( 0x000[E-] ... 0x003 )
N005 ( 7, 8) [000004] ----------- ▌ JTRUE void
N004 ( 5, 6) [000003] J------N--- └──▌ NE int
N002 ( 3, 4) [000024] ----------- ├──▌ CAST int <- bool <- int
N001 ( 2, 2) [000001] ----------- │ └──▌ LCL_VAR int V00 arg0
N003 ( 1, 1) [000002] ----------- └──▌ CNS_INT int 0
------------ BB02 [005..008) -> BB04 (always), preds={BB01} succs={BB04}
***** BB02
STMT00006 ( ??? ... 0x006 )
N002 ( 7, 5) [000020] DA--------- ▌ STORE_LCL_VAR byref V04 tmp2 // new store of IL stack entries for clique BB02, BB03 -> BB04
N001 ( 3, 3) [000000] ----------- └──▌ LCL_ADDR byref V01 loc0 [+0]
▌ int V01.Program+S:Opts (offs=0x00) -> V06 tmp4
***** BB02
STMT00007 ( ??? ... ??? )
N002 ( 5, 4) [000022] DA--------- ▌ STORE_LCL_VAR int V05 tmp3
N001 ( 1, 1) [000019] ----------- └──▌ CNS_INT int 2
------------ BB03 [008..009), preds={BB01} succs={BB04}
***** BB03
STMT00002 ( ??? ... 0x008 )
N002 ( 7, 5) [000010] DA--------- ▌ STORE_LCL_VAR byref V04 tmp2 // new store of IL stack entries for clique BB02, BB03 -> BB04
N001 ( 3, 3) [000000] ----------- └──▌ LCL_ADDR byref V01 loc0 [+0]
▌ int V01.Program+S:Opts (offs=0x00) -> V06 tmp4
***** BB03
STMT00003 ( ??? ... ??? )
N002 ( 5, 4) [000012] DA--------- ▌ STORE_LCL_VAR int V05 tmp3
N001 ( 1, 1) [000009] ----------- └──▌ CNS_INT int 1
------------ BB04 [009..00F) (return), preds={BB02,BB03} succs={}
***** BB04
STMT00004 ( ??? ... 0x009 )
N003 ( 10, 7) [000017] -A-XG------ ▌ STOREIND int
N001 ( 3, 2) [000014] ----------- ├──▌ LCL_VAR byref V04 tmp2
N002 ( 3, 2) [000015] ----------- └──▌ LCL_VAR int V05 tmp3
```

We could potentially handle this when importing BB04 by checking the predecessor assignments into the spill temps. But we would need to try harder to import blocks in RPO, as we don't import these in the right order to do this today (the import order is BB01 -> BB03 -> BB04 -> BB02).
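A minimal sketch of the predecessor check described above, assuming a simplified record of what each predecessor stored into one spill temp. `StoredValue`, `PredStores`, and `commonLocalAddress` are hypothetical names for illustration only. If every predecessor has already been imported and all of them stored the same `LCL_ADDR`, the join could use that address directly instead of reading the temp; if any predecessor is still unimported (as BB02 is under the current BB01 -> BB03 -> BB04 -> BB02 order), the check has to give up.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <vector>

// Hypothetical model of the value a predecessor stored into one spill temp.
struct StoredValue {
    bool isLocalAddress; // true for LCL_ADDR, false for any other tree
    int local;           // local number when isLocalAddress
    bool operator==(const StoredValue& other) const {
        return isLocalAddress == other.isLocalAddress &&
               (!isLocalAddress || local == other.local);
    }
};

// Predecessor block number -> what it stored to the spill temp.
// A missing entry (or nullopt) means the predecessor is not imported yet.
using PredStores = std::map<int, std::optional<StoredValue>>;

// Returns the local whose address every predecessor stored, or nullopt if
// some predecessor is unimported, stored something else, or they disagree.
std::optional<int> commonLocalAddress(const std::vector<int>& preds,
                                      const PredStores& stores) {
    std::optional<StoredValue> common;
    for (int pred : preds) {
        auto it = stores.find(pred);
        if (it == stores.end() || !it->second) {
            return std::nullopt; // not imported yet: import order matters
        }
        const StoredValue& value = *it->second;
        if (!value.isLocalAddress) {
            return std::nullopt;
        }
        if (common && !(*common == value)) {
            return std::nullopt;
        }
        common = value;
    }
    if (!common) {
        return std::nullopt;
    }
    return common->local;
}
```

With both BB02 and BB03 recorded as storing `LCL_ADDR V01`, the sketch returns local 1; with only BB03 recorded it returns nullopt, which is why an RPO import order would be needed.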
Seems like it would be useful here and in morph (and maybe in more phases) to build a general worklist-driven algorithm for processing in RPO. The rough idea would be to first run a DFS to identify a DAG (or adopt some other DAG-identifying convention, like treating all lexically backward edges as back edges) and then set up a priority queue (or similar) on blocks, with the priority being the number of unvisited preds. Then process all the priority-0 nodes. As each node finishes, decrement the priority of each successor reached along a non-back edge, until the queue is empty. If the phase trims away an edge because of optimization, then also decrement the successor's count (if it is a non-back edge). If new blocks are added or flow is altered, things get trickier. That is not a problem for the importer, but morph can do various flow edits, so we'd need to be careful. We would need a priority queue implementation that is not too allocation-happy, but we can likely use a free list for the internal data structures, so we can keep the allocation proportional to the number of blocks and perhaps amortize it over multiple traversals. Or, since we really only care about nodes of priority zero, just keep a block->count map and, whenever a block's count gets to zero, move it to the worklist.
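The count-based variant from the last sentence can be sketched as a Kahn-style traversal. This is a hypothetical C++ illustration over a toy adjacency list with block 0 as the entry, not JIT code: a recursive DFS classifies an edge as a back edge when its target is still on the DFS stack, and each block is released to the worklist once all of its forward predecessors have been processed. The result is a topological order of the DAG, which gives the property RPO provides here (all forward preds first) without necessarily matching the exact RPO sequence.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <functional>
#include <set>
#include <utility>
#include <vector>

// Toy flow graph: succs[b] lists the successors of block b; block 0 is the entry.
using Graph = std::vector<std::vector<int>>;

// Classify back edges with a recursive DFS from the entry: u->v is a back edge
// when v is still on the DFS stack. Also records which blocks are reachable.
std::set<std::pair<int, int>> findBackEdges(const Graph& succs,
                                            std::vector<bool>& reachable) {
    std::set<std::pair<int, int>> back;
    std::vector<int> state(succs.size(), 0); // 0 = unvisited, 1 = on stack, 2 = done
    std::function<void(int)> dfs = [&](int u) {
        state[u] = 1;
        for (int v : succs[u]) {
            if (state[v] == 1) {
                back.insert({u, v});
            } else if (state[v] == 0) {
                dfs(v);
            }
        }
        state[u] = 2;
    };
    dfs(0);
    reachable.assign(succs.size(), false);
    for (std::size_t b = 0; b < succs.size(); ++b) {
        reachable[b] = state[b] != 0;
    }
    return back;
}

// Count-based worklist: a block becomes ready once all of its forward
// (non-back-edge) predecessors have been processed.
std::vector<int> forwardOrder(const Graph& succs) {
    std::vector<bool> reachable;
    std::set<std::pair<int, int>> back = findBackEdges(succs, reachable);

    std::vector<int> pending(succs.size(), 0); // unprocessed forward preds per block
    for (std::size_t u = 0; u < succs.size(); ++u) {
        if (!reachable[u]) {
            continue; // edges from unreachable blocks are never processed
        }
        for (int v : succs[u]) {
            if (back.count({static_cast<int>(u), v}) == 0) {
                ++pending[v];
            }
        }
    }

    std::deque<int> ready{0};
    std::vector<int> order;
    while (!ready.empty()) {
        int u = ready.front();
        ready.pop_front();
        order.push_back(u); // a real phase would process block u here
        for (int v : succs[u]) {
            if (back.count({u, v}) == 0 && --pending[v] == 0) {
                ready.push_back(v);
            }
        }
    }
    return order;
}
```

For the BB01..BB04 diamond in the IR dump, this yields BB01, BB02, BB03, BB04, so BB04 is visited only after both of its predecessors. Trimming an edge would map to one extra decrement of `pending[succ]`; a production version would also replace the recursive DFS with an explicit stack.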
I have a desire to fix this in 9.0 by introducing some limited flow-sensitive propagation of …
This could also be handled via tail duplication: basically, move (and duplicate) the STOREIND backwards into its preds, based on the intersection of the variables it reads and the constants available for those locals in the preds. I am playing around with this idea in some other contexts...
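A toy sketch of that tail-duplication idea, under assumed simplifications: the join's statement is modeled as `STOREIND(addr, value)` over local names, each predecessor carries a hypothetical `Facts` map of locals with known constants or addresses, and duplication only happens when every predecessor knows both operands (a real heuristic could also duplicate on partial knowledge). `StoreInd`, `Facts`, `lookup`, and `tailDuplicate` are illustrative names only.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <string>
#include <vector>

// Hypothetical toy statement in the join block: STOREIND(addr, value), where
// both operands are local names (e.g. V04 and V05 in the dump).
struct StoreInd {
    std::string addr;
    std::string value;
};

// Facts available at the end of one predecessor: local -> known constant or address.
using Facts = std::map<std::string, std::string>;

std::optional<std::string> lookup(const Facts& facts, const std::string& local) {
    auto it = facts.find(local);
    if (it == facts.end()) {
        return std::nullopt;
    }
    return it->second;
}

// Duplicate 'stmt' backwards into every predecessor, substituting the facts
// known there. Assumed heuristic: only do it when every predecessor knows all
// operands; otherwise return nullopt and leave the join alone.
std::optional<std::vector<StoreInd>> tailDuplicate(const StoreInd& stmt,
                                                   const std::vector<Facts>& preds) {
    std::vector<StoreInd> copies;
    for (const Facts& facts : preds) {
        std::optional<std::string> addr = lookup(facts, stmt.addr);
        std::optional<std::string> value = lookup(facts, stmt.value);
        if (!addr || !value) {
            return std::nullopt;
        }
        copies.push_back(StoreInd{*addr, *value});
    }
    return copies;
}
```

With facts like those in BB02 and BB03 of the dump (`V04` holding `&V01` in both, `V05` being 2 or 1), each predecessor gets its own fully known store, which is the shape that would let the field store be treated as an ordinary local store.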
This changes local morph to run in RPO when optimizations are enabled. It adds infrastructure to track and propagate LCL_ADDR values assigned to locals (or struct fields) during local morph. This allows us to avoid address exposure in cases where the destination local does not actually end up escaping in any way.

Example:

```csharp
public struct Awaitable
{
    public int Opts;

    public Awaitable(bool value)
    {
        Opts = value ? 1 : 2;
    }
}

[MethodImpl(MethodImplOptions.NoInlining)]
public static int Test() => new Awaitable(false).Opts;
```

Before:

```asm
G_M59043_IG01:  ;; offset=0x0000
       push     rax
                        ;; size=1 bbWeight=1 PerfScore 1.00
G_M59043_IG02:  ;; offset=0x0001
       xor      eax, eax
       mov      dword ptr [rsp], eax
       mov      dword ptr [rsp], 2
       mov      eax, dword ptr [rsp]
                        ;; size=15 bbWeight=1 PerfScore 3.25
G_M59043_IG03:  ;; offset=0x0010
       add      rsp, 8
       ret
                        ;; size=5 bbWeight=1 PerfScore 1.25
; Total bytes of code: 21
```

After:

```asm
G_M59043_IG02:  ;; offset=0x0000
       mov      eax, 2
                        ;; size=5 bbWeight=1 PerfScore 0.25
G_M59043_IG03:  ;; offset=0x0005
       ret
```

Propagating the addresses works much like local assertion prop in morph does. Proving that the locations that were stored to do not escape afterwards is done with a simplistic approach: we check globally that no reads of the location exist, and if so, we replace the `LCL_ADDR` stored to them with a constant 0. We leave it up to liveness to clean up the stores themselves.

This could be more sophisticated, but in practice this handles the reported cases just fine.

If we were able to remove any `LCL_ADDR` in this way, then we run an additional pass over the locals of the IR to compute the final set of exposed locals.

Fix dotnet#87072
Fix dotnet#102273
Fix dotnet#102518

This is still not sufficient to handle dotnet#69254. To handle that case we need to handle transferring assertions for struct copies, and also handle proving that specific struct fields containing local addresses do not escape. It is probably doable, but for now I will leave it as future work.
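To illustrate the two steps described in the PR (propagate stored `LCL_ADDR` values, then retire addresses whose holders are never read and recompute exposure), here is a deliberately simplified C++ sketch over a straight-line toy IR. It is an assumption-laden model, not the PR's implementation: the PR runs in RPO over the whole flow graph like local assertion prop, whereas this handles a single block, and `Op`, `Stmt`, `propagateAddresses`, and `retireDeadAddresses` are hypothetical names.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical straight-line toy IR (not RyuJIT's GenTree). Operands are local
// names or constants spelled as strings.
//   AddrStore:  a = &b   (like STORE_LCL_VAR a (LCL_ADDR b))
//   IndStore:  *a = b    (like STOREIND (LCL_VAR a) b)
//   LocalStore: a = b    (b is a constant or a local name)
//   Use:        some other read of a, e.g. passing it to a call
enum class Op { AddrStore, IndStore, LocalStore, Use };

struct Stmt {
    Op op;
    std::string a;
    std::string b;
};

// Step 1: forward-propagate "holder contains &X" facts, rewriting *holder = v
// into a direct store X = v. A fact dies when its holder is overwritten.
void propagateAddresses(std::vector<Stmt>& stmts) {
    std::map<std::string, std::string> holds; // holder -> local whose address it holds
    for (Stmt& s : stmts) {
        if (s.op == Op::AddrStore) {
            holds[s.a] = s.b;
        } else if (s.op == Op::IndStore) {
            auto it = holds.find(s.a);
            if (it != holds.end()) {
                s = Stmt{Op::LocalStore, it->second, s.b};
            }
        } else if (s.op == Op::LocalStore) {
            holds.erase(s.a);
        }
    }
}

// Step 2: if no statement reads a holder any more, the address stored into it
// is dead, so replace the LCL_ADDR with the constant 0 (liveness would remove
// the store later). Returns the locals that are still address-exposed.
std::set<std::string> retireDeadAddresses(std::vector<Stmt>& stmts) {
    std::set<std::string> read; // conservatively, every operand that is read
    for (const Stmt& s : stmts) {
        if (s.op == Op::IndStore) {
            read.insert(s.a);
            read.insert(s.b);
        } else if (s.op == Op::LocalStore) {
            read.insert(s.b);
        } else if (s.op == Op::Use) {
            read.insert(s.a);
        }
    }
    std::set<std::string> exposed;
    for (Stmt& s : stmts) {
        if (s.op != Op::AddrStore) {
            continue;
        }
        if (read.count(s.a) != 0) {
            exposed.insert(s.b);
        } else {
            s = Stmt{Op::LocalStore, s.a, "0"};
        }
    }
    return exposed;
}
```

On the sequence `V04 = &V01; V05 = 2; *V04 = V05`, the indirect store becomes `V01 = V05`, nothing reads `V04` anymore, its `LCL_ADDR` is replaced with 0, and `V01` is no longer exposed. Adding any other use of `V04` (such as passing it to a call) keeps `V01` exposed.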
From the JitDump diff it appears that we don't enregister the `Opts` field due to address exposure. @SingleAccretion suggested that when we spill all stack entries to locals at the end of a block, we should not spill constants and local addresses. @AndyAyersMS noted that this might be a large amount of work.