JIT: Add a (disabled) prototype for a generalized promotion pass (#83388)
Introduce a "physical" promotion pass that generalizes the existing promotion.
More specifically, it does not have restrictions on field count and it can
handle arbitrary recursive promotion.
The pass is physical in the sense that it does not rely on any field metadata
for structs. Instead, it works in two separate passes over the IR:
1. In the first pass we find and analyze how unpromoted struct locals are
accessed. For example, for a simple program like:
```
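using System;
using System.Runtime.CompilerServices;

public class Program
{
    public static void Main()
    {
        S s = default;
        Call(s, s.C);
        Console.WriteLine(s.B + s.C);
    }

    // NoInlining keeps the call (and the struct argument) visible in the IR.
    [MethodImpl(MethodImplOptions.NoInlining)]
    private static void Call(S s, byte b)
    {
    }

    private struct S
    {
        public byte A, B, C, D, E;
    }
}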
```
we see IR like:
```
***** BB01
STMT00000 ( 0x000[E-] ... 0x003 )
[000003] IA--------- ▌ ASG struct (init)
[000001] D------N--- ├──▌ LCL_VAR struct<Program+S, 5> V00 loc0
[000002] ----------- └──▌ CNS_INT int 0
***** BB01
STMT00001 ( 0x008[E-] ... 0x026 )
[000008] --C-G------ ▌ CALL void Program:Call(Program+S,ubyte)
[000004] ----------- arg0 ├──▌ LCL_VAR struct<Program+S, 5> V00 loc0
[000007] ----------- arg1 └──▌ LCL_FLD ubyte V00 loc0 [+2]
***** BB01
STMT00002 ( 0x014[E-] ... ??? )
[000016] --C-G------ ▌ CALL void System.Console:WriteLine(int)
[000015] ----------- arg0 └──▌ ADD int
[000011] ----------- ├──▌ LCL_FLD ubyte V00 loc0 [+1]
[000014] ----------- └──▌ LCL_FLD ubyte V00 loc0 [+2]
```
and the analysis produces
```
Accesses for V00
[000..005)
#: (2, 200)
# assigned from: (0, 0)
# assigned to: (1, 100)
# as call arg: (1, 100)
# as implicit by-ref call arg: (1, 100)
# as on-stack call arg: (0, 0)
# as retbuf: (0, 0)
# as returned value: (0, 0)
ubyte @ 001
#: (1, 100)
# assigned from: (0, 0)
# assigned to: (0, 0)
# as call arg: (0, 0)
# as implicit by-ref call arg: (0, 0)
# as on-stack call arg: (0, 0)
# as retbuf: (0, 0)
# as returned value: (0, 0)
ubyte @ 002
#: (2, 200)
# assigned from: (0, 0)
# assigned to: (0, 0)
# as call arg: (1, 100)
# as implicit by-ref call arg: (0, 0)
# as on-stack call arg: (0, 0)
# as retbuf: (0, 0)
# as returned value: (0, 0)
```
Here each pair is (reference count, weighted reference count).
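As a rough illustration of this accounting, the sketch below rebuilds the
counts for V00 from the five accesses visible in the IR. It is a simplified
model and not the JIT's data structure: AccessStats, AccessTable and
BuildExampleTable are hypothetical names, accesses are keyed by byte range only
(the real dump also records the access type), and the block weight of 100 is
taken from the weighted counts in the dump.

```cpp
#include <cassert>
#include <map>
#include <utility>

// Hypothetical, simplified per-access counters (only two of the logged rows).
struct AccessStats
{
    unsigned count       = 0; // "#": number of references
    double   countWtd    = 0; // same, weighted by block weight
    unsigned callArgs    = 0; // "# as call arg"
    double   callArgsWtd = 0;
};

// Half-open byte range [start, end) inside the struct local.
using Range = std::pair<unsigned, unsigned>;

class AccessTable
{
public:
    // Record one access of `size` bytes at `offset` in a block of the given weight.
    void Record(unsigned offset, unsigned size, double blockWeight, bool isCallArg)
    {
        assert(size > 0);
        AccessStats& stats = m_stats[{offset, offset + size}];
        stats.count++;
        stats.countWtd += blockWeight;
        if (isCallArg)
        {
            stats.callArgs++;
            stats.callArgsWtd += blockWeight;
        }
    }

    // Look up the counters for a range; throws std::out_of_range if never accessed.
    const AccessStats& Get(unsigned offset, unsigned size) const
    {
        return m_stats.at({offset, offset + size});
    }

private:
    std::map<Range, AccessStats> m_stats;
};

// The five accesses of V00 in the IR above, all in BB01.
AccessTable BuildExampleTable()
{
    const double bb01Weight = 100; // assumed from the weighted counts in the dump
    AccessTable  table;
    table.Record(0, 5, bb01Weight, false); // ASG struct (init) of V00
    table.Record(0, 5, bb01Weight, true);  // V00 passed as arg0 of Call
    table.Record(2, 1, bb01Weight, true);  // s.C passed as arg1 of Call
    table.Record(1, 1, bb01Weight, false); // s.B in the ADD
    table.Record(2, 1, bb01Weight, false); // s.C in the ADD
    return table;
}
```

With these inputs, the whole-struct range [000..005) ends up with (2, 200)
references and (1, 100) call-arg uses, matching the corresponding rows of the
dump.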
Based on this accounting, the analysis estimates the profitability of replacing
some of the accessed parts of the struct with a local. This may be costly
because overlapping struct accesses (e.g. passing the whole struct as an
argument) may require more expensive codegen after promotion. And of course,
creating new locals introduces more register pressure. Currently the
profitability analysis is very crude.
In this case the logic decides that promotion is not worth it:
```
Evaluating access ubyte @ 001
Single write-back cost: 5
Write backs: 100
Read backs: 100
Cost with: 1350
Cost without: 650
Disqualifying replacement
Evaluating access ubyte @ 002
Single write-back cost: 5
Write backs: 100
Read backs: 100
Cost with: 1700
Cost without: 1300
Disqualifying replacement
```
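The numbers in this log can be reproduced with a very simple linear model. The
sketch below is an assumption reconstructed from the logged figures, not a copy
of the JIT's heuristic: the per-use constants 6.5 and 3.5, the read-back cost of
5 and the "promote only if strictly cheaper" rule are all inferred, and the
names (CostInputs, CostWith, CostWithout, ShouldPromote) are hypothetical.

```cpp
#include <cassert>

// Weighted inputs for one candidate replacement (hypothetical field names).
struct CostInputs
{
    double countWtd;      // weighted uses of the field itself
    double readBacksWtd;  // "Read backs" in the log
    double writeBacksWtd; // "Write backs" in the log
    double writeBackCost; // "Single write-back cost" in the log
};

// Constants inferred so that the model matches the log; treat as assumptions.
constexpr double kUnpromotedUseCost = 6.5; // e.g. a load from the stack slot
constexpr double kPromotedUseCost   = 3.5; // e.g. direct use of a register
constexpr double kReadBackCost      = 5.0;

// Estimated cost if the field stays part of the struct local.
double CostWithout(const CostInputs& in)
{
    return in.countWtd * kUnpromotedUseCost;
}

// Estimated cost if the field is replaced by its own local.
double CostWith(const CostInputs& in)
{
    assert(in.countWtd >= 0 && in.readBacksWtd >= 0 && in.writeBacksWtd >= 0);
    return in.countWtd * kPromotedUseCost + in.readBacksWtd * kReadBackCost +
           in.writeBacksWtd * in.writeBackCost;
}

// Assumed decision rule: promote only when it is strictly cheaper.
bool ShouldPromote(const CostInputs& in)
{
    return CostWith(in) < CostWithout(in);
}
```

For `ubyte @ 001` the inputs are {100, 100, 100, 5}, giving 1350 with promotion
versus 650 without. For `ubyte @ 002` they are {200, 100, 100, 5}, giving 1700
versus 1300. In both cases the read backs and write backs caused by the
overlapping whole-struct call argument outweigh the cheaper field uses.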
2. In the second pass the field accesses are replaced with new locals for the
profitable cases. For overlapping accesses, that currently involves writing the
replacements back to the struct local first. For arguments/OSR locals, it
involves reading the replacements back from the struct first.
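To make the overlap handling concrete, here is a minimal sketch of how one
might find the replacements that need a write-back before an overlapping use of
the struct local. It only illustrates the half-open range test; Replacement,
Overlaps and ReplacementsToWriteBack are hypothetical names, and the real pass
presumably tracks more state than this.

```cpp
#include <cassert>
#include <vector>

// One promoted piece of a struct local: bytes [offset, offset + size).
struct Replacement
{
    unsigned offset;
    unsigned size;
    unsigned lclNum; // local that now holds this piece
};

// True if the replacement intersects the accessed range [start, start + size).
bool Overlaps(const Replacement& rep, unsigned start, unsigned size)
{
    assert(size > 0 && rep.size > 0);
    return (rep.offset < start + size) && (start < rep.offset + rep.size);
}

// Locals that would have to be stored back to the struct before an access
// of [start, start + size) on the struct local itself.
std::vector<unsigned> ReplacementsToWriteBack(const std::vector<Replacement>& reps,
                                              unsigned                        start,
                                              unsigned                        size)
{
    std::vector<unsigned> result;
    for (const Replacement& rep : reps)
    {
        if (Overlaps(rep, start, size))
        {
            result.push_back(rep.lclNum);
        }
    }
    return result;
}
```

For the 5-byte struct S with one-byte replacements at offsets 1 and 2, a use of
the whole struct (range [0, 5)) overlaps both replacements, while an access to
bytes [3, 5) overlaps neither and would need no write-back.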
In the above case we can override the profitability analysis with stress mode
STRESS_PHYSICAL_PROMOTION_COST and we get:
```
Evaluating access ubyte @ 001
Single write-back cost: 5
Write backs: 100
Read backs: 100
Cost with: 1350
Cost without: 650
Promoting replacement due to stress
lvaGrabTemp returning 2 (V02 tmp1) (a long lifetime temp) called for V00.[001..002).
Evaluating access ubyte @ 002
Single write-back cost: 5
Write backs: 100
Read backs: 100
Cost with: 1700
Cost without: 1300
Promoting replacement due to stress
lvaGrabTemp returning 3 (V03 tmp2) (a long lifetime temp) called for V00.[002..003).
V00 promoted with 2 replacements
[001..002) promoted as ubyte V02
[002..003) promoted as ubyte V03
...
***** BB01
STMT00000 ( 0x000[E-] ... 0x003 )
[000003] IA--------- ▌ ASG struct (init)
[000001] D------N--- ├──▌ LCL_VAR struct<Program+S, 5> V00 loc0
[000002] ----------- └──▌ CNS_INT int 0
***** BB01
STMT00001 ( 0x008[E-] ... 0x026 )
[000008] -ACXG------ ▌ CALL void Program:Call(Program+S,ubyte)
[000004] ----------- arg0 ├──▌ LCL_VAR struct<Program+S, 5> V00 loc0
[000022] -A--------- arg1 └──▌ COMMA ubyte
[000021] -A--------- ├──▌ ASG ubyte
[000019] D------N--- │ ├──▌ LCL_VAR ubyte V03 tmp2
[000020] ----------- │ └──▌ LCL_FLD ubyte V00 loc0 [+2]
[000018] ----------- └──▌ LCL_VAR ubyte V03 tmp2
***** BB01
STMT00002 ( 0x014[E-] ... ??? )
[000016] -ACXG------ ▌ CALL void System.Console:WriteLine(int)
[000015] -A--------- arg0 └──▌ ADD int
[000027] -A--------- ├──▌ COMMA ubyte
[000026] -A--------- │ ├──▌ ASG ubyte
[000024] D------N--- │ │ ├──▌ LCL_VAR ubyte V02 tmp1
[000025] ----------- │ │ └──▌ LCL_FLD ubyte V00 loc0 [+1]
[000023] ----------- │ └──▌ LCL_VAR ubyte V02 tmp1
[000028] ----------- └──▌ LCL_VAR ubyte V03 tmp2
```
The pass still only has rudimentary support and is missing many basic CQ
optimizations. For example, it does not make use of any liveness
yet and it does not have any decomposition support for assignments. Yet, it
already shows good potential in user benchmarks. I have listed some follow-up
improvements in #76928.
This PR adds the pass, but it is disabled by default. It can be enabled by
setting DOTNET_JitStressModeNames=STRESS_PHYSICAL_PROMOTION. Two new scenarios
that enable it are added to jit-experimental, to be used for testing
purposes.