-
Notifications
You must be signed in to change notification settings - Fork 4.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Suboptimal code generated for even a trivial ValueConverter #85570
Comments
Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch Issue DetailsDescriptionI'm seeing suboptimal code generated for ValueConverters even when the what the converter does is trivial. For example, class Program
{
private static readonly Foo foo = new();
private static readonly CultureInfo culture = CultureInfo.CurrentCulture;
public int Test()
{
var x = 3;
var y = (int)foo.Convert(x, typeof(int), null, culture);
return y;
}
}
interface IValueConverter
{
object Convert(object value, Type targetType, object? parameter, CultureInfo? culture);
}
sealed class Foo : IValueConverter
{
public object Convert(object value, Type targetType, object? parameter, CultureInfo? culture)
{
return (int)value + 1;
}
} The codegen: ; Assembly listing for method Program:Test():int:this
; Emitting BLENDED_CODE for X64 CPU with AVX - Windows
; optimized code
; rsp based frame
; partially interruptible
; No PGO data
; 0 inlinees with PGO data; 1 single block inlinees; 0 inlinees without PGO data
G_M000_IG01: ;; offset=0000H
push rdi
push rsi
sub rsp, 40
G_M000_IG02: ;; offset=0006H
mov rsi, 0x7FF7BD400858
mov rcx, rsi
call CORINFO_HELP_NEWSFAST
mov rdi, rax
test byte ptr [(reloc 0x7ff7bd6f9344)], 1
je SHORT G_M000_IG05
G_M000_IG03: ;; offset=0024H
mov rcx, 0x23591401DA8
mov rcx, gword ptr [rcx]
mov dword ptr [rdi+08H], 3
cmp byte ptr [rcx], cl
mov rcx, rsi
call CORINFO_HELP_NEWSFAST
mov ecx, dword ptr [rdi+08H]
inc ecx
mov dword ptr [rax+08H], ecx
mov eax, dword ptr [rax+08H]
G_M000_IG04: ;; offset=004DH
add rsp, 40
pop rsi
pop rdi
ret
G_M000_IG05: ;; offset=0054H
mov rcx, 0x7FF7BD6F9310
mov edx, 4
call CORINFO_HELP_GETSHARED_NONGCSTATIC_BASE
jmp SHORT G_M000_IG03
; Total bytes of code 106 This is important because ValueConverters are heavily used in Desktop apps that uses XAML (WPF, UWP, MAUI and etc.), and we want to minimize the overhead caused by ValueConverters. Expected codegen: mov eax, 4
ret Configuration.NET 8 preview 3
|
We call it "multi-use boxing" problem, #9118 |
cc @markples |
Note that the code in the original post is a trivial case. In real-world cases, there'll be some type tests in the converter to avoid casting exceptions. For example, if (value is int d) return d + 1;
else return default(int); The return value is boxed. |
Thank you for the example.
I'll include this in #9118. |
At tier1 things are a bit simpler, we have
Seems like VN ought to be able to resolve If we could do this the first box allocation would become dead, and if we then did something similar at the return, so would the second one. @EgorBo @jakobbotsch @SingleAccretion any thoughts here..? |
I don't think it would be hard to add the ability to create 'synthetic' field sequences, such as for fields of boxes, it is 'just work'. The one aspect we will need to be careful about is making sure the aliasing aspects are correct, i. e. that no synthetic field may alias any non-synthetic field, and that synthetic fields model shared generic code properly. I can see using the value type's type handle as the discriminator working here. |
Codegen with #103361: no more boxing, just necessary side effect checks. ; Assembly listing for method Program:Main():int (FullOpts)
; Emitting BLENDED_CODE for X64 with AVX512 - Windows
; FullOpts code
; optimized code
; rsp based frame
; partially interruptible
; No PGO data
; 0 inlinees with PGO data; 1 single block inlinees; 0 inlinees without PGO data
; Final local variable assignments
;
;* V00 loc0 [V00 ] ( 0, 0 ) int -> zero-ref single-def
; V01 OutArgs [V01 ] ( 1, 1 ) struct (32) [rsp+0x00] do-not-enreg[XS] addr-exposed "OutgoingArgSpace"
;* V02 tmp1 [V02 ] ( 0, 0 ) long -> zero-ref class-hnd exact "Single-def Box Helper" <System.Int32>
; V03 tmp2 [V03,T00] ( 2, 4 ) ref -> rax class-hnd exact single-def "impImportAndPushBox" <Foo>
;* V04 tmp3 [V04 ] ( 0, 0 ) byref -> zero-ref "inline UNBOX clone1"
;* V05 tmp4 [V05 ] ( 0, 0 ) ref -> zero-ref class-hnd exact "Inlining Arg" <System.Int32>
;* V06 tmp5 [V06 ] ( 0, 0 ) long -> zero-ref class-hnd exact "Single-def Box Helper" <System.Int32>
;* V07 tmp6 [V07 ] ( 0, 0 ) struct (16) zero-ref do-not-enreg[SF] "stack allocated boxed value class temp" <System.Collections.Generic.KeyValuePair`2[long,int]>
;* V08 tmp7 [V08 ] ( 0, 0 ) struct (16) zero-ref do-not-enreg[SF] "stack allocated boxed value class temp" <System.Collections.Generic.KeyValuePair`2[long,int]>
;* V09 tmp8 [V09 ] ( 0, 0 ) long -> zero-ref single-def "V07.[000..008)"
;* V10 tmp9 [V10 ] ( 0, 0 ) int -> zero-ref single-def "V07.[008..012)"
;* V11 tmp10 [V11 ] ( 0, 0 ) long -> zero-ref single-def "V08.[000..008)"
;* V12 tmp11 [V12 ] ( 0, 0 ) int -> zero-ref single-def "V08.[008..012)"
;* V13 cse0 [V13,T01] ( 0, 0 ) byref -> zero-ref "CSE #02: aggressive"
;
; Lcl frame size = 40
G_M24375_IG01: ;; offset=0x0000
sub rsp, 40
;; size=4 bbWeight=1 PerfScore 0.25
G_M24375_IG02: ;; offset=0x0004
test byte ptr [(reloc 0x7ffe3ba8b4e0)], 1 ; global ptr
je SHORT G_M24375_IG05
;; size=9 bbWeight=1 PerfScore 4.00
G_M24375_IG03: ;; offset=0x000D
mov rax, 0x1EE92800BB8 ; data for Program:foo
mov rax, gword ptr [rax]
cmp byte ptr [rax], al
mov eax, 4
;; size=20 bbWeight=1 PerfScore 5.50
G_M24375_IG04: ;; offset=0x0021
add rsp, 40
ret
;; size=5 bbWeight=1 PerfScore 1.25
G_M24375_IG05: ;; offset=0x0026
mov rcx, 0x7FFE3BA8B448 ; Program
call CORINFO_HELP_GET_GCSTATIC_BASE
jmp SHORT G_M24375_IG03
;; size=17 bbWeight=0 PerfScore 0.00
; Total bytes of code 55, prolog size 4, PerfScore 11.00, instruction count 12, allocated bytes for code 55 (MethodHash=d9b8a0c8) for method Program:Main():int (FullOpts)
; ============================================================ |
Enable object stack allocation for ref classes and extend the support to include boxed value classes. Use a specialized unbox helper for stack allocated boxes, both to avoid apparent escape of the box by the helper, and to ensure all box field accesses are visible to the JIT. Update the local address visitor to rewrite trees involving address of stack allocated boxes in some cases to avoid address exposure. Disable old promotion for stack allocated boxes (since we have no field handles) and allow physical promotion to enregister the box method table and/or payload as appropriate. In OSR methods handle the fact that the stack allocation may actually have been a heap allocation by the Tier0 method. The analysis TP cost is around 0.4-0.7% (notes below). Boxes are much less likely to escape than ref classes (roughly ~90% of boxes escape, ~99.8% of ref classes escape). Codegen impact is diminished somewhat because many of the boxes are dead and were already getting optimized away. Fixes #4584, #9118, #10195, #11192, #53585, #58554, #85570 --------- Co-authored-by: Jakob Botsch Nielsen <jakob.botsch.nielsen@gmail.com> Co-authored-by: Jan Kotas <jkotas@microsoft.com>
Fixed by #103361 |
Description
I'm seeing suboptimal code generated for ValueConverters even when what the converter does is trivial.
For example,
The codegen:
This is important because ValueConverters are heavily used in Desktop apps that use XAML (WPF, UWP, MAUI and etc.), and we want to minimize the overhead caused by ValueConverters.
Expected codegen:
Configuration
.NET 8 preview 3
The text was updated successfully, but these errors were encountered: