Casting via generic math doesn't always inline in R2R #84421
Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch, @kunalspathak
Issue Details
See #78648. cc: @EgorBo
https://csharp.godbolt.org/z/K6xM8hP6P
// crossgen2 8.0.0-preview.4.23206.99+18e2c5fd9e2239a8b06fe49dbb6492d40f5e5e19
C:M0():uint:this:
; Emitting BLENDED_CODE for X64 CPU with SSE2 - Unix
;; size=0 bbWeight=1 PerfScore 0.00
mov eax, 42
;; size=5 bbWeight=1 PerfScore 0.25
ret
;; size=1 bbWeight=1 PerfScore 1.00
C:M1():uint:this:
; Emitting BLENDED_CODE for X64 CPU with SSE2 - Unix
;; size=0 bbWeight=1 PerfScore 0.00
lea rax, [(reloc)]
mov edi, 42
;; size=12 bbWeight=1 PerfScore 0.75
tail.jmp [rax]System.UInt32:CreateTruncating[ubyte](ubyte):uint
;; size=3 bbWeight=1 PerfScore 2.00
C:M2():uint:this:
; Emitting BLENDED_CODE for X64 CPU with SSE2 - Unix
;; size=0 bbWeight=1 PerfScore 0.00
lea rax, [(reloc)]
mov edi, 42
;; size=12 bbWeight=1 PerfScore 0.75
tail.jmp [rax]System.UInt32:CreateTruncating[ulong](ulong):uint
;; size=3 bbWeight=1 PerfScore 2.00
C:M3():uint:this:
; Emitting BLENDED_CODE for X64 CPU with SSE2 - Unix
;; size=0 bbWeight=1 PerfScore 0.00
lea rax, [(reloc)]
mov edi, 42
;; size=12 bbWeight=1 PerfScore 0.75
tail.jmp [rax]System.UInt32:CreateTruncating[int](int):uint
;; size=3 bbWeight=1 PerfScore 2.00
C:M4():uint:this:
; Emitting BLENDED_CODE for X64 CPU with SSE2 - Unix
;; size=0 bbWeight=1 PerfScore 0.00
lea rax, [(reloc)]
mov edi, 42
;; size=12 bbWeight=1 PerfScore 0.75
tail.jmp [rax]System.UInt32:CreateTruncating[byte](byte):uint
;; size=3 bbWeight=1 PerfScore 2.00
C:M5():int:this:
; Emitting BLENDED_CODE for X64 CPU with SSE2 - Unix
;; size=0 bbWeight=1 PerfScore 0.00
mov eax, 42
;; size=5 bbWeight=1 PerfScore 0.25
ret
;; size=1 bbWeight=1 PerfScore 1.00
C:M6():int:this:
; Emitting BLENDED_CODE for X64 CPU with SSE2 - Unix
;; size=0 bbWeight=1 PerfScore 0.00
lea rax, [(reloc)]
mov edi, 42
;; size=12 bbWeight=1 PerfScore 0.75
tail.jmp [rax]System.Int32:CreateTruncating[byte](byte):int
;; size=3 bbWeight=1 PerfScore 2.00
C:M7():int:this:
; Emitting BLENDED_CODE for X64 CPU with SSE2 - Unix
;; size=0 bbWeight=1 PerfScore 0.00
lea rax, [(reloc)]
mov edi, 42
;; size=12 bbWeight=1 PerfScore 0.75
tail.jmp [rax]System.Int32:CreateTruncating[long](long):int
;; size=3 bbWeight=1 PerfScore 2.00
C:M8():int:this:
; Emitting BLENDED_CODE for X64 CPU with SSE2 - Unix
;; size=0 bbWeight=1 PerfScore 0.00
lea rax, [(reloc)]
mov edi, 42
;; size=12 bbWeight=1 PerfScore 0.75
tail.jmp [rax]System.Int32:CreateTruncating[uint](uint):int
;; size=3 bbWeight=1 PerfScore 2.00
C:M9():int:this:
; Emitting BLENDED_CODE for X64 CPU with SSE2 - Unix
;; size=0 bbWeight=1 PerfScore 0.00
lea rax, [(reloc)]
mov edi, 42
;; size=12 bbWeight=1 PerfScore 0.75
tail.jmp [rax]System.Int32:CreateTruncating[ubyte](ubyte):int
;; size=3 bbWeight=1 PerfScore 2.00
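The listing above names the methods but does not show their source. The following is a tentative reconstruction inferred from the callee signatures in the disassembly, not a copy of the godbolt snippet. Only `M1` is quoted verbatim elsewhere in this thread; the other bodies are assumptions. The baselines `M0` and `M5` are left out because their source cannot be recovered from `mov eax, 42` alone. In the JIT's disassembly naming, `ubyte` corresponds to `System.Byte` and `byte` to `System.SByte`.

```csharp
using System;

public class C
{
    // Quoted in the thread: uint M1() => uint.CreateTruncating((byte)42);
    public uint M1() => uint.CreateTruncating((byte)42);    // CreateTruncating[ubyte]

    // Assumed bodies, inferred from the tail.jmp targets in the listing.
    public uint M2() => uint.CreateTruncating((ulong)42);   // CreateTruncating[ulong]
    public uint M3() => uint.CreateTruncating((int)42);     // CreateTruncating[int]
    public uint M4() => uint.CreateTruncating((sbyte)42);   // CreateTruncating[byte]

    public int M6() => int.CreateTruncating((sbyte)42);     // CreateTruncating[byte]
    public int M7() => int.CreateTruncating((long)42);      // CreateTruncating[long]
    public int M8() => int.CreateTruncating((uint)42);      // CreateTruncating[uint]
    public int M9() => int.CreateTruncating((byte)42);      // CreateTruncating[ubyte]
}
```

Each of these would ideally compile down to the same `mov eax, 42` / `ret` pair as the baselines.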
We will push this to Future because we will not have time to work on this in .NET 8.
@xtqqczze it emits expected codegen for JIT and NativeAOT, seems to be R2R specific, from what I see R2R runtime throws
for |
In case it impacts prioritization, we're on a path to make fairly heavy use of this throughout core formatting/parsing logic, as in #84469. |
@dotnet/crossgen-contrib any idea what we can do here for crossgen, is it simply not supported?
uint M1() => uint.CreateTruncating((byte)42);
This exception leads to this codegen for R2R:
C:M1():uint:this:
; Emitting BLENDED_CODE for X64 CPU with SSE2 - Unix
lea rax, [(reloc)]
mov edi, 42
tail.jmp [rax]System.UInt32:CreateTruncating[ubyte](ubyte):uint
because JIT caught the exception and decided that it can't inline it
Well, Crossgen2 doesn't yet support static virtual method resolution, that was planned for .NET 7 but it got cut as it turned out to be quite challenging and we didn't get to it early enough in the release cycle. If it becomes more of a perf problem now we can discuss its prioritization among .NET 8 Crossgen2 work. If the method is not a static virtual then that's just a bug in the check that should be easy to fix. |
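To make the static virtual method (SVM) dependency concrete, here is a minimal sketch of the call shape involved. It is a simplified model, not the actual `System.Numerics.INumberBase<TSelf>` source: `IModelNumber`, `U8`, `U32`, and `ModelConvert` are hypothetical names introduced for illustration. The try-from, then try-to, then throw structure is modeled on the calls visible in the R2R disassembly in this thread.

```csharp
using System;

// Hypothetical, trimmed-down stand-in for System.Numerics.INumberBase<TSelf>.
public interface IModelNumber<TSelf> where TSelf : IModelNumber<TSelf>
{
    // Static abstract members: the implementation is selected by the type argument.
    static abstract bool TryConvertFromTruncating<TOther>(TOther value, out TSelf result)
        where TOther : IModelNumber<TOther>;

    static abstract bool TryConvertToTruncating<TOther>(TSelf value, out TOther result)
        where TOther : IModelNumber<TOther>;
}

public readonly struct U8 : IModelNumber<U8>
{
    public readonly byte Value;
    public U8(byte value) => Value = value;

    // In this model U8 cannot be created from any other type.
    public static bool TryConvertFromTruncating<TOther>(TOther value, out U8 result)
        where TOther : IModelNumber<TOther>
    {
        result = default;
        return false;
    }

    // U8 knows how to convert itself to U32.
    public static bool TryConvertToTruncating<TOther>(U8 value, out TOther result)
        where TOther : IModelNumber<TOther>
    {
        if (typeof(TOther) == typeof(U32))
        {
            result = (TOther)(object)new U32(value.Value);
            return true;
        }
        result = default!;
        return false;
    }
}

public readonly struct U32 : IModelNumber<U32>
{
    public readonly uint Value;
    public U32(uint value) => Value = value;

    // In this model U32 does not recognize any source type directly.
    public static bool TryConvertFromTruncating<TOther>(TOther value, out U32 result)
        where TOther : IModelNumber<TOther>
    {
        result = default;
        return false;
    }

    public static bool TryConvertToTruncating<TOther>(U32 value, out TOther result)
        where TOther : IModelNumber<TOther>
    {
        result = default!;
        return false;
    }
}

public static class ModelConvert
{
    public static TSelf CreateTruncating<TSelf, TOther>(TOther value)
        where TSelf : IModelNumber<TSelf>
        where TOther : IModelNumber<TOther>
    {
        // Both calls go through a type parameter, so the compiler has to
        // resolve a static virtual method before it can inline anything.
        if (!TSelf.TryConvertFromTruncating<TOther>(value, out TSelf result) &&
            !TOther.TryConvertToTruncating<TSelf>(value, out result))
        {
            throw new NotSupportedException();
        }
        return result;
    }
}
```

With concrete value types such as `ModelConvert.CreateTruncating<U32, U8>(new U8(42))`, both targets are known at compile time, which is the case where inlining is possible in principle.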
Yes it's
I assume it could be e.g. for customers who disable tiering since we now use more and more of these |
OK, thanks for pointing it out, I'll discuss it at today's Crossgen2 weekly sync. Maybe we could start supporting at least some cases. IIRC the main problem is precompilation of canonical methods, where the resolution depends on the actual type passed at runtime; in other words, there's no way to resolve the call at compile time.
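A small sketch of the canonical-method case, under the assumption that the runtime shares one compiled body across reference-type instantiations of a generic method. `IShape`, `Circle`, `Square`, and `Describer` are hypothetical names used only for illustration.

```csharp
using System;

public interface IShape
{
    // Static abstract member: each implementing type supplies its own body.
    static abstract string Name();
}

public sealed class Circle : IShape
{
    public static string Name() => "circle";
}

public sealed class Square : IShape
{
    public static string Name() => "square";
}

public static class Describer
{
    // For reference-type T, a single shared body may serve every instantiation,
    // so the target of T.Name() depends on the type supplied at runtime.
    public static string Describe<T>() where T : IShape => T.Name();
}
```

`Describer.Describe<Circle>()` and `Describer.Describe<Square>()` call different implementations even though they may share the same precompiled code, so the callee cannot be fixed when that shared body is compiled. With value-type arguments, as in the `CreateTruncating` calls in this issue, each instantiation gets its own code and the target is known up front.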
I believe this should be fixed with #87438, closing. There are still certain cases that cannot be resolved at compile time, some of them can be fixed in the future by improving the JIT interface. In particular, embedGenericHandle doesn't support passing the constrained type that would be needed if SVM lookup fails at compile time (the SVM lookup itself would still need runtime work but the method calling the SVM could be precompiled); similarly, CanInline doesn't support type constraint information so that, if there's an implementation of the SVM on the interface that defines it, we incorrectly decide we can inline it despite the fact that the constrained lookup could find a different implementation on derived interfaces or types. |
Unfortunately, with #87438, the codegen still has many missed inlining opportunities, especially for signed integers.
// crossgen2 8.0.0-rc.1.23409.99+40b39ff7df1dc2388a5865c9ad151dce88fa007d
// Emitting BLENDED_CODE for generic X64 - Unix
C:M0():uint:this (FullOpts):
mov eax, 42
ret
C:M1():uint:this (FullOpts):
push rbp
sub rsp, 16
lea rbp, [rsp+0x10]
mov dword ptr [rbp-0x08], 42
mov eax, dword ptr [rbp-0x08]
add rsp, 16
pop rbp
ret
C:M2():uint:this (FullOpts):
push rbp
sub rsp, 16
lea rbp, [rsp+0x10]
mov dword ptr [rbp-0x08], 42
mov eax, dword ptr [rbp-0x08]
add rsp, 16
pop rbp
ret
C:M3():uint:this (FullOpts):
push rbp
sub rsp, 16
lea rbp, [rsp+0x10]
xor esi, esi
mov dword ptr [rbp-0x08], esi
lea rsi, [rbp-0x08]
mov edi, 42
call [System.Int32:System.Numerics.INumberBase<System.Int32>.TryConvertToTruncating[uint](int,byref):bool]
test eax, eax
je SHORT G_M000_IG04
mov eax, dword ptr [rbp-0x08]
add rsp, 16
pop rbp
ret
G_M000_IG04:
call [System.ThrowHelper:ThrowNotSupportedException()]
int3
C:M4():uint:this (FullOpts):
push rbp
sub rsp, 16
lea rbp, [rsp+0x10]
xor esi, esi
mov dword ptr [rbp-0x08], esi
lea rsi, [rbp-0x08]
mov edi, 42
call [System.SByte:System.Numerics.INumberBase<System.SByte>.TryConvertToTruncating[uint](byte,byref):bool]
test eax, eax
je SHORT G_M000_IG04
mov eax, dword ptr [rbp-0x08]
add rsp, 16
pop rbp
ret
G_M000_IG04:
call [System.ThrowHelper:ThrowNotSupportedException()]
int3
C:M5():int:this (FullOpts):
mov eax, 42
ret
C:M6():int:this (FullOpts):
push rbp
sub rsp, 16
lea rbp, [rsp+0x10]
lea rsi, [rbp-0x08]
mov edi, 42
call [System.Int32:TryConvertFromTruncating[byte](byte,byref):bool]
test eax, eax
jne SHORT G_M000_IG04
lea rsi, [rbp-0x08]
mov edi, 42
call [System.SByte:System.Numerics.INumberBase<System.SByte>.TryConvertToTruncating[int](byte,byref):bool]
test eax, eax
je SHORT G_M000_IG06
G_M000_IG04:
mov eax, dword ptr [rbp-0x08]
add rsp, 16
pop rbp
ret
G_M000_IG06:
call [System.ThrowHelper:ThrowNotSupportedException()]
int3
C:M7():int:this (FullOpts):
push rbp
sub rsp, 16
lea rbp, [rsp+0x10]
lea rsi, [rbp-0x08]
mov edi, 42
call [System.Int32:TryConvertFromTruncating[long](long,byref):bool]
test eax, eax
jne SHORT G_M000_IG04
lea rsi, [rbp-0x08]
mov edi, 42
call [System.Int64:System.Numerics.INumberBase<System.Int64>.TryConvertToTruncating[int](long,byref):bool]
test eax, eax
je SHORT G_M000_IG06
G_M000_IG04:
mov eax, dword ptr [rbp-0x08]
add rsp, 16
pop rbp
ret
G_M000_IG06:
call [System.ThrowHelper:ThrowNotSupportedException()]
int3
C:M8():int:this (FullOpts):
push rbp
sub rsp, 16
lea rbp, [rsp+0x10]
lea rsi, [rbp-0x08]
mov edi, 42
call [System.Int32:TryConvertFromTruncating[uint](uint,byref):bool]
test eax, eax
jne SHORT G_M000_IG04
lea rsi, [rbp-0x08]
mov edi, 42
call [System.UInt32:System.Numerics.INumberBase<System.UInt32>.TryConvertToTruncating[int](uint,byref):bool]
test eax, eax
je SHORT G_M000_IG06
G_M000_IG04:
mov eax, dword ptr [rbp-0x08]
add rsp, 16
pop rbp
ret
G_M000_IG06:
call [System.ThrowHelper:ThrowNotSupportedException()]
int3
C:M9():int:this (FullOpts):
push rbp
sub rsp, 16
lea rbp, [rsp+0x10]
lea rsi, [rbp-0x08]
mov edi, 42
call [System.Int32:TryConvertFromTruncating[ubyte](ubyte,byref):bool]
test eax, eax
jne SHORT G_M000_IG04
lea rsi, [rbp-0x08]
mov edi, 42
call [System.Byte:System.Numerics.INumberBase<System.Byte>.TryConvertToTruncating[int](ubyte,byref):bool]
test eax, eax
je SHORT G_M000_IG06
G_M000_IG04:
mov eax, dword ptr [rbp-0x08]
add rsp, 16
pop rbp
ret
G_M000_IG06:
call [System.ThrowHelper:ThrowNotSupportedException()]
int3
Related: #78648.
https://csharp.godbolt.org/z/K6xM8hP6P