Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Consider devirtualizing & inlining Comparer<T>.Default.CompareTo as well. #39873

Closed
teo-tsirpanis opened this issue Jul 23, 2020 · 6 comments
Closed
Assignees
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI optimization
Milestone

Comments

@teo-tsirpanis
Copy link
Contributor

teo-tsirpanis commented Jul 23, 2020

Starting in .NET Core 2.1, calling EqualityComparer<T>.Default.Equals is specially optimized by the JIT by devirtualizing and inlining the equality check.

However, as seen in SharpLab, the similar Comparer<T>.Default.CompareTo did not have the same fate. As seen in the generated assembly code, comparing integers with via IComparable<int> is optimized, but calling Comparer<int>.CompareTo(x1, x2) results in a function call. Making the two examples generate identical code would be a good optimization opportunity.

category:cq
theme:devirtualization
skill-level:intermediate
cost:small

@AndyAyersMS
Copy link
Member

Thanks, will take a look. Suspect this is similar to #39519, the jit devirtualizes late and so can't inline.

@AndyAyersMS AndyAyersMS added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Jul 24, 2020
@AndyAyersMS AndyAyersMS added this to the Future milestone Jul 24, 2020
@AndyAyersMS
Copy link
Member

For Compare2 I see more complex codegen than sharplab...suspect their ambient has done some class init?

; Assembly listing for method C:Compare2(int,int):int
; Emitting BLENDED_CODE for X64 CPU with AVX - Windows
; optimized code
; rsp based frame
; fully interruptible
; Final local variable assignments
;
;  V00 arg0         [V00,T01] (  3,  3   )     int  ->  rsi        
;  V01 arg1         [V01,T02] (  3,  3   )     int  ->  rdi        
;  V02 OutArgs      [V02    ] (  1,  1   )  lclBlk (32) [rsp+0x00]   "OutgoingArgSpace"
;  V03 tmp1         [V03,T00] (  3,  6   )     ref  ->  rcx         "argument with side effect"
;
; Lcl frame size = 40

G_M274_IG01:
       push     rdi
       push     rsi
       sub      rsp, 40
       mov      esi, ecx
       mov      edi, edx
						;; bbWeight=1    PerfScore 2.75
G_M274_IG02:
       mov      rcx, 0xD1FFAB1E
       mov      edx, 37
       call     CORINFO_HELP_CLASSINIT_SHARED_DYNAMICCLASS
       mov      rcx, 0xD1FFAB1E
       mov      rcx, gword ptr [rcx]
       mov      edx, esi
       mov      r8d, edi
       mov      rax, qword ptr [rcx]
       mov      rax, qword ptr [rax+64]
       mov      rax, qword ptr [rax+32]
						;; bbWeight=1    PerfScore 10.25
G_M274_IG03:
       add      rsp, 40
       pop      rsi
       pop      rdi
       rex.jmp  rax
						;; bbWeight=1    PerfScore 3.25

Late devirt here requires class init, so doesn't happen either.

Querying runtime about current class of field Comparer`1.<Default>k__BackingField (declared as Comparer`1)
Field's current class not available

impDevirtualizeCall: Trying to devirtualize virtual call:
    class for 'this' is Comparer`1 (attrib 20000400)
    base method is Comparer`1::Compare
    devirt to Comparer`1::Compare -- inexact or not final
               [000004] --C-G-------              *  CALLV ind int    Comparer`1.Compare
               [000011] --CXG------- this in rcx  +--*  COMMA     ref   
               [000010] H-CXG-------              |  +--*  CALL help long   HELPER.CORINFO_HELP_CLASSINIT_SHARED_DYNAMICCLASS
               [000008] ------------ arg0         |  |  +--*  CNS_INT   long   0x7ffa70c30020
               [000009] ------------ arg1         |  |  \--*  CNS_INT   int    37
               [000006] ----G-------              |  \--*  FIELD     ref    <Default>k__BackingField
               [000002] ------------ arg1         +--*  LCL_VAR   int    V00 arg0         
               [000003] ------------ arg2         \--*  LCL_VAR   int    V01 arg1         
    Class not final or exact, and method not final

When the Comparer class has been initialized I get the sharplab code, and indeed here we've devirtualized late:

; Assembly listing for method C:Compare2(int,int):int
; Emitting BLENDED_CODE for X64 CPU with AVX - Windows
; optimized code
; rsp based frame
; fully interruptible
; Final local variable assignments
;
;  V00 arg0         [V00,T00] (  3,  3   )     int  ->   r8        
;  V01 arg1         [V01,T01] (  3,  3   )     int  ->  rax        
;# V02 OutArgs      [V02    ] (  1,  1   )  lclBlk ( 0) [rsp+0x00]   "OutgoingArgSpace"
;
; Lcl frame size = 0

G_M274_IG01:
       mov      r8d, ecx
       mov      eax, edx
						;; bbWeight=1    PerfScore 0.50
G_M274_IG02:
       mov      rcx, 0xD1FFAB1E
       mov      rcx, gword ptr [rcx]
       mov      edx, r8d
       mov      r8d, eax
       mov      rax, 0xD1FFAB1E
       mov      rax, qword ptr [rax]
						;; bbWeight=1    PerfScore 5.00
G_M274_IG03:
       rex.jmp  rax
						;; bbWeight=1    PerfScore 2.00

as seen in the jit dump

Querying runtime about current class of field Comparer`1.<Default>k__BackingField (declared as Comparer`1)
Runtime reports field is init-only and initialized and has class GenericComparer`1

impDevirtualizeCall: Trying to devirtualize virtual call:
    class for 'this' is GenericComparer`1 [exact] (attrib 20000010)
    base method is Comparer`1::Compare
    devirt to GenericComparer`1::Compare -- exact
               [000004] --C-G-------              *  CALLV ind int    Comparer`1.Compare
               [000006] ----G------- this in rcx  +--*  FIELD     ref    <Default>k__BackingField
               [000002] ------------ arg1         +--*  LCL_VAR   int    V00 arg0         
               [000003] ------------ arg2         \--*  LCL_VAR   int    V01 arg1         
    exact; can devirtualize
... after devirt...
               [000004] --C-G-------              *  CALL      int    GenericComparer`1.Compare
               [000006] ----G------- this in rcx  +--*  FIELD     ref    <Default>k__BackingField
               [000002] ------------ arg1         +--*  LCL_VAR   int    V00 arg0         
               [000003] ------------ arg2         \--*  LCL_VAR   int    V01 arg1         

Codegen for some other types is interesting too. For short types we use subtract which is good but somehow also need to spill and reload one arg:

; Assembly listing for method C:Compare(short,short):int
; Emitting BLENDED_CODE for X64 CPU with AVX - Windows
; optimized code
; rsp based frame
; partially interruptible
; Final local variable assignments
;
;  V00 arg0         [V00,T00] (  3,  3   )   short  ->  [rsp+0x08]   do-not-enreg[F] ld-addr-op
;  V01 arg1         [V01,T01] (  3,  3   )   short  ->  rdx        
;# V02 OutArgs      [V02    ] (  1,  1   )  lclBlk ( 0) [rsp+0x00]   "OutgoingArgSpace"
;
; Lcl frame size = 0

G_M29376_IG01:
       mov      dword ptr [rsp+08H], ecx
						;; bbWeight=1    PerfScore 1.00
G_M29376_IG02:
       movsx    rax, word  ptr [rsp+08H]
       movsx    rdx, dx
       sub      eax, edx
						;; bbWeight=1    PerfScore 1.50
G_M29376_IG03:
       ret      
						;; bbWeight=1    PerfScore 1.00

@AndyAyersMS
Copy link
Member

For the short case of Compare, the method is defined on a struct, so we indirect the this arg. The call gets inlined, showing that the address is immediately dereferenced, but we're unable to clean this up:

Trees after Morph - Inlining
STMT00001 (IL   ???...  ???)
               [000005] --C---------              *  RETURN    int   
               [000009] ---XG-------              \--*  SUB       int   
               [000008] *--XG-------                 +--*  IND       short 
               [000006] ------------                 |  \--*  ADDR      long  
               [000007] -------N----                 |     \--*  LCL_VAR   int    V00 arg0         
               [000002] ------------                 \--*  LCL_VAR   short  V01 arg1         

Wonder if the mixture of types here gets in the way. Seems a bit odd that we widen the type in LCL_VAR int V00 arg0 as we're pushing the address of the local, not its value.

cc @dotnet/jit-contrib

@AndyAyersMS
Copy link
Member

Wonder if the mixture of types here gets in the way

Seems to be the case, with

@@ -11536,8 +11536,8 @@ void Compiler::impImportBlockCode(BasicBlock* block)
                 goto ADRVAR;

             ADRVAR:
-                op1 = gtNewLclvNode(lclNum, lvaGetActualType(lclNum) DEBUGARG(opcodeOffs + sz + 1));
+                op1 = gtNewLclvNode(lclNum, lvaGetRealType(lclNum) DEBUGARG(opcodeOffs + sz + 1));

we now generate

; Assembly listing for method C:Compare(short,short):int
; Emitting BLENDED_CODE for X64 CPU with AVX - Windows
; optimized code
; rsp based frame
; partially interruptible
; Final local variable assignments
;
;  V00 arg0         [V00,T00] (  3,  3   )   short  ->  rcx         ld-addr-op
;  V01 arg1         [V01,T01] (  3,  3   )   short  ->  rdx        
;# V02 OutArgs      [V02    ] (  1,  1   )  lclBlk ( 0) [rsp+0x00]   "OutgoingArgSpace"
;
; Lcl frame size = 0

G_M29376_IG01:
						;; bbWeight=1    PerfScore 0.00
G_M29376_IG02:
       movsx    rax, cx
       movsx    rdx, dx
       sub      eax, edx
						;; bbWeight=1    PerfScore 0.75
G_M29376_IG03:
       ret      
						;; bbWeight=1    PerfScore 1.00

; Total bytes of code 11, prolog size 0, PerfScore 2.85, (MethodHash=b17e8d3f) for method C:Compare(short,short):int

Quick gloss of the above importer change over SPC looks promising:

PMI CodeSize Diffs for System.Private.CoreLib.dll for x64 default jit
Summary of Code Size diffs:
(Lower is better)
Total bytes of diff: -2427 (-0.05% of base)
    diff is an improvement.
Top file improvements (bytes):
       -2427 : System.Private.CoreLib.dasm (-0.05% of base)
1 total files with Code Size differences (1 improved, 0 regressed), 0 unchanged.
Top method regressions (bytes):
          12 ( 2.74% of base) : System.Private.CoreLib.dasm - ManifestBuilder:AddChannel(String,int,EventChannelAttribute):this
          10 ( 1.55% of base) : System.Private.CoreLib.dasm - Utf8Formatter:TryFormatFloatingPoint(short,Span`1,byref,StandardFormat):bool
           5 ( 3.07% of base) : System.Private.CoreLib.dasm - SecurityElement:GetEscapeSequence(ushort):String
           5 ( 1.10% of base) : System.Private.CoreLib.dasm - Dictionary`2:Remove(short,byref):bool:this
           2 ( 0.28% of base) : System.Private.CoreLib.dasm - Utf8Formatter:TryFormatFloatingPoint(ubyte,Span`1,byref,StandardFormat):bool
           2 ( 3.92% of base) : System.Private.CoreLib.dasm - ObjectEqualityComparer`1:Equals(ubyte,ubyte):bool:this
           2 ( 0.42% of base) : System.Private.CoreLib.dasm - Dictionary`2:Remove(ubyte,byref):bool:this
           1 ( 1.54% of base) : System.Private.CoreLib.dasm - SpanHelpers:BinarySearch(byref,int,ubyte):int
           1 ( 1.54% of base) : System.Private.CoreLib.dasm - SpanHelpers:BinarySearch(byref,int,short):int
           1 ( 0.33% of base) : System.Private.CoreLib.dasm - StringBuilder:AppendSpanFormattable(ubyte,String,IFormatProvider):StringBuilder:this
           1 ( 0.29% of base) : System.Private.CoreLib.dasm - ValueStringBuilder:AppendSpanFormattable(ubyte,String,IFormatProvider):this
           1 ( 1.89% of base) : System.Private.CoreLib.dasm - ObjectEqualityComparer`1:Equals(short,short):bool:this
Top method improvements (bytes):
        -245 (-14.55% of base) : System.Private.CoreLib.dasm - SpanHelpers:IndexOfAny(byref,ubyte,ubyte,ubyte,int):int (2 methods)
        -245 (-29.81% of base) : System.Private.CoreLib.dasm - SpanHelpers:IndexOfAny(byref,short,short,short,int):int
        -208 (-26.56% of base) : System.Private.CoreLib.dasm - SpanHelpers:LastIndexOfAny(byref,short,short,short,int):int
        -195 (-12.18% of base) : System.Private.CoreLib.dasm - SpanHelpers:LastIndexOfAny(byref,ubyte,ubyte,ubyte,int):int (2 methods)
        -120 (-20.37% of base) : System.Private.CoreLib.dasm - SpanHelpers:LastIndexOfAny(byref,short,short,int):int
        -119 (-19.60% of base) : System.Private.CoreLib.dasm - SpanHelpers:IndexOfAny(byref,short,short,int):int
        -106 (-8.31% of base) : System.Private.CoreLib.dasm - SpanHelpers:IndexOfAny(byref,ubyte,ubyte,int):int (2 methods)
         -77 (-6.11% of base) : System.Private.CoreLib.dasm - SpanHelpers:LastIndexOfAny(byref,ubyte,ubyte,int):int (2 methods)
         -73 (-20.68% of base) : System.Private.CoreLib.dasm - SpanHelpers:Contains(byref,short,int):bool
         -70 (-9.14% of base) : System.Private.CoreLib.dasm - SpanHelpers:Contains(byref,ubyte,int):bool (2 methods)
         -68 (-17.22% of base) : System.Private.CoreLib.dasm - SpanHelpers:IndexOf(byref,short,int):int
         -65 (-6.61% of base) : System.Private.CoreLib.dasm - SpanHelpers:IndexOf(byref,ubyte,int):int (2 methods)
         -61 (-15.02% of base) : System.Private.CoreLib.dasm - SpanHelpers:LastIndexOf(byref,short,int):int
         -47 (-5.01% of base) : System.Private.CoreLib.dasm - SpanHelpers:LastIndexOf(byref,ubyte,int):int (2 methods)
         -29 (-42.65% of base) : System.Private.CoreLib.dasm - MemoryExtensions:IndexOfAny(Span`1,ubyte,ubyte,ubyte):int
         -29 (-42.65% of base) : System.Private.CoreLib.dasm - MemoryExtensions:IndexOfAny(ReadOnlySpan`1,ubyte,ubyte,ubyte):int
         -29 (-42.65% of base) : System.Private.CoreLib.dasm - MemoryExtensions:LastIndexOfAny(Span`1,ubyte,ubyte,ubyte):int
         -29 (-42.65% of base) : System.Private.CoreLib.dasm - MemoryExtensions:LastIndexOfAny(ReadOnlySpan`1,ubyte,ubyte,ubyte):int
         -22 (-0.93% of base) : System.Private.CoreLib.dasm - GenericArraySortHelper`2:PickPivotAndPartition(Span`1,Span`1):int (5 methods)
         -20 (-4.30% of base) : System.Private.CoreLib.dasm - HashSet`1:FindItemIndex(short):int:this
Top method regressions (percentages):
           2 ( 3.92% of base) : System.Private.CoreLib.dasm - ObjectEqualityComparer`1:Equals(ubyte,ubyte):bool:this
           5 ( 3.07% of base) : System.Private.CoreLib.dasm - SecurityElement:GetEscapeSequence(ushort):String
          12 ( 2.74% of base) : System.Private.CoreLib.dasm - ManifestBuilder:AddChannel(String,int,EventChannelAttribute):this
           1 ( 1.89% of base) : System.Private.CoreLib.dasm - ObjectEqualityComparer`1:Equals(short,short):bool:this
          10 ( 1.55% of base) : System.Private.CoreLib.dasm - Utf8Formatter:TryFormatFloatingPoint(short,Span`1,byref,StandardFormat):bool
           1 ( 1.54% of base) : System.Private.CoreLib.dasm - SpanHelpers:BinarySearch(byref,int,ubyte):int
           1 ( 1.54% of base) : System.Private.CoreLib.dasm - SpanHelpers:BinarySearch(byref,int,short):int
           5 ( 1.10% of base) : System.Private.CoreLib.dasm - Dictionary`2:Remove(short,byref):bool:this
           2 ( 0.42% of base) : System.Private.CoreLib.dasm - Dictionary`2:Remove(ubyte,byref):bool:this
           1 ( 0.33% of base) : System.Private.CoreLib.dasm - StringBuilder:AppendSpanFormattable(ubyte,String,IFormatProvider):StringBuilder:this
           1 ( 0.29% of base) : System.Private.CoreLib.dasm - ValueStringBuilder:AppendSpanFormattable(ubyte,String,IFormatProvider):this
           2 ( 0.28% of base) : System.Private.CoreLib.dasm - Utf8Formatter:TryFormatFloatingPoint(ubyte,Span`1,byref,StandardFormat):bool
Top method improvements (percentages):
          -6 (-60.00% of base) : System.Private.CoreLib.dasm - GenericEqualityComparer`1:GetHashCode(ubyte):int:this
          -6 (-60.00% of base) : System.Private.CoreLib.dasm - ObjectEqualityComparer`1:GetHashCode(ubyte):int:this
          -6 (-60.00% of base) : System.Private.CoreLib.dasm - ByteEqualityComparer:GetHashCode(ubyte):int:this
         -29 (-42.65% of base) : System.Private.CoreLib.dasm - MemoryExtensions:IndexOfAny(Span`1,ubyte,ubyte,ubyte):int
         -29 (-42.65% of base) : System.Private.CoreLib.dasm - MemoryExtensions:IndexOfAny(ReadOnlySpan`1,ubyte,ubyte,ubyte):int
         -29 (-42.65% of base) : System.Private.CoreLib.dasm - MemoryExtensions:LastIndexOfAny(Span`1,ubyte,ubyte,ubyte):int
         -29 (-42.65% of base) : System.Private.CoreLib.dasm - MemoryExtensions:LastIndexOfAny(ReadOnlySpan`1,ubyte,ubyte,ubyte):int
          -6 (-37.50% of base) : System.Private.CoreLib.dasm - GenericComparer`1:Compare(ubyte,ubyte):int:this
         -19 (-37.25% of base) : System.Private.CoreLib.dasm - MemoryExtensions:IndexOfAny(Span`1,ubyte,ubyte):int
         -19 (-37.25% of base) : System.Private.CoreLib.dasm - MemoryExtensions:IndexOfAny(ReadOnlySpan`1,ubyte,ubyte):int
         -19 (-37.25% of base) : System.Private.CoreLib.dasm - MemoryExtensions:LastIndexOfAny(Span`1,ubyte,ubyte):int
         -19 (-37.25% of base) : System.Private.CoreLib.dasm - MemoryExtensions:LastIndexOfAny(ReadOnlySpan`1,ubyte,ubyte):int
          -6 (-35.29% of base) : System.Private.CoreLib.dasm - GenericComparer`1:Compare(short,short):int:this
        -245 (-29.81% of base) : System.Private.CoreLib.dasm - SpanHelpers:IndexOfAny(byref,short,short,short,int):int
          -6 (-27.27% of base) : System.Private.CoreLib.dasm - GenericEqualityComparer`1:Equals(ubyte,ubyte):bool:this
        -208 (-26.56% of base) : System.Private.CoreLib.dasm - SpanHelpers:LastIndexOfAny(byref,short,short,short,int):int
          -6 (-26.09% of base) : System.Private.CoreLib.dasm - GenericEqualityComparer`1:Equals(short,short):bool:this
          -6 (-25.00% of base) : System.Private.CoreLib.dasm - Convert:ToString(ubyte):String
          -6 (-25.00% of base) : System.Private.CoreLib.dasm - Convert:ToString(ubyte,IFormatProvider):String
          -6 (-25.00% of base) : System.Private.CoreLib.dasm - HashCode:Add(ubyte):this
102 total methods with Code Size differences (90 improved, 12 regressed), 22441 unchanged.

@AndyAyersMS
Copy link
Member

Overall support should parallel what we did for the generic comparer, something like the following:

  • add [Intrinsic] to the Default method for CoreCLR
  • update jit to recognize the intrinsic and flag it as a "special" intrinsic
  • update impGetSpecialIntrinsicExactReturnType to handle this case (deferring computation of the type to the runtime)
  • implement runtime analog of getDefaultEqualityComparerClass for comparers
  • add "special dce" handling for this to fgGetStaticsCCtorHelper so things get properly cleaned up
  • likely some work in crossgen/crossgen2 as well, though not sure exactly what just yet.
  • revisit the change proposed above (possibly a prerequisite)

Will consider this for .Net 6.

@stephentoub
Copy link
Member

This was fixed by #48160, correct?

@EgorBo EgorBo closed this as completed Feb 16, 2021
@ghost ghost locked as resolved and limited conversation to collaborators Mar 18, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI optimization
Projects
Archived in project
Development

No branches or pull requests

5 participants