Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

JIT: fold trees after inline return expression updates #1751

Merged
merged 1 commit into from
Feb 1, 2020

Conversation

AndyAyersMS
Copy link
Member

@AndyAyersMS AndyAyersMS commented Jan 15, 2020

Aggressively fold as we substitute inline return value trees in for the return
value placeholders. Notice when this folding leads to branch simplification,
and make the associated flow graph update.

The recently added early flow opt pass will then transitively remove any
newly unreachable code.

Resolves dotnet/coreclr#27395. dotnet/coreclr#27935 (now migrated to #13824)

Aggressively fold as we substitute inline return value trees in for the return
value placeholders. Notice when this folding leads to branch simplification,
and make the associated flow graph update.

The recently added early flow opt pass will then transitively remove any
newly unreachable code.

Resolves dotnet/coreclr#27395.
@AndyAyersMS
Copy link
Member Author

This is the change I had in mind when I started working on #1309; I didn't realize at the time that #1309 would end up being quite a bit more impactful.

One would hope that early folding like this would always lead to better code later on, but that's not the case. I will spend some time looking at the more prominent regressions.

PMI CodeSize Diffs for System.Private.CoreLib.dll, framework assemblies for x64 default jit
Summary of Code Size diffs:
(Lower is better)
Total bytes of diff: -3269 (-0.01% of base)
    diff is an improvement.
Top file regressions (bytes):
          57 : System.Security.Cryptography.Pkcs.dasm (0.01% of base)
          36 : System.Text.RegularExpressions.dasm (0.01% of base)
          35 : System.Security.Cryptography.X509Certificates.dasm (0.02% of base)
          15 : System.Reflection.MetadataLoadContext.dasm (0.01% of base)
          13 : System.Collections.dasm (0.00% of base)
          13 : System.Private.Uri.dasm (0.01% of base)
          10 : System.Runtime.Extensions.dasm (0.01% of base)
          10 : System.Windows.Extensions.dasm (0.03% of base)
           9 : System.Security.Cryptography.Algorithms.dasm (0.00% of base)
           7 : System.Net.Http.dasm (0.00% of base)
           7 : System.Net.Quic.dasm (0.01% of base)
           5 : System.Runtime.Numerics.dasm (0.01% of base)
           4 : System.Net.Http.WinHttpHandler.dasm (0.00% of base)
           3 : System.Diagnostics.Process.dasm (0.00% of base)
           3 : System.Security.Cryptography.Cng.dasm (0.00% of base)
           3 : System.ServiceProcess.ServiceController.dasm (0.01% of base)
           2 : Newtonsoft.Json.dasm (0.00% of base)
           2 : System.Net.WebSockets.Client.dasm (0.01% of base)
           2 : System.Net.WebSockets.dasm (0.01% of base)
           2 : System.Net.WebSockets.WebSocketProtocol.dasm (0.01% of base)
Top file improvements (bytes):
       -2064 : System.Private.CoreLib.dasm (-0.04% of base)
        -591 : System.Memory.dasm (-0.24% of base)
        -251 : System.Text.Json.dasm (-0.06% of base)
        -130 : System.Net.Security.dasm (-0.08% of base)
        -100 : System.Private.Xml.dasm (-0.00% of base)
         -93 : System.Data.SqlClient.dasm (-0.01% of base)
         -67 : System.Net.Mail.dasm (-0.03% of base)
         -62 : Microsoft.CodeAnalysis.VisualBasic.dasm (-0.00% of base)
         -50 : Microsoft.CodeAnalysis.dasm (-0.00% of base)
         -18 : System.Linq.Parallel.dasm (-0.00% of base)
         -17 : System.Threading.Tasks.Parallel.dasm (-0.01% of base)
         -14 : System.Net.Sockets.dasm (-0.01% of base)
         -13 : System.Net.HttpListener.dasm (-0.01% of base)
          -9 : Microsoft.VisualBasic.Core.dasm (-0.00% of base)
          -6 : Microsoft.Diagnostics.Tracing.TraceEvent.dasm (-0.00% of base)
          -5 : Microsoft.CodeAnalysis.CSharp.dasm (-0.00% of base)
          -5 : System.Private.Xml.Linq.dasm (-0.00% of base)
          -4 : System.IO.FileSystem.Watcher.dasm (-0.03% of base)
          -3 : System.IO.FileSystem.dasm (-0.00% of base)
          -3 : System.Numerics.Tensors.dasm (-0.00% of base)
44 total files with Code Size differences (23 improved, 21 regressed), 121 unchanged.
Top method regressions (bytes):
          53 (10.19% of base) : System.Private.CoreLib.dasm - MemoryExtensions:CopyTo(ref,Span`1) (7 methods)
          41 ( 0.85% of base) : System.Text.Json.dasm - JsonSerializer:ReadCore(JsonSerializerOptions,byref,byref)
          40 ( 1.42% of base) : System.Numerics.Tensors.dasm - DenseTensor`1:CopyTo(ref,int):this (7 methods)
          39 ( 2.42% of base) : System.Text.Json.dasm - Utf8JsonReader:CheckLiteralMultiSegment(ReadOnlySpan`1,ReadOnlySpan`1,byref):bool:this
          35 ( 4.61% of base) : System.Memory.dasm - BuffersExtensions:PositionOf(byref,double):Nullable`1
          35 ( 1.05% of base) : System.Private.CoreLib.dasm - Marshal:CopyToManaged(long,ref,int,int) (7 methods)
          34 ( 3.59% of base) : System.Text.Json.dasm - JsonClassInfo:GetProperty(ReadOnlySpan`1,byref):JsonPropertyInfo:this
          32 ( 0.76% of base) : System.Memory.dasm - SequenceReader`1:TryReadTo(byref,ReadOnlySpan`1,bool):bool:this (6 methods)
          32 ( 2.47% of base) : System.Memory.dasm - SequenceReader`1:IsNext(ReadOnlySpan`1,bool):bool:this (6 methods)
          30 ( 2.92% of base) : System.Text.RegularExpressions.dasm - RegexParser:ParseReplacement(String,int,Hashtable,int,Hashtable):RegexReplacement (3 methods)
          30 ( 8.20% of base) : System.Text.RegularExpressions.dasm - ValueStringBuilder:Append(ReadOnlySpan`1):this (3 methods)
          28 ( 1.63% of base) : System.Memory.dasm - SequenceReader`1:TryReadTo(byref,Vector`1,Vector`1,bool):bool:this (2 methods)
          20 ( 0.32% of base) : System.Collections.dasm - HashSet`1:SymmetricExceptWithEnumerable(IEnumerable`1):this (7 methods)
          18 ( 9.09% of base) : System.Private.CoreLib.dasm - MemoryExtensions:IndexOf(Span`1,ReadOnlySpan`1):int (6 methods)
          18 ( 9.09% of base) : System.Private.CoreLib.dasm - MemoryExtensions:LastIndexOf(Span`1,ReadOnlySpan`1):int (6 methods)
          18 ( 9.09% of base) : System.Private.CoreLib.dasm - MemoryExtensions:IndexOf(ReadOnlySpan`1,ReadOnlySpan`1):int (6 methods)
          18 ( 9.09% of base) : System.Private.CoreLib.dasm - MemoryExtensions:LastIndexOf(ReadOnlySpan`1,ReadOnlySpan`1):int (6 methods)
          18 ( 9.09% of base) : System.Private.CoreLib.dasm - MemoryExtensions:LastIndexOfAny(Span`1,ReadOnlySpan`1):int (6 methods)
          18 ( 9.09% of base) : System.Private.CoreLib.dasm - MemoryExtensions:LastIndexOfAny(ReadOnlySpan`1,ReadOnlySpan`1):int (6 methods)
          16 ( 1.34% of base) : System.Net.Http.dasm - MultiProxy:TryParseProxyConfigPart(ReadOnlySpan`1,bool,byref,byref):bool
Top method improvements (bytes):
        -165 (-3.52% of base) : System.Memory.dasm - SequenceReader`1:TryReadToAnyInternal(byref,ReadOnlySpan`1,bool,int):bool:this (6 methods)
        -154 (-25.04% of base) : System.Private.CoreLib.dasm - MemoryExtensions:Overlaps(ReadOnlySpan`1,ReadOnlySpan`1):bool (7 methods)
        -129 (-0.77% of base) : System.Text.Json.dasm - <ReadAsync>d__15`1:MoveNext():this (7 methods)
        -123 (-5.52% of base) : System.Memory.dasm - SequenceReader`1:TryReadToAny(byref,ReadOnlySpan`1,bool):bool:this (12 methods)
         -87 (-3.19% of base) : System.Private.CoreLib.dasm - MemoryExtensions:TrimEnd(Memory`1,ReadOnlySpan`1):Memory`1 (6 methods)
         -87 (-3.12% of base) : System.Private.CoreLib.dasm - MemoryExtensions:TrimEnd(ReadOnlyMemory`1,ReadOnlySpan`1):ReadOnlyMemory`1 (6 methods)
         -79 (-8.31% of base) : System.Private.CoreLib.dasm - MemoryExtensions:Overlaps(ReadOnlySpan`1,ReadOnlySpan`1,byref):bool (7 methods)
         -63 (-5.54% of base) : System.Memory.dasm - SequenceReader`1:AdvancePastAny(ReadOnlySpan`1):long:this (6 methods)
         -63 (-4.50% of base) : System.Private.CoreLib.dasm - MemoryExtensions:TrimEnd(Span`1,ReadOnlySpan`1):Span`1 (6 methods)
         -63 (-3.82% of base) : System.Private.CoreLib.dasm - MemoryExtensions:TrimEnd(ReadOnlySpan`1,ReadOnlySpan`1):ReadOnlySpan`1 (7 methods)
         -62 (-19.81% of base) : System.Private.CoreLib.dasm - MemoryExtensions:SequenceEqual(Span`1,ReadOnlySpan`1):bool (6 methods)
         -62 (-19.81% of base) : System.Private.CoreLib.dasm - MemoryExtensions:SequenceEqual(ReadOnlySpan`1,ReadOnlySpan`1):bool (6 methods)
         -62 (-19.81% of base) : System.Private.CoreLib.dasm - MemoryExtensions:StartsWith(Span`1,ReadOnlySpan`1):bool (6 methods)
         -62 (-19.81% of base) : System.Private.CoreLib.dasm - MemoryExtensions:StartsWith(ReadOnlySpan`1,ReadOnlySpan`1):bool (6 methods)
         -59 (-13.56% of base) : System.Private.CoreLib.dasm - Array:<IndexOf>g__GenericIndexOf|105_0(Array,Object,int,int):int (6 methods)
         -59 (-13.56% of base) : System.Private.CoreLib.dasm - Array:<LastIndexOf>g__GenericLastIndexOf|111_0(Array,Object,int,int):int (6 methods)
         -59 (-12.74% of base) : System.Private.CoreLib.dasm - MemoryExtensions:ClampStart(ReadOnlySpan`1,ReadOnlySpan`1):int (6 methods)
         -59 (-9.44% of base) : System.Private.CoreLib.dasm - MemoryExtensions:ClampEnd(ReadOnlySpan`1,int,ReadOnlySpan`1):int (6 methods)
         -54 (-2.16% of base) : System.Net.Mail.dasm - SSPIWrapper:EncryptDecryptHelper(int,ISSPIInterface,SafeDeleteContext,Span`1,int):int
         -54 (-2.16% of base) : System.Net.Security.dasm - SSPIWrapper:EncryptDecryptHelper(int,ISSPIInterface,SafeDeleteContext,Span`1,int):int
Top method regressions (percentages):
          14 (21.88% of base) : System.Private.CoreLib.dasm - MemoryExtensions:IndexOfAny(Span`1,Vector`1,Vector`1):int
          14 (21.88% of base) : System.Private.CoreLib.dasm - MemoryExtensions:IndexOfAny(ReadOnlySpan`1,Vector`1,Vector`1):int
          14 (21.88% of base) : System.Private.CoreLib.dasm - MemoryExtensions:LastIndexOfAny(Span`1,Vector`1,Vector`1):int
          14 (21.88% of base) : System.Private.CoreLib.dasm - MemoryExtensions:LastIndexOfAny(ReadOnlySpan`1,Vector`1,Vector`1):int
           8 (16.67% of base) : System.Private.CoreLib.dasm - MemoryExtensions:Contains(Span`1,Vector`1):bool
           8 (16.67% of base) : System.Private.CoreLib.dasm - MemoryExtensions:Contains(ReadOnlySpan`1,Vector`1):bool
           8 (16.67% of base) : System.Private.CoreLib.dasm - MemoryExtensions:IndexOf(Span`1,Vector`1):int
           8 (16.67% of base) : System.Private.CoreLib.dasm - MemoryExtensions:LastIndexOf(Span`1,Vector`1):int
           8 (16.67% of base) : System.Private.CoreLib.dasm - MemoryExtensions:IndexOf(ReadOnlySpan`1,Vector`1):int
           8 (16.67% of base) : System.Private.CoreLib.dasm - MemoryExtensions:LastIndexOf(ReadOnlySpan`1,Vector`1):int
          53 (10.19% of base) : System.Private.CoreLib.dasm - MemoryExtensions:CopyTo(ref,Span`1) (7 methods)
          18 ( 9.09% of base) : System.Private.CoreLib.dasm - MemoryExtensions:IndexOf(Span`1,ReadOnlySpan`1):int (6 methods)
          18 ( 9.09% of base) : System.Private.CoreLib.dasm - MemoryExtensions:LastIndexOf(Span`1,ReadOnlySpan`1):int (6 methods)
          18 ( 9.09% of base) : System.Private.CoreLib.dasm - MemoryExtensions:IndexOf(ReadOnlySpan`1,ReadOnlySpan`1):int (6 methods)
          18 ( 9.09% of base) : System.Private.CoreLib.dasm - MemoryExtensions:LastIndexOf(ReadOnlySpan`1,ReadOnlySpan`1):int (6 methods)
          18 ( 9.09% of base) : System.Private.CoreLib.dasm - MemoryExtensions:LastIndexOfAny(Span`1,ReadOnlySpan`1):int (6 methods)
          18 ( 9.09% of base) : System.Private.CoreLib.dasm - MemoryExtensions:LastIndexOfAny(ReadOnlySpan`1,ReadOnlySpan`1):int (6 methods)
          10 ( 8.20% of base) : System.Data.SqlClient.dasm - ValueStringBuilder:Append(ReadOnlySpan`1):this
          10 ( 8.20% of base) : System.Private.CoreLib.dasm - ValueStringBuilder:Append(ReadOnlySpan`1):this
          10 ( 8.20% of base) : System.Private.Uri.dasm - ValueStringBuilder:Append(ReadOnlySpan`1):this
Top method improvements (percentages):
         -39 (-58.21% of base) : System.Private.CoreLib.dasm - MemoryExtensions:IndexOfAny(Span`1,long,long,long):int
         -39 (-58.21% of base) : System.Private.CoreLib.dasm - MemoryExtensions:IndexOfAny(ReadOnlySpan`1,long,long,long):int
         -39 (-58.21% of base) : System.Private.CoreLib.dasm - MemoryExtensions:LastIndexOfAny(Span`1,long,long,long):int
         -39 (-58.21% of base) : System.Private.CoreLib.dasm - MemoryExtensions:LastIndexOfAny(ReadOnlySpan`1,long,long,long):int
         -37 (-56.92% of base) : System.Private.CoreLib.dasm - MemoryExtensions:IndexOfAny(Span`1,int,int,int):int
         -37 (-56.92% of base) : System.Private.CoreLib.dasm - MemoryExtensions:IndexOfAny(ReadOnlySpan`1,int,int,int):int
         -37 (-56.92% of base) : System.Private.CoreLib.dasm - MemoryExtensions:LastIndexOfAny(Span`1,int,int,int):int
         -37 (-56.92% of base) : System.Private.CoreLib.dasm - MemoryExtensions:LastIndexOfAny(ReadOnlySpan`1,int,int,int):int
         -36 (-53.73% of base) : System.Private.CoreLib.dasm - MemoryExtensions:IndexOfAny(Span`1,double,double,double):int
         -36 (-53.73% of base) : System.Private.CoreLib.dasm - MemoryExtensions:IndexOfAny(ReadOnlySpan`1,double,double,double):int
         -36 (-53.73% of base) : System.Private.CoreLib.dasm - MemoryExtensions:LastIndexOfAny(Span`1,double,double,double):int
         -36 (-53.73% of base) : System.Private.CoreLib.dasm - MemoryExtensions:LastIndexOfAny(ReadOnlySpan`1,double,double,double):int
         -26 (-50.98% of base) : System.Private.CoreLib.dasm - MemoryExtensions:IndexOfAny(Span`1,long,long):int
         -26 (-50.98% of base) : System.Private.CoreLib.dasm - MemoryExtensions:IndexOfAny(ReadOnlySpan`1,long,long):int
         -26 (-50.98% of base) : System.Private.CoreLib.dasm - MemoryExtensions:LastIndexOfAny(Span`1,long,long):int
         -26 (-50.98% of base) : System.Private.CoreLib.dasm - MemoryExtensions:LastIndexOfAny(ReadOnlySpan`1,long,long):int
         -24 (-48.98% of base) : System.Private.CoreLib.dasm - MemoryExtensions:IndexOfAny(Span`1,int,int):int
         -24 (-48.98% of base) : System.Private.CoreLib.dasm - MemoryExtensions:IndexOfAny(ReadOnlySpan`1,int,int):int
         -24 (-48.98% of base) : System.Private.CoreLib.dasm - MemoryExtensions:LastIndexOfAny(Span`1,int,int):int
         -24 (-48.98% of base) : System.Private.CoreLib.dasm - MemoryExtensions:LastIndexOfAny(ReadOnlySpan`1,int,int):int
421 total methods with Code Size differences (251 improved, 170 regressed), 244075 unchanged.

Sample good diff:

;;; MemoryExtensions:IndexOfAny(Span`1,long,long,long):int
;;; before
G_M64714_IG01:
       sub      rsp, 40
       mov      qword ptr [rsp+38H], rdx
       mov      qword ptr [rsp+40H], r8
       mov      qword ptr [rsp+48H], r9
						;; bbWeight=1    PerfScore 3.25
G_M64714_IG02:
       mov      rdx, bword ptr [rcx]
       mov      ecx, dword ptr [rcx+8]
						;; bbWeight=1    PerfScore 4.00
G_M64714_IG03:
       mov      r8, qword ptr [rsp+38H]
       mov      r9, qword ptr [rsp+40H]
       mov      rax, qword ptr [rsp+48H]
       mov      dword ptr [rsp+20H], ecx
       mov      rcx, rdx
       mov      rdx, r8
       mov      r8, r9
       mov      r9, rax
       call     SpanHelpers:IndexOfAny(byref,long,long,long,int):int
       nop      
						;; bbWeight=1    PerfScore 6.25
G_M64714_IG04:
       add      rsp, 40
       ret 

;;; after
G_M64714_IG01:
       sub      rsp, 40
						;; bbWeight=1    PerfScore 0.25
G_M64714_IG02:
       mov      rax, bword ptr [rcx]
       mov      ecx, dword ptr [rcx+8]
       mov      dword ptr [rsp+20H], ecx
       mov      rcx, rax
       call     SpanHelpers:IndexOfAny(byref,long,long,long,int):int
       nop      
						;; bbWeight=1    PerfScore 6.50
G_M64714_IG03:
       add      rsp, 40
       ret 

cc @dotnet/jit-contrib

@AndyAyersMS AndyAyersMS added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Jan 15, 2020
@Suchiman
Copy link
Contributor

Is dotnet/coreclr#27395 really the correct issue this solves?

@AndyAyersMS
Copy link
Member Author

Is dotnet/coreclr#27395 really the correct issue this solves?

Thanks for spotting this -- should have been dotnet/coreclr#27935

@AndyAyersMS
Copy link
Member Author

Diff on the test case from dotnet/coreclr#27935:

-; Lcl frame size = 56
+; Lcl frame size = 40

 G_M21895_IG01:
-       sub      rsp, 56
-       vzeroupper
-       xor      rax, rax
-       mov      qword ptr [rsp+28H], rax
-                                               ;; bbWeight=1    PerfScore 2.50
+       sub      rsp, 40
+                                               ;; bbWeight=1    PerfScore 0.25
 G_M21895_IG02:
-       vmovdqu  xmm0, xmmword ptr [rdx]
-       vmovdqu  xmmword ptr [rsp+28H], xmm0
-                                               ;; bbWeight=1    PerfScore 3.00
-G_M21895_IG03:
-       lea      rdx, bword ptr [rsp+28H]
        call     C:FuncAvx(int,System.Span`1[Byte])
        nop
-                                               ;; bbWeight=1    PerfScore 1.75
-G_M21895_IG04:
-       add      rsp, 56
+                                               ;; bbWeight=1    PerfScore 1.25
+G_M21895_IG03:
+       add      rsp, 40
        ret
                                                ;; bbWeight=1    PerfScore 1.25

-; Total bytes of code 40, prolog size 14, PerfScore 12.70, (MethodHash=7a51aa79) for method C:A(int,System.Span`1[Byte])
+; Total bytes of code 15, prolog size 4, PerfScore 4.25, (MethodHash=7a51aa79) for method C:A(int,System.Span`1[Byte])

@AndyAyersMS
Copy link
Member Author

Note this doesn't get the improvement in Environment:get_Is64BitOperatingSystem():bool seen in the prototype from dotnet/coreclr#27935 because we don't prune the dead code early enough, and so we still setup a pinvoke frame.

@AndyAyersMS
Copy link
Member Author

MemoryExtensions:IndexOfAny(Span`1,Vector`1,Vector`1):int gets bigger because we see fewer refs to an implicit by-ref struct (since many of them were dead refs, now gone) and so we unpromote. But it turns out promotion was profitable after all.

Seems like if an implicit byref field appears as an argument in a call we ought to give more weight to promotion, since (if unpromoted) the byref will likely be in a register that conflicts with the registers needed to make the call.

@briansull
Copy link
Contributor

briansull commented Jan 16, 2020

we see fewer refs to an implicit by-ref struct

Do you mean struct promotion or CSE promotion?

You probably mean this kind of "promoted"

    unsigned char lvPromoted : 1; // True when this local is a promoted struct, a normed struct, or a "split" long on a
                                  // 32-bit target.  For implicit byref parameters, this gets hijacked between
                                  // fgRetypeImplicitByRefArgs and fgMarkDemotedImplicitByRefArgs to indicate whether
                                  // references to the arg are being rewritten as references to a promoted shadow local.

@AndyAyersMS
Copy link
Member Author

Struct promotion.

The early trimming reduces the RCS_EARLY ref counts, and so leads to more cases where we we will undo promotion of implicit byref params. This undone promotion effectively "sinks" loads of the byref fields down to where they are consumed and keeps the byref live at those points, and this can lead to potential conflicts.

@AndyAyersMS
Copy link
Member Author

I don't see any easy way to address the regressions, and on balance this is a net win, so I suggest we go ahead with it as is....

@AndyAyersMS
Copy link
Member Author

@dotnet/jit-contrib anyone up for reviewing this?

Copy link
Contributor

@CarolEidt CarolEidt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This LGTM, my only question is whether the folding of the JTRUE is something that is done elsewhere and could be factored out.

@AndyAyersMS
Copy link
Member Author

my only question is whether the folding of the JTRUE is something that is done elsewhere

There is conceptually similar code in fgFoldConditional but it has a lot more invariants to maintain.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants