
Coalescing calls to non-returning throw helpers? #9205

Closed
stephentoub opened this issue Oct 31, 2017 · 20 comments · Fixed by dotnet/coreclr#27113
Labels
area-CodeGen-coreclr (CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI) · enhancement (Product code improvement that does NOT require public API changes/additions) · optimization · tenet-performance (Performance related issue)

@stephentoub
Member

Repro:

using System;

class Program
{
    public static void Main()
    {
        var arr = new int[20];
        var s0 = new Span<int>(arr, 0, 1);
        var s1 = new Span<int>(arr, 1, 1);
        var s2 = new Span<int>(arr, 2, 1);
        var s3 = new Span<int>(arr, 3, 1);
        var s4 = new Span<int>(arr, 4, 1);
        var s5 = new Span<int>(arr, 5, 1);
    }
}

The generated asm includes this at the end:

G_M45847_IG05:
       E84B8ECC5D           call     System.ThrowHelper:ThrowArgumentOutOfRangeException()

G_M45847_IG06:
       E8468ECC5D           call     System.ThrowHelper:ThrowArgumentOutOfRangeException()

G_M45847_IG07:
       E8418ECC5D           call     System.ThrowHelper:ThrowArgumentOutOfRangeException()

G_M45847_IG08:
       E83C8ECC5D           call     System.ThrowHelper:ThrowArgumentOutOfRangeException()

G_M45847_IG09:
       E8378ECC5D           call     System.ThrowHelper:ThrowArgumentOutOfRangeException()

G_M45847_IG10:
       E8328ECC5D           call     System.ThrowHelper:ThrowArgumentOutOfRangeException()
       CC                   int3

Can/should these be coalesced into a common call point?

category:cq
theme:basic-cq
skill-level:expert
cost:medium

@benaadams
Member

benaadams commented Oct 31, 2017

I think it would break line numbers in PDBs (affecting debugging, and possibly more?).

@AndyAyersMS said in dotnet/coreclr#7580 (comment)

G_M55934_IG94:
       call     ThrowHelper:ThrowArgumentOutOfRange_IndexException()
G_M55934_IG95:
       call     ThrowHelper:ThrowArgumentOutOfRange_IndexException()
...

If the jit is generating native offset to IL tracking data then the repeated code sequences like the above will typically have different IL offsets and so can't be merged without losing track of which check caused an exception.

@AndyAyersMS
Member

Too bad we can't just pass the IL offset as an argument.
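A source-level analogy of what passing a per-site argument would buy: every failing check branches to one shared call, and a small integer argument records which check failed, so the merged call site no longer needs a distinct IL offset of its own to be diagnosable. This is a minimal sketch; `ThrowSketch`, `ThrowOutOfRange`, and the `site` numbering are hypothetical and not an existing ThrowHelper API.

```csharp
using System;

static class ThrowSketch
{
    // Hypothetical throw helper: "site" stands in for the IL offset the jit could pass,
    // so one shared call site can still report which check failed.
    public static void ThrowOutOfRange(int site) =>
        throw new ArgumentOutOfRangeException(null, $"check #{site} failed");

    public static int SumAt(int[] arr, int a, int b, int c)
    {
        int site;
        // Each check jumps to the same failure block instead of owning its own throw call.
        if ((uint)a >= (uint)arr.Length) { site = 0; goto Fail; }
        if ((uint)b >= (uint)arr.Length) { site = 1; goto Fail; }
        if ((uint)c >= (uint)arr.Length) { site = 2; goto Fail; }
        return arr[a] + arr[b] + arr[c];

    Fail:
        ThrowOutOfRange(site); // the single, shared call site
        return 0;              // never reached; the C# compiler doesn't know the helper throws
    }
}
```

Calling `SumAt(new[] { 1, 2, 3 }, 0, 5, 1)` fails the second check and reports site 1 through the one shared call.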

Also in the example there are a lot of redundant branches. Kind of surprised that some part of the optimizer can't sort this out. For instance:

G_M35549_IG03:
       8B4108               mov      eax, dword ptr [rcx+8]
       3BC2                 cmp      eax, edx
       0F82C0000000         jb       G_M35549_IG12
       448BC8               mov      r9d, eax
       442BCA               sub      r9d, edx
       453BC8               cmp      r9d, r8d
       0F82B1000000         jb       G_M35549_IG12

G_M35549_IG04:
       4883C110             add      rcx, 16
       4C63D2               movsxd   r10, edx
       4A8D0C91             lea      rcx, bword ptr [rcx+4*r10]
       3BC2                 cmp      eax, edx    // same compare as in the block above
       0F82A3000000         jb       G_M35549_IG13
       453BC8               cmp      r9d, r8d    // ditto
       0F829A000000         jb       G_M35549_IG13

G_M35549_IG05:
       4C8BD1               mov      r10, rcx
       458BD8               mov      r11d, r8d
       3BC2                 cmp      eax, edx    // ditto
       0F8291000000         jb       G_M35549_IG14
       453BC8               cmp      r9d, r8d    // ditto
       0F8288000000         jb       G_M35549_IG14

@AndyAyersMS
Member

Actually I should say that when the jit is optimizing it generally won't take pains to maintain unique IL offsets. So coalescing those calls would be something we'd consider to be fair game. And something like this already happens now for jit-induced throws, e.g. array bounds checks, and also with the recently introduced constant return value merging.

However generalizing this to user-provided throws the jit would need to use a potentially jit-time expensive match to coalesce blocks.

So since this particular optimization is potentially expensive and mainly saves code size (and here, size in the "cold" section) it typically doesn't stack up well versus other opportunities.

@mikedn
Contributor

mikedn commented Nov 1, 2017

However generalizing this to user-provided throws the jit would need to use a potentially jit-time expensive match to coalesce blocks.

In theory this should be easy: those blocks are produced during inlining by importing the same block from the same method, so they should be identical. But in practice it's probably "yeah, good luck with guaranteeing that the importer generates identical IR from the same IL block", unfortunately.

@mikedn
Contributor

mikedn commented Nov 1, 2017

Also in the example there are a lot of redundant branches. Kind of surprised that some part of the optimizer can't sort this out. For instance:

Not sure how you got that code. When I try this the result is even funnier:

G_M55922_IG02:
       48B9A88B6778F97F0000 mov      rcx, 0x7FF978678BA8
       BA14000000           mov      edx, 20
       E8C814835F           call     CORINFO_HELP_NEWARR_1_VC
       33C0                 xor      eax, eax
       83F814               cmp      eax, 20
       7741                 ja       SHORT G_M55922_IG05
       B801000000           mov      eax, 1
       83F814               cmp      eax, 20
       7737                 ja       SHORT G_M55922_IG05

G_M55922_IG03:
       B801000000           mov      eax, 1
       83F814               cmp      eax, 20
       7732                 ja       SHORT G_M55922_IG06
       B802000000           mov      eax, 2
       83F814               cmp      eax, 20
       772D                 ja       SHORT G_M55922_IG07
       B803000000           mov      eax, 3
       83F814               cmp      eax, 20
       7728                 ja       SHORT G_M55922_IG08
       B804000000           mov      eax, 4
       83F814               cmp      eax, 20
       7723                 ja       SHORT G_M55922_IG09
       B805000000           mov      eax, 5
       83F814               cmp      eax, 20
       771E                 ja       SHORT G_M55922_IG10

I'll take a look, it's just too lame to ignore this :)

@mikedn
Contributor

mikedn commented Nov 1, 2017

There are probably more ways to look at this but one way goes like this:

  • Morph has fgFoldConditional that can eliminate such branches
  • It's called during global morph but at that point relop operands aren't all constant.
  • EarlyProp propagates the constant array length but doesn't attempt to morph or otherwise optimize the resulting trees (though recently I changed it to eliminate useless range checks)
  • AssertionProp does some constant propagation and then it also morphs the resulting trees. However, it only morphs if it did propagate something. In this case it does not do anything because EarlyProp already propagated constants.
  • Apparently there's nothing else that morphs those trees after AssertionProp.

If AssertionProp is changed to always morph, even if it did not propagate anything, then the generated code reduces to:

G_M55922_IG01:
       4883EC28             sub      rsp, 40
G_M55922_IG02:
       48B9A88B6778F97F0000 mov      rcx, 0x7FF978678BA8
       BA14000000           mov      edx, 20
       E8C814845F           call     CORINFO_HELP_NEWARR_1_VC
       90                   nop
G_M55922_IG03:
       4883C428             add      rsp, 40
       C3                   ret
; Total bytes of code 30, prolog size 4 for method Program:Test()
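To make the folding step concrete: once EarlyProp has substituted the constant array length, each of the checks above is a compare of two constants, which a folding step in the spirit of fgFoldConditional can resolve at jit time. The sketch below models this on a toy branch record; `CondBranch`, `RelOp`, and `FoldSketch.TryFold` are hypothetical names, not jit data structures, and the real fgFoldConditional works on IR trees and also updates the flow graph.

```csharp
using System;

// Relational operators on unsigned 32-bit values, matching the "cmp / ja" pairs above.
enum RelOp { Lt, Le, Gt, Ge, Eq, Ne }

// Toy model of "if (Left Op Right) goto throwBlock"; a null operand is not a known constant.
readonly record struct CondBranch(RelOp Op, uint? Left, uint? Right);

static class FoldSketch
{
    // true  = branch always taken,
    // false = branch never taken (the compare and branch can be deleted),
    // null  = operands are not both constant, so the branch has to stay.
    public static bool? TryFold(CondBranch b)
    {
        if (b.Left is not uint l || b.Right is not uint r) return null;
        return b.Op switch
        {
            RelOp.Lt => l < r,
            RelOp.Le => l <= r,
            RelOp.Gt => l > r,
            RelOp.Ge => l >= r,
            RelOp.Eq => l == r,
            RelOp.Ne => l != r,
            _ => throw new ArgumentOutOfRangeException(nameof(b)),
        };
    }
}
```

For the repro, each check has the shape `start > 20` with `start` in 0..5, so every one folds to "never taken" and the throw blocks become unreachable.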

@AndyAyersMS
Member

@mikedn I was looking at a slightly different example case:

using System;
using System.Runtime.CompilerServices;

class Program
{
    [MethodImpl(MethodImplOptions.NoInlining)]
    private static int Test1(int[] arr, int offset, int count)
    {
        Span<int> s1 = new Span<int>(arr, offset, count);
        Span<int> s2 = new Span<int>(arr, offset, count);
        Span<int> s3 = new Span<int>(arr, offset, count);
        Span<int> s4 = new Span<int>(arr, offset, count);
        Span<int> s5 = new Span<int>(arr, offset, count);
        Span<int> s6 = new Span<int>(arr, offset, count);
        return s1[1] + s2[2] + s3[3] + s4[4] + s5[5] + s6[6];
    }
}

@mikedn
Contributor

mikedn commented Nov 1, 2017

I was looking at a slightly different example case:

Right, assertion prop could likely catch these but currently it only handles EQ and NE:
https://github.com/dotnet/coreclr/blob/dc5d41fd993d29c68be304f1bfefe38e7e61e7bd/src/jit/assertionprop.cpp#L1923-L1936

Hmm, maybe I'll play a bit with it. I'm curious what CQ improvements handling additional relops could provide and at what cost.
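One way to reason about which ordering facts decide which later compares: treat each relop as the set of orderings (less, equal, greater) it accepts. A known fact then decides a later compare on the same operands whenever its set is contained in the compare's set (always true) or disjoint from it (always false). This is a minimal sketch of that reasoning, not the jit's assertion representation; `Rel` and `RelopImplication` are hypothetical, and both compares are assumed to use the same operands in the same order and the same signedness.

```csharp
using System;

enum Rel { Lt, Le, Gt, Ge, Eq, Ne }

static class RelopImplication
{
    // Encode each relop as the set of orderings (less / equal / greater) for which it holds.
    private const int Less = 1, Equal = 2, Greater = 4;

    private static int Outcomes(Rel op) => op switch
    {
        Rel.Lt => Less,
        Rel.Le => Less | Equal,
        Rel.Gt => Greater,
        Rel.Ge => Equal | Greater,
        Rel.Eq => Equal,
        Rel.Ne => Less | Greater,
        _ => throw new ArgumentOutOfRangeException(nameof(op)),
    };

    // Given that "x known y" holds, what is "x query y"?
    // true = always holds, false = never holds, null = cannot tell.
    public static bool? Implies(Rel known, Rel query)
    {
        int k = Outcomes(known), q = Outcomes(query);
        if ((k & ~q) == 0) return true;   // every ordering still possible satisfies the query
        if ((k & q) == 0) return false;   // no ordering still possible satisfies the query
        return null;
    }
}
```

For the Span constructor case, falling through a `start > length` throw check establishes `start <= length` (`Le`); a repeated `start > length` check gets `Implies(Rel.Le, Rel.Gt) == false`, so its branch to the throw block is dead. EQ/NE-only handling corresponds to the `Eq`/`Ne` rows of this table.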

@AndyAyersMS
Member

Wonder how hard it would be to change assertion prop from a forward dense to a backward sparse or on demand approach, especially given that we have SSA and that relatively few operations benefit from or generate assertions.

Trying to guess which facts might matter later seems inevitably suboptimal.
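An on-demand approach can lean on SSA: because an SSA value is never redefined, a fact established by a guarding branch remains valid in every block that branch dominates, so a compare can be checked lazily by walking up the dominator tree instead of propagating a dense fact set forward. A minimal sketch, assuming each block records its immediate dominator and at most one fact known on entry; `Block`, `Fact`, and `OnDemandFacts.Evaluate` are hypothetical, and only exact or exactly negated matches are recognized.

```csharp
using System;
#nullable enable

enum Cmp { Lt, Le, Gt, Ge, Eq, Ne }

// "Lhs Op Rhs" over SSA value names; in SSA a name is never redefined, so a fact
// stays true everywhere its establishing branch dominates.
readonly record struct Fact(string Lhs, Cmp Op, string Rhs);

sealed class Block
{
    public Block? Idom;      // immediate dominator; null for the entry block
    public Fact? EntryFact;  // fact known on entry (assumes every path in comes through the guarding edge)
}

static class OnDemandFacts
{
    static Cmp Negate(Cmp op) => op switch
    {
        Cmp.Lt => Cmp.Ge, Cmp.Ge => Cmp.Lt,
        Cmp.Le => Cmp.Gt, Cmp.Gt => Cmp.Le,
        Cmp.Eq => Cmp.Ne, Cmp.Ne => Cmp.Eq,
        _ => throw new ArgumentOutOfRangeException(nameof(op)),
    };

    // Answers "is query known at block use?" only when asked, by walking the dominator chain.
    // true / false: a dominating fact decides it; null: nothing found within maxDepth steps.
    public static bool? Evaluate(Block use, Fact query, int maxDepth = 32)
    {
        for (Block? b = use; b != null && maxDepth-- > 0; b = b.Idom)
        {
            if (b.EntryFact is not Fact f || f.Lhs != query.Lhs || f.Rhs != query.Rhs) continue;
            if (f.Op == query.Op) return true;
            if (f.Op == Negate(query.Op)) return false;
        }
        return null;
    }
}
```

Only compares that are actually queried pay for a lookup, and the `maxDepth` bound keeps each query's cost predictable.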

@mikedn
Contributor

mikedn commented Nov 2, 2017

Wonder how hard it would be to change assertion prop from a forward dense to a backward sparse or on demand approach, especially given that we have SSA and that relatively few operations benefit from or generate assertions.

I was wondering the same thing recently, but from the perspective of RangeCheck. That one is mostly demand driven (it does have a "build SSA def map" pre-pass but that's actually unnecessary) but it relies on AssertionProp doing a lot of work upfront and every time it merges assertions it has to scan all live assertions looking for the one(s) that are suitable for merging. And after all this work RangeCheck only manages to eliminate 5% of the (corelib) range checks that pass through it.

Attempting to redo AssertionProp is likely too much for me, even as an experiment. But I think I could experiment a bit with having RangeCheck compute its own assertions on demand.

@mikedn
Contributor

mikedn commented Nov 4, 2017

Yeah, a quick attempt at removing redundant if (x < y) had pretty much the expected result: a few hundred bytes of improvements and a few hundred bytes of regressions.

Total bytes of diff: -70 (0.00% of base)
    diff is an improvement.

Total byte diff includes 0 bytes from reconciling methods
        Base had    0 unique methods,        0 unique bytes
        Diff had    0 unique methods,        0 unique bytes

Top file regressions by size (bytes):
         155 : System.Reflection.Metadata.dasm (0.20% of base)
          41 : System.IO.FileSystem.Watcher.dasm (0.31% of base)

Top file improvements by size (bytes):
        -142 : System.Private.CoreLib.dasm (0.00% of base)
         -57 : System.Linq.Expressions.dasm (-0.01% of base)
         -53 : System.Runtime.Serialization.Formatters.dasm (-0.06% of base)
         -10 : System.Private.Xml.dasm (0.00% of base)
          -4 : Microsoft.Diagnostics.Tracing.TraceEvent.dasm (0.00% of base)

7 total files with size differences (5 improved, 2 regressed), 123 unchanged.

Top method regessions by size (bytes):
         155 : System.Reflection.Metadata.dasm - MetadataReader:InitializeTableReaders(struct,ubyte,ref,ref):this
          41 : System.IO.FileSystem.Watcher.dasm - PatternMatcher:StrictMatchPattern(ref,ref):bool

Top method improvements by size (bytes):
        -130 : System.Private.CoreLib.dasm - PathHelper:TryExpandShortFileName(byref,ref):ref
         -57 : System.Linq.Expressions.dasm - EnterTryFaultInstruction:Run(ref):int:this
         -30 : System.Runtime.Serialization.Formatters.dasm - BinaryParser:ReadArrayAsBytes(ref):this
         -23 : System.Runtime.Serialization.Formatters.dasm - BinaryFormatterWriter:WriteArrayAsBytes(ref,int):this
         -12 : System.Private.CoreLib.dasm - Path:GetRelativePath(ref,ref,int):ref

The regression from InitializeTableReaders seems to be caused by range checks that are no longer eliminated because the assertion table filled up and the assertions needed by RangeCheck are no longer present. Something similar happens in StrictMatchPattern too, but there's also a problem with register allocation (possibly indirectly caused by range check elimination failure).

The TryExpandShortFileName improvement comes from this line:

if (inputBuffer[foundIndex] == '\0') inputBuffer[foundIndex] = '\\';

inputBuffer is a StringBuffer and its indexer is trivial and gets inlined. The second check of the index argument is redundant and it gets eliminated, together with the associated throw block. It's very similar to the case of the Span constructor above.

@JosephTremoulet
Contributor

We got better about optimizing the bounds checks from the span indexer by making it an intrinsic, but we didn't do anything about the bounds checks in Slice or this one in the ctor. We could consider adding void System.Runtime.CompilerServices.BoundsCheck(int index, int length) or some such, which does the compare and conditional throw, marking it [Intrinsic] and importing it directly as GT_BOUNDS_CHECK. Then adjusting the various Span methods to call it, instead of having their own tests and throws, might get these specific cases without redoing range propagation generally.

I think we have some places that can only optimize GT_BOUNDS_CHECK when it is the LHS of a GT_COMMA, so maybe that would get in the way here...
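A rough illustration of the shape of that proposal: all validation is routed through one small helper, so the jit would only need to recognize a single method. The `[Intrinsic]` import to GT_BOUNDS_CHECK is jit-side and cannot be expressed in ordinary C#, so in this minimal sketch the helper is a plain method; `BoundsCheckSketch`, `Check`, and `ElementAt` are hypothetical names, not the suggested `System.Runtime.CompilerServices.BoundsCheck`, which does not exist.

```csharp
using System;
using System.Runtime.CompilerServices;

static class BoundsCheckSketch
{
    // Hypothetical stand-in for the suggested BoundsCheck(int index, int length).
    // In the proposal it would be [Intrinsic] and imported as GT_BOUNDS_CHECK;
    // here it is an ordinary, aggressively inlined method.
    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    public static void Check(int index, int length)
    {
        // Assuming length >= 0, one unsigned compare rejects both index < 0 and index >= length.
        if ((uint)index >= (uint)length)
            ThrowArgumentOutOfRange();
    }

    [MethodImpl(MethodImplOptions.NoInlining)]
    private static void ThrowArgumentOutOfRange() =>
        throw new ArgumentOutOfRangeException("index");

    // A Span-like caller that routes all of its validation through Check
    // instead of writing its own compare-and-throw sequences.
    public static int ElementAt(int[] arr, int start, int offset)
    {
        Check(start, arr.Length);
        Check(offset, arr.Length - start);
        return arr[start + offset];
    }
}
```

If the jit treated `Check` as a bounds check node, repeated checks over the same index and length could be removed by the existing range check machinery, and the surviving failures would share the jit's single range-check-failure block.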

@benaadams
Member

Revisiting this: code that accesses multiple arrays in different ways gets away with a single call to CORINFO_HELP_RNGCHKFAIL. How does that work, if coalescing the calls to System.ThrowHelper:ThrowArgumentOutOfRangeException() wouldn't?

@mikedn
Contributor

mikedn commented Oct 4, 2019

The CORINFO_HELP_RNGCHKFAIL call is internally generated by the JIT so it simply generates only one.

@AndyAyersMS
Member

It might be interesting to do a crude prototype (one that might not have all the needed safety checks) to get a read on how much code size we could save. In the jit it would be nice to do this early to cut down on IR volume, so run it, say, just after inlining. It doesn't have to be a generalized tail merge; it can just special-case throw blocks and literally pattern match the simplest cases.

If that looks promising, this would primarily be a size-on-disk / jit TP optimization.

There are some wins at runtime from keeping cold code compact. We're talking about the cost of partial hot/cold cache lines and hot/cold code pages. The impact of these can be subtle, and I have seen some indications that in realistic apps we are losing more perf than one might expect from poor cache/TLB density. And this could also help cold startup some.

So again this might be worth a look if we see decent amount of compression.

A more robust fix for runtime impact would be to implement hot/cold splitting as this would let us sequester larger sequences of cold code (say the voluminous exception object construction one sees w/o helpers).

Splitting was supported by fragile NGEN but is not supported for R2R or jitted code (and is/was not yet enabled for all ISAs/OSs). I think this is mainly a runtime/R2R issue and the jit has most of the logic it needs already.

@AndyAyersMS
Member

Here's a prototype SimpleThrowTailMerge that can merge many common throw helper blocks.

Jit-Diffs reports:

Total bytes of diff: -39926 (-0.10% of base)
    diff is an improvement.

Top file improvements by size (bytes):
       -7162 : System.Private.CoreLib.dasm (-0.16% of base)
       -4440 : Microsoft.CodeAnalysis.VisualBasic.dasm (-0.08% of base)
       -4345 : System.Memory.dasm (-1.87% of base)
       -2488 : CommandLine.dasm (-0.56% of base)
       -2015 : Microsoft.CodeAnalysis.dasm (-0.12% of base)

82 total files with size differences (82 improved, 0 regressed), 47 unchanged.

Top method regressions by size (bytes):
         111 ( 5.69% of base) : System.Memory.dasm - SequenceReader`1:TryReadToSlow(byref,struct,struct,int,bool):bool:this (2 methods)
          87 (14.19% of base) : System.Security.Cryptography.Algorithms.dasm - PasswordBasedEncryption:Pbes2Decrypt(struct,struct,struct,struct,struct):int
          87 (14.19% of base) : System.Security.Cryptography.Cng.dasm - PasswordBasedEncryption:Pbes2Decrypt(struct,struct,struct,struct,struct):int
          78 (15.82% of base) : System.Memory.dasm - SequenceReader`1:TryReadToSlow(byref,struct,bool):bool:this
          56 ( 5.15% of base) : System.Security.Cryptography.Algorithms.dasm - KeyFormatHelper:WriteEncryptedPkcs8(struct,struct,ref,ref):ref

Top method improvements by size (bytes):
        -438 (-4.46% of base) : System.Memory.dasm - ReadOnlySequenceDebugView`1:.ctor(struct):this (7 methods)
        -396 (-5.97% of base) : Microsoft.CodeAnalysis.dasm - ArrayBuilderExtensions:SelectAsArray(ref,ref):struct (7 methods)
        -390 (-6.95% of base) : System.Memory.dasm - SequenceReader`1:TryCopyMultisegment(struct):bool:this (6 methods)
        -294 (-1.13% of base) : Microsoft.CodeAnalysis.dasm - AsyncQueue`1:WithCancellation(ref,struct):ref (49 methods)
        -294 (-1.80% of base) : System.Linq.Parallel.dasm - ParallelQuery`1:OfType():ref:this (49 methods)

Top method regressions by size (percentage):
          78 (15.82% of base) : System.Memory.dasm - SequenceReader`1:TryReadToSlow(byref,struct,bool):bool:this
          87 (14.19% of base) : System.Security.Cryptography.Algorithms.dasm - PasswordBasedEncryption:Pbes2Decrypt(struct,struct,struct,struct,struct):int
          87 (14.19% of base) : System.Security.Cryptography.Cng.dasm - PasswordBasedEncryption:Pbes2Decrypt(struct,struct,struct,struct,struct):int
          12 ( 9.52% of base) : System.Text.RegularExpressions.dasm - <>c:<AddConcatenate>b__78_0(struct,struct):this
          27 ( 8.79% of base) : System.Net.Http.dasm - Http2ReadStream:ReadAsync(struct,struct):struct:this

Top method improvements by size (percentage):
         -56 (-17.50% of base) : Microsoft.CodeAnalysis.VisualBasic.dasm - LookupResult:CompactSymbols(int):this
        -278 (-16.27% of base) : System.Reflection.Metadata.dasm - MetadataBuilder:GetRowCounts():struct:this
         -45 (-16.25% of base) : System.Private.CoreLib.dasm - Decimal:ToDecimal(struct):struct
         -56 (-16.18% of base) : Microsoft.CodeAnalysis.dasm - ExceptionHandlerContainerScope:CloseScope(ref):this
         -14 (-14.89% of base) : System.Security.Cryptography.Encoding.dasm - ThrowHelper:ValidateTransformBlock(ref,int,int,int)

1781 total methods with size differences (1756 improved, 25 regressed), 201163 unchanged.

The size win is modest, though still perhaps worth pursuing.

  • We might consider handling more general cases like BBJ_THROW blocks -- not clear how often users write the exact same throw constructs in a method or across an inline complex.
  • Probably should generalize the map key to include the EH region.
  • I haven't looked at diffs extensively yet. Might be missing more cases than I realize.
  • I didn't expect regressions, so I will have to drill into those.
  • Might want to move this transformation earlier in the phase order to remove IR sooner.
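A minimal sketch of the kind of merge described here, on a toy flow graph: blocks that consist of a single call to a noreturn helper are grouped by callee, arguments, and EH region, one block per group is kept, and every branch to a duplicate is retargeted to it. The canonical block is chosen by walking from the end of the block list, which is what the prototype later switched to. `BasicBlock`, its fields, and `ThrowTailMergeSketch.Merge` are hypothetical simplifications; the jit works on its own block and predecessor-list structures and leaves orphaned blocks to flow cleanup.

```csharp
using System;
using System.Collections.Generic;
#nullable enable

// Toy basic block: either a single-statement call to a noreturn helper, or ordinary code.
sealed class BasicBlock
{
    public int Num;
    public string? NoReturnCallee;   // e.g. "ThrowHelper.ThrowArgumentOutOfRangeException"
    public string CallArgs = "";     // stands in for structurally comparing argument trees
    public int EhRegion;             // blocks in different EH regions must not be merged
    public List<BasicBlock> Succs = new();
}

static class ThrowTailMergeSketch
{
    // Returns how many duplicate blocks were bypassed (later flow cleanup would delete them).
    public static int Merge(List<BasicBlock> blocks)
    {
        var canonical = new Dictionary<(string, string, int), BasicBlock>();
        var replacement = new Dictionary<BasicBlock, BasicBlock>();

        // Walk backwards so the last block of each kind becomes the canonical copy.
        for (int i = blocks.Count - 1; i >= 0; i--)
        {
            BasicBlock b = blocks[i];
            if (b.NoReturnCallee is not string callee) continue;
            var key = (callee, b.CallArgs, b.EhRegion);
            if (canonical.TryGetValue(key, out BasicBlock? keep))
                replacement[b] = keep;
            else
                canonical[key] = b;
        }

        // Retarget every predecessor edge that points at a duplicate.
        foreach (BasicBlock pred in blocks)
            for (int s = 0; s < pred.Succs.Count; s++)
                if (replacement.TryGetValue(pred.Succs[s], out BasicBlock? target))
                    pred.Succs[s] = target;

        return replacement.Count;
    }
}
```

Including the EH region in the key keeps a throw inside a `try` from being redirected to a copy outside it, which would change which handler sees the exception.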

@AndyAyersMS
Member

For the regressions: I suspect the block commoning, combined with the fact that we don't model noreturn call blocks as having no successor flow until we've morphed the calls, ends up fooling fgMightHaveLoop into thinking there may be a loop, and so we lose an outgoing arg struct copy optimization in fgMakeOutgoingStructArgCopy.

I suppose we should introduce some variant of BBJ_THROW for the noreturn call case, but knowledge of the possible jump kinds is scattered all over. So maybe just a new BBF flag instead (or in addition)...?

@AndyAyersMS
Member

Fixed my prototype to handle EH better, and to avoid most problematic flow messes by choosing canonical examples starting from the end of the block list, rather than the start.

Updated diffs:

Total bytes of diff: -43903 (-0.11% of base)
    diff is an improvement.

Top file improvements by size (bytes):
       -7399 : System.Private.CoreLib.dasm (-0.17% of base)
       -4746 : Microsoft.CodeAnalysis.VisualBasic.dasm (-0.08% of base)
       -4541 : System.Memory.dasm (-1.95% of base)
       -2464 : CommandLine.dasm (-0.55% of base)
       -2131 : Microsoft.CodeAnalysis.dasm (-0.12% of base)

82 total files with size differences (82 improved, 0 regressed), 47 unchanged.

Top method regressions by size (bytes):
          29 ( 1.67% of base) : Microsoft.CodeAnalysis.dasm - <DescendantTriviaIntoTrivia>d__161:MoveNext():bool:this
          17 ( 0.86% of base) : System.Security.Cryptography.Algorithms.dasm - PasswordBasedEncryption:Encrypt(struct,struct,ref,bool,struct,ref,struct,ref,struct):int
          17 ( 0.86% of base) : System.Security.Cryptography.Cng.dasm - PasswordBasedEncryption:Encrypt(struct,struct,ref,bool,struct,ref,struct,ref,struct):int
          13 ( 0.07% of base) : Microsoft.CodeAnalysis.VisualBasic.dasm - Binder:ReportOverloadResolutionFailureForASingleCandidate(ref,ref,int,byref,struct,struct,bool,bool,bool,bool,ref,ref,bool,ref,ref):this
           4 ( 1.04% of base) : System.Private.CoreLib.dasm - TextInfo:AddTitlecaseLetter(byref,byref,int,int):int:this

Top method improvements by size (bytes):
        -438 (-4.46% of base) : System.Memory.dasm - ReadOnlySequenceDebugView`1:.ctor(struct):this (7 methods)
        -396 (-5.97% of base) : Microsoft.CodeAnalysis.dasm - ArrayBuilderExtensions:SelectAsArray(ref,ref):struct (7 methods)
        -386 (-6.88% of base) : System.Memory.dasm - SequenceReader`1:TryCopyMultisegment(struct):bool:this (6 methods)
        -294 (-1.13% of base) : Microsoft.CodeAnalysis.dasm - AsyncQueue`1:WithCancellation(ref,struct):ref (49 methods)
        -294 (-1.80% of base) : System.Linq.Parallel.dasm - ParallelQuery`1:OfType():ref:this (49 methods)

Top method regressions by size (percentage):
          29 ( 1.67% of base) : Microsoft.CodeAnalysis.dasm - <DescendantTriviaIntoTrivia>d__161:MoveNext():bool:this
           4 ( 1.04% of base) : System.Private.CoreLib.dasm - TextInfo:AddTitlecaseLetter(byref,byref,int,int):int:this
          17 ( 0.86% of base) : System.Security.Cryptography.Algorithms.dasm - PasswordBasedEncryption:Encrypt(struct,struct,ref,bool,struct,ref,struct,ref,struct):int
          17 ( 0.86% of base) : System.Security.Cryptography.Cng.dasm - PasswordBasedEncryption:Encrypt(struct,struct,ref,bool,struct,ref,struct,ref,struct):int
           3 ( 0.33% of base) : xunit.core.dasm - MemberDataAttributeBase:GetData(ref):ref:this

Top method improvements by size (percentage):
         -56 (-17.50% of base) : Microsoft.CodeAnalysis.VisualBasic.dasm - LookupResult:CompactSymbols(int):this
        -278 (-16.27% of base) : System.Reflection.Metadata.dasm - MetadataBuilder:GetRowCounts():struct:this
         -56 (-16.18% of base) : Microsoft.CodeAnalysis.dasm - ExceptionHandlerContainerScope:CloseScope(ref):this
         -11 (-14.86% of base) : System.Diagnostics.Process.dasm - NtProcessManager:ReadCounterValue(int,struct):long
         -41 (-14.80% of base) : System.Private.CoreLib.dasm - Decimal:ToDecimal(struct):struct

1787 total methods with size differences (1781 improved, 6 regressed), 201157 unchanged.

@AndyAyersMS
Member

Tried extending this to BBJ_THROW in ExtendThrowTailMergeToThrows, with various search limits (e.g., cutting off the search after 20 statements). Did not find much in the way of additional merge opportunities. Delta from above was:

Total bytes of diff: -1104 (0.00% of base)
    diff is an improvement.

Top file improvements by size (bytes):
        -490 : System.Data.Common.dasm (-0.03% of base)
        -221 : Newtonsoft.Json.dasm (-0.03% of base)
        -177 : System.Private.Xml.dasm (0.00% of base)
         -88 : Microsoft.CSharp.dasm (-0.03% of base)
         -40 : System.Net.Requests.dasm (-0.04% of base)

11 total files with size differences (11 improved, 0 regressed), 118 unchanged.

Top method improvements by size (bytes):
        -155 (-7.06% of base) : System.Data.Common.dasm - DbConnectionOptions:GetKeyValuePair(ref,int,ref,bool,byref,byref):int
        -105 (-12.54% of base) : System.Data.Common.dasm - ExpressionParser:ParseAggregateArgument(int):ref:this
         -57 (-3.11% of base) : System.Private.Xml.dasm - XmlSerializationReader:GetPrimitiveType(ref,bool):ref:this
         -47 (-0.83% of base) : Newtonsoft.Json.dasm - <ReadStringValueAsync>d__37:MoveNext():this
         -47 (-4.05% of base) : System.Data.Common.dasm - ExpressionParser:Scan():int:this

Top method improvements by size (percentage):
        -105 (-12.54% of base) : System.Data.Common.dasm - ExpressionParser:ParseAggregateArgument(int):ref:this
         -17 (-12.41% of base) : System.Data.Common.dasm - NameNode:Eval(ref,int):ref:this
         -29 (-7.73% of base) : System.Data.Common.dasm - NameNode:Bind(ref,ref):this
         -21 (-7.27% of base) : System.Data.Common.dasm - DataColumn:set_DateTimeMode(int):this
         -21 (-7.27% of base) : System.Data.Common.dasm - FunctionNode:Check():this

43 total methods with size differences (43 improved, 0 regressed), 202901 unchanged.

so as I had suspected, we do not see duplicate throws very often.
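Merging BBJ_THROW blocks means comparing whole statement lists rather than a single call, which is where the extra jit-time cost comes from; a cutoff like the 20 statements mentioned above bounds it. A minimal sketch, assuming statements can be compared as opaque strings (a real implementation would compare IR trees structurally); `ThrowBlockCompareSketch.Equivalent` is hypothetical.

```csharp
using System;
using System.Collections.Generic;

static class ThrowBlockCompareSketch
{
    // Returns true if two throw blocks could share one copy: same statement count,
    // pairwise-identical statements, and no longer than maxStmts (to cap jit-time cost).
    // Statements are opaque strings here; the jit would compare IR trees structurally.
    public static bool Equivalent(IReadOnlyList<string> a, IReadOnlyList<string> b, int maxStmts = 20)
    {
        if (a.Count > maxStmts || a.Count != b.Count) return false;
        for (int i = 0; i < a.Count; i++)
        {
            if (!string.Equals(a[i], b[i], StringComparison.Ordinal)) return false;
        }
        return true;
    }
}
```

Blocks over the limit are simply treated as distinct, trading a few missed merges for a predictable compile-time budget.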

AndyAyersMS referenced this issue in AndyAyersMS/coreclr Oct 9, 2019
Look for blocks with single statement noreturn calls, and try to reroute
flow so there's just one call block that all predecessors target.

Resolves #14770.

Note this impairs debuggability of optimized code a bit, as it can change which
line of code apparently invokes a throw helper in a backtrace. But since we're
already commoning jit-inserted throw helpers (like array index OOB) this is not
breaking any new ground.

We could also handle commoning BBJ_THROW blocks, with some extra effort,
but prototyping indicates duplicate throws are pretty rare.
AndyAyersMS referenced this issue in AndyAyersMS/coreclr Oct 15, 2019
AndyAyersMS referenced this issue in dotnet/coreclr Oct 30, 2019

This phase runs just before `optOptimizeFlow`, so that we can leverage
the ref counts and predecessor lists to ensure we make correct flow updates.

It doesn't bother trying to clean out IR, that happens naturally as blocks
become unreferenced.

In some cases noreturn helpers end up being tail call candidates. We now suppress
tail calling noreturn methods if there is more than one such call site in the method,
hoping that instead we can merge the calls.
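The tail-call rule above amounts to a simple per-method policy. A minimal sketch, with hypothetical `CallSite` and `TailCallPolicySketch` types standing in for the jit's call-site data.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical call-site summary: the callee, whether it never returns,
// and whether it would otherwise be turned into a tail call.
readonly record struct CallSite(string Callee, bool IsNoReturn, bool IsTailCallCandidate);

static class TailCallPolicySketch
{
    // Keeps tail calls to ordinary callees, and to a noreturn callee only when it is the
    // method's sole noreturn call site; otherwise those calls stay ordinary calls so their
    // blocks can be merged into one.
    public static List<CallSite> TailCallsToKeep(IReadOnlyList<CallSite> sites)
    {
        int noReturnSites = sites.Count(s => s.IsNoReturn);
        return sites
            .Where(s => s.IsTailCallCandidate && (!s.IsNoReturn || noReturnSites <= 1))
            .ToList();
    }
}
```

A tail call replaces the call block with a jump, so two tail-called throw helpers could no longer be recognized as identical noreturn call blocks; keeping them as plain calls preserves the merge opportunity.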
@stephentoub
Copy link
Member Author

Thanks, @AndyAyersMS.

msftgits transferred this issue from dotnet/coreclr Jan 31, 2020
msftgits added this to the Future milestone Jan 31, 2020
ghost locked as resolved and limited conversation to collaborators Dec 19, 2020
6 participants