JIT: add pass to merge common throw helper calls #27113

AndyAyersMS · 2019-10-09T22:31:01Z

Look for blocks whose only statement is a noreturn call, and try to reroute
flow so that a single block containing the call is targeted by all predecessors.

Resolves #14770.

Note this impairs debuggability of optimized code a bit, as it can change which
line of code apparently invokes a throw helper in a backtrace. But since we're
already commoning jit-inserted throw helpers (like array index OOB) this is not
breaking any new ground.

We could also handle commoning BBJ_THROW blocks, with some extra effort,
but prototyping indicates duplicate throws are pretty rare.
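
As a rough illustration of the idea described above, here is a minimal sketch on a toy flow graph. It is not the JIT's code: `ToyBlock`, its fields, and `MergeThrowHelpers` are hypothetical names, the key is simply (EH region, helper, constant argument), and fall-through edges are not modeled.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <tuple>
#include <vector>

// Hypothetical toy model, not the JIT's BasicBlock.
struct ToyBlock
{
    int              ehRegion;      // merging is only allowed within one EH region
    bool             isThrowHelper; // block is a single noreturn helper call
    int              helperId;      // which helper is called
    int              helperArg;     // the helper's single constant argument
    std::vector<int> succs;         // indices of successor blocks
};

// Retarget edges from duplicate throw helper blocks to the first equivalent
// (canonical) block. Returns the number of edges updated.
int MergeThrowHelpers(std::vector<ToyBlock>& blocks)
{
    std::map<std::tuple<int, int, int>, int> canonical; // key -> canonical block index
    std::vector<int>                         remap(blocks.size());

    for (std::size_t i = 0; i < blocks.size(); i++)
    {
        const ToyBlock& b = blocks[i];
        remap[i]          = static_cast<int>(i);
        if (!b.isThrowHelper)
        {
            continue;
        }
        // emplace keeps the existing entry if an equivalent block was already seen.
        auto entry = canonical.emplace(std::make_tuple(b.ehRegion, b.helperId, b.helperArg),
                                       static_cast<int>(i));
        remap[i] = entry.first->second;
    }

    int updateCount = 0;
    for (ToyBlock& b : blocks)
    {
        for (int& succ : b.succs)
        {
            assert(succ >= 0 && static_cast<std::size_t>(succ) < blocks.size());
            if (remap[succ] != succ)
            {
                succ = remap[succ];
                updateCount++;
            }
        }
    }
    return updateCount;
}
```

Duplicate blocks are left in place in this sketch; once nothing references them, a later cleanup can drop them.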

AndyAyersMS · 2019-10-09T22:32:42Z

@dotnet/jit-contrib PTAL

jit-diffs shows this hits modestly often...

Total bytes of diff: -43903 (-0.11% of base)
    diff is an improvement.

Top file improvements by size (bytes):
       -7399 : System.Private.CoreLib.dasm (-0.17% of base)
       -4746 : Microsoft.CodeAnalysis.VisualBasic.dasm (-0.08% of base)
       -4541 : System.Memory.dasm (-1.95% of base)
       -2464 : CommandLine.dasm (-0.55% of base)
       -2131 : Microsoft.CodeAnalysis.dasm (-0.12% of base)

82 total files with size differences (82 improved, 0 regressed), 47 unchanged.

Top method regressions by size (bytes):
          29 ( 1.67% of base) : Microsoft.CodeAnalysis.dasm - <DescendantTriviaIntoTrivia>d__161:MoveNext():bool:this
          17 ( 0.86% of base) : System.Security.Cryptography.Algorithms.dasm - PasswordBasedEncryption:Encrypt(struct,struct,ref,bool,struct,ref,struct,ref,struct):int
          17 ( 0.86% of base) : System.Security.Cryptography.Cng.dasm - PasswordBasedEncryption:Encrypt(struct,struct,ref,bool,struct,ref,struct,ref,struct):int
          13 ( 0.07% of base) : Microsoft.CodeAnalysis.VisualBasic.dasm - Binder:ReportOverloadResolutionFailureForASingleCandidate(ref,ref,int,byref,struct,struct,bool,bool,bool,bool,ref,ref,bool,ref,ref):this
           4 ( 1.04% of base) : System.Private.CoreLib.dasm - TextInfo:AddTitlecaseLetter(byref,byref,int,int):int:this

Top method improvements by size (bytes):
        -438 (-4.46% of base) : System.Memory.dasm - ReadOnlySequenceDebugView`1:.ctor(struct):this (7 methods)
        -396 (-5.97% of base) : Microsoft.CodeAnalysis.dasm - ArrayBuilderExtensions:SelectAsArray(ref,ref):struct (7 methods)
        -386 (-6.88% of base) : System.Memory.dasm - SequenceReader`1:TryCopyMultisegment(struct):bool:this (6 methods)
        -294 (-1.13% of base) : Microsoft.CodeAnalysis.dasm - AsyncQueue`1:WithCancellation(ref,struct):ref (49 methods)
        -294 (-1.80% of base) : System.Linq.Parallel.dasm - ParallelQuery`1:OfType():ref:this (49 methods)

Top method regressions by size (percentage):
          29 ( 1.67% of base) : Microsoft.CodeAnalysis.dasm - <DescendantTriviaIntoTrivia>d__161:MoveNext():bool:this
           4 ( 1.04% of base) : System.Private.CoreLib.dasm - TextInfo:AddTitlecaseLetter(byref,byref,int,int):int:this
          17 ( 0.86% of base) : System.Security.Cryptography.Algorithms.dasm - PasswordBasedEncryption:Encrypt(struct,struct,ref,bool,struct,ref,struct,ref,struct):int
          17 ( 0.86% of base) : System.Security.Cryptography.Cng.dasm - PasswordBasedEncryption:Encrypt(struct,struct,ref,bool,struct,ref,struct,ref,struct):int
           3 ( 0.33% of base) : xunit.core.dasm - MemberDataAttributeBase:GetData(ref):ref:this

Top method improvements by size (percentage):
         -56 (-17.50% of base) : Microsoft.CodeAnalysis.VisualBasic.dasm - LookupResult:CompactSymbols(int):this
        -278 (-16.27% of base) : System.Reflection.Metadata.dasm - MetadataBuilder:GetRowCounts():struct:this
         -56 (-16.18% of base) : Microsoft.CodeAnalysis.dasm - ExceptionHandlerContainerScope:CloseScope(ref):this
         -11 (-14.86% of base) : System.Diagnostics.Process.dasm - NtProcessManager:ReadCounterValue(int,struct):long
         -41 (-14.80% of base) : System.Private.CoreLib.dasm - Decimal:ToDecimal(struct):struct

1787 total methods with size differences (1781 improved, 6 regressed), 201157 unchanged.

erozenfeld

LGTM
What's the throughput impact of this optimization?

BruceForstall

Why do you see regressions?

src/jit/flowgraph.cpp

src/jit/morph.cpp

src/jit/flowgraph.cpp

mikedn · 2019-10-10T05:52:45Z

src/jit/flowgraph.cpp

+    public:
+        static bool Equals(const ThrowHelperKey& x, const ThrowHelperKey& y)
+        {
+            return BasicBlock::sameEHRegion(x.m_block, y.m_block) && GenTreeCall::Equals(x.m_call, y.m_call);


Beware that the GenTree equality check ignores gtCallMoreFlags, most of gtFlags and has at least one bug (I ran into it a few days ago - it ignores the size of GT_BLK nodes). I'm going to fix the bug soon but it also turns out that GenTree::Compare is not widely used in the JIT so it's possible to run into more issues when a new use is added.

And if you were perhaps considering using gtHashValue to deal with the more general case: that one also has some issues, and it's only used by debug functionality.

AndyAyersMS · 2019-10-10

Good point. I hadn't really looked closely at the compare code.

For the more general case where we try and merge throws (see linked issue) I wasn't relying on tree hashes.
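
To make the false-match concern concrete, here is a small sketch contrasting a loose comparison with a conservative one. The types `ToyArg` and `ToyCall` and both functions are hypothetical stand-ins, not the JIT's `GenTreeCall` or its compare routine; the `size` field merely plays the role of a detail that a loose compare might skip.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-ins for call IR; names are illustrative only.
struct ToyArg
{
    bool         isConstant; // only constant arguments are treated as mergeable
    std::int64_t value;
    unsigned     size;       // a detail that a loose compare might ignore
};

struct ToyCall
{
    int                 methodId;
    unsigned            flags;
    std::vector<ToyArg> args;
};

// Loose comparison: looks only at the callee and the argument values.
bool LooseEquals(const ToyCall& x, const ToyCall& y)
{
    if (x.methodId != y.methodId || x.args.size() != y.args.size())
    {
        return false;
    }
    for (std::size_t i = 0; i < x.args.size(); i++)
    {
        if (x.args[i].value != y.args[i].value)
        {
            return false;
        }
    }
    return true;
}

// Conservative comparison: every field must match, and any non-constant
// argument disqualifies the pair, so a mismatch can only cost a missed merge.
bool ConservativeEquals(const ToyCall& x, const ToyCall& y)
{
    if (x.methodId != y.methodId || x.flags != y.flags || x.args.size() != y.args.size())
    {
        return false;
    }
    for (std::size_t i = 0; i < x.args.size(); i++)
    {
        const ToyArg& a = x.args[i];
        const ToyArg& b = y.args[i];
        if (!a.isConstant || !b.isConstant || a.value != b.value || a.size != b.size)
        {
            return false;
        }
    }
    return true;
}
```

The design choice in the conservative version is to fail closed: an incomplete comparison then costs a missed optimization rather than incorrect code.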

AndyAyersMS · 2019-10-10T19:57:52Z

The flow updates in this PR aren't sufficiently general yet. We need to ensure that the throw helper blocks can't be reached both by fall through and jump or else handle that case properly. For instance in TextInfo:AddTitlecaseLetter(byref,byref,int,int):int:this we see flow like:

------------ BB16 [005..006) -> BB18 (cond), preds={} succs={BB17,BB18}

***** BB16
STMT00041 (IL 0x005...  ???)
               [000175] ------------              *  JTRUE     void  
               [000174] ------------              \--*  NE        int   
               [000172] ------------                 +--*  LCL_VAR   int    V03 arg3         
               [000173] ------------                 \--*  CNS_INT   int    0

------------ BB17 [005..006), preds={} succs={BB18}

------------ BB18 [005..006), preds={} succs={BB19}

***** BB18
STMT00042 (IL 0x005...  ???)
               [000177] I-C-G-------              *  CALL      void   ThrowHelper.ThrowArgumentOutOfRangeException (exactContextHnd=0x00000000D1FFAB1E)
               [000176] ------------ arg0         \--*  CNS_INT   int    33

where BB18 is non-canonical.

Need to think more about how to fix flow this early in the jit without requiring full pred list info (computing even cheap preds is probably more costly than this entire phase -- I think we can instead leverage the jump target bb flag we set up early).

A couple other things to sort through:

* We really should clean out the trees in non-canonical jumped-to throw helper blocks, otherwise morph still processes the calls, even though the blocks are unreachable.
* No block can be cleaned unless we're sure all predecessors have been rerouted, so we probably need to handle more predecessor types, just in case someone gets creative with switches and gotos. Will need to work up some test cases.
* Might as well flag when we introduce no return calls during importation, so we can avoid running this phase when there's nothing to find, which is most of the time (currently only ~1% or so of methods have these throw helper calls).

Back to the drawing board, for now.
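
The "all predecessors rerouted" condition above can be sketched as a simple reference count over incoming edges. This is only an illustration under assumed names (`ToyEdgeKind`, `CountRemainingRefs`, `CanCleanBlock`); the real flow graph tracks predecessors differently.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical kinds of incoming edge for a duplicate throw helper block.
enum class ToyEdgeKind
{
    Jump,
    Conditional,
    Switch,
    FallThrough
};

// Count the incoming edges that would still reference the block after the
// transformation has retargeted every edge kind it knows how to handle.
int CountRemainingRefs(const std::vector<ToyEdgeKind>& preds, const std::vector<ToyEdgeKind>& handledKinds)
{
    int remaining = 0;
    for (ToyEdgeKind kind : preds)
    {
        bool handled = std::find(handledKinds.begin(), handledKinds.end(), kind) != handledKinds.end();
        if (!handled)
        {
            remaining++;
        }
    }
    return remaining;
}

// The block's IR may be cleaned out only when nothing references it any more.
bool CanCleanBlock(const std::vector<ToyEdgeKind>& preds, const std::vector<ToyEdgeKind>& handledKinds)
{
    return CountRemainingRefs(preds, handledKinds) == 0;
}
```

With a block reached by both a jump and a fall-through, handling only jumps leaves one reference behind, so the block must stay intact.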

AndyAyersMS · 2019-10-11T16:08:22Z

Hmm, seemingly no good way to fix the flow graph that early. We don't propagate branch target bits to blocks during initial flow graph construction, we don't have good discipline maintaining these flags during early edits, ref counts are not reliable early (among other things we leave excess counts for partially imported methods), and so on.

So it seems like this transformation must be done later, once ref counts are reliable and we have pred lists; that cuts out some of the (admittedly small) potential TP win, but we should still get the code size wins.

AndyAyersMS · 2019-10-11T22:27:25Z

Moving it later in the phase list looks promising (it now runs just before optOptimizeFlow). Flow updates are now viable. Code size savings are now 48659 bytes (up from 43903 above), as we can handle a wider variety of cases.

Speaking of variety, one odd case that's come up is that we may decide to tail call a throw helper. My merging code doesn't currently cope with this case.

When that happens we don't mark the block as rare and it doesn't get moved out of line like a called throw helper does. So we can end up with noreturn blocks in the middle of loops and such. We could probably fix that part easily enough.

But aside from that, it seems like we should either never tail call throw helpers (if we tail call, we lose info about which method may have really caused the throw) or almost always tail call them (when we can). Never tail calling them allows even more merging but also causes a number of regressions in methods where merging isn't possible. Hmmm...

AndyAyersMS · 2019-10-12T17:54:53Z

Running via pin on crossgen of SPC, throughput impact is basically zero:

baseline: 17,104,524,933
diff:     17,102,043,059

AndyAyersMS · 2019-10-12T18:01:04Z

Going to reopen this -- main outstanding issue is how much we can trust tree comparison. The framework throw helper calls tend to have very simple argument trees, but that may not be the case elsewhere.

Added a simple heuristic to not tail call throw helpers if there's more than one such call in a method, and instead hope merging kicks in.
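
The heuristic just described might be sketched as below. `ToyMethodSummary` and `ShouldTailCallNoReturn` are hypothetical names, and the sketch assumes the count of noreturn call sites in the method is already known when the decision is made.

```cpp
// Hypothetical per-method summary; not a real JIT data structure.
struct ToyMethodSummary
{
    int noReturnCallCount; // number of noreturn call sites seen in the method
};

// Decide whether a noreturn call should be turned into a tail call.
// With more than one noreturn call site, skip the tail call and leave the
// calls in a form that the merging phase may be able to common up.
bool ShouldTailCallNoReturn(const ToyMethodSummary& method, bool tailCallIsLegal)
{
    if (!tailCallIsLegal)
    {
        return false;
    }
    return method.noReturnCallCount <= 1;
}
```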

Still need to add some test cases for odd flow patterns.

Latest jit-diffs:

Total bytes of diff: -51919 (-0.13% of base)
    diff is an improvement.

Top file improvements by size (bytes):
       -7639 : System.Private.CoreLib.dasm (-0.17% of base)
       -4829 : System.Memory.dasm (-2.08% of base)
       -4749 : Microsoft.CodeAnalysis.VisualBasic.dasm (-0.08% of base)
       -4593 : System.Collections.Immutable.dasm (-0.42% of base)
       -2930 : System.Private.Xml.dasm (-0.08% of base)

83 total files with size differences (83 improved, 0 regressed), 46 unchanged.

Top method regressions by size (bytes):
          29 ( 1.67% of base) : Microsoft.CodeAnalysis.dasm - <DescendantTriviaIntoTrivia>d__161:MoveNext():bool:this
          17 ( 0.86% of base) : System.Security.Cryptography.Algorithms.dasm - PasswordBasedEncryption:Encrypt(struct,struct,ref,bool,struct,ref,struct,ref,struct):int
          17 ( 0.86% of base) : System.Security.Cryptography.Cng.dasm - PasswordBasedEncryption:Encrypt(struct,struct,ref,bool,struct,ref,struct,ref,struct):int
          13 ( 0.07% of base) : Microsoft.CodeAnalysis.VisualBasic.dasm - Binder:ReportOverloadResolutionFailureForASingleCandidate(ref,ref,int,byref,struct,struct,bool,bool,bool,bool,ref,ref,bool,ref,ref):this
           4 ( 3.48% of base) : Microsoft.CodeAnalysis.dasm - ObjectReader:CreateInstance(ref):ref:this

Top method improvements by size (bytes):
        -588 (-4.60% of base) : System.Collections.Immutable.dasm - Node:CopyTo(ref,int):this (28 methods)
        -438 (-4.46% of base) : System.Memory.dasm - ReadOnlySequenceDebugView`1:.ctor(struct):this (7 methods)
        -396 (-5.97% of base) : Microsoft.CodeAnalysis.dasm - ArrayBuilderExtensions:SelectAsArray(ref,ref):struct (7 methods)
        -386 (-6.88% of base) : System.Memory.dasm - SequenceReader`1:TryCopyMultisegment(struct):bool:this (6 methods)
        -294 (-1.13% of base) : Microsoft.CodeAnalysis.dasm - AsyncQueue`1:WithCancellation(ref,struct):ref (49 methods)

Top method regressions by size (percentage):
           4 ( 3.48% of base) : Microsoft.CodeAnalysis.dasm - ObjectReader:CreateInstance(ref):ref:this
          29 ( 1.67% of base) : Microsoft.CodeAnalysis.dasm - <DescendantTriviaIntoTrivia>d__161:MoveNext():bool:this
          17 ( 0.86% of base) : System.Security.Cryptography.Algorithms.dasm - PasswordBasedEncryption:Encrypt(struct,struct,ref,bool,struct,ref,struct,ref,struct):int
          17 ( 0.86% of base) : System.Security.Cryptography.Cng.dasm - PasswordBasedEncryption:Encrypt(struct,struct,ref,bool,struct,ref,struct,ref,struct):int
           3 ( 0.33% of base) : xunit.core.dasm - MemberDataAttributeBase:GetData(ref):ref:this

Top method improvements by size (percentage):
         -56 (-17.50% of base) : Microsoft.CodeAnalysis.VisualBasic.dasm - LookupResult:CompactSymbols(int):this
        -147 (-16.67% of base) : System.Collections.Immutable.dasm - ImmutableList`1:Sort(int,int,ref):ref:this (7 methods)
        -278 (-16.27% of base) : System.Reflection.Metadata.dasm - MetadataBuilder:GetRowCounts():struct:this
         -19 (-16.24% of base) : System.Net.Http.dasm - WinHttpHandler:SetSessionHandleConnectionOptions(ref):this
         -56 (-16.18% of base) : Microsoft.CodeAnalysis.dasm - ExceptionHandlerContainerScope:CloseScope(ref):this

2643 total methods with size differences (2637 improved, 6 regressed), 200301 unchanged.

AndyAyersMS · 2019-10-15T00:20:44Z

Going to bounce this to trigger retest.

AndyAyersMS · 2019-10-15T07:53:03Z

Ah, looks like I need to rebase...


* Unify helper classes
* Stack allocate map headers
* Phasify
* Implement an earlyout scheme

Revise this phase to run just before `optOptimizeFlow`, so that we can leverage the ref counts and predecessor lists to ensure we make correct flow updates. Don't bother trying to clean out IR; let that happen naturally as blocks become unreferenced.

Suppress tail calling noreturn methods if there is more than one such call site in the method, hoping that instead we can merge the calls.

CarolEidt

I have mostly questions.

src/jit/compiler.h

src/jit/flowgraph.cpp

CarolEidt

LGTM - thanks for the explanations

AndyAyersMS · 2019-10-16T15:40:41Z

Looking at GenTreeCall::Compare, there are quite a few fields that don't factor in:

fgArgInfo
regArgListCount
regArgList
callSig
gtReturnTypeDesc
gtOtherRegs
gtSpillFlags
gtFlags
gtCallMoreFlags
gtReturnType
... the big union ...
gtEntryPoint (partial)

Not sure what to do about this yet. Some of the information is partially redundant so current comparisons may be sufficient (eg return type will be determined by the call target).

src/jit/flowgraph.cpp

AndyAyersMS · 2019-10-16T18:38:00Z

Diff summary for the test case methods. Note not using tail calls for throw helpers hurts code size slightly, in some cases.

PMI Diffs for C:\repos\coreclr3\tests\src\JIT\opt\ThrowHelper\ThrowHelper.exe for x64 default jit
Summary:
(Lower is better)
Total bytes of diff: -65 (-1.82% of base)
    diff is an improvement.
Top file improvements by size (bytes):
         -65 : ThrowHelper.dasm (-1.82% of base)
1 total files with size differences (1 improved, 0 regressed), 0 unchanged.
Top method regressions by size (bytes):
           4 (20.00% of base) : ThrowHelper.dasm - TestCases:TwoIdenticalThrowHelpers_IfOneTail(int)
           4 (20.00% of base) : ThrowHelper.dasm - TestCases:TwoIdenticalThrowHelpers_IfTwoTail(int)
           4 (20.00% of base) : ThrowHelper.dasm - TestCases:TwoIdenticalThrowHelpers_GotoTwoTail(int)
           4 (20.00% of base) : ThrowHelper.dasm - TestCases:TwoIdenticalThrowHelpers_SwitchOneTail(int)
           4 (20.00% of base) : ThrowHelper.dasm - TestCases:TwoIdenticalThrowHelpers_SwitchTwoTail(int)
Top method improvements by size (bytes):
         -18 (-38.30% of base) : ThrowHelper.dasm - TestCases:ThreeIdenticalThrowHelpers_IfOneTail(int)
         -15 (-26.79% of base) : ThrowHelper.dasm - TestCases:TwoIdenticalThrowHelpersSameArgTrees(int,ref):int
         -12 (-27.91% of base) : ThrowHelper.dasm - TestCases:ThreeIdenticalThrowHelpers_If(int):int
          -9 (-27.27% of base) : ThrowHelper.dasm - TestCases:TwoIdenticalThrowHelpers_GotoOneTail(int)
          -9 (-27.27% of base) : ThrowHelper.dasm - TestCases:TwoIdenticalThrowHelpers_SwitchGotoOneTail(int)
Top method regressions by size (percentage):
           4 (20.00% of base) : ThrowHelper.dasm - TestCases:TwoIdenticalThrowHelpers_IfOneTail(int)
           4 (20.00% of base) : ThrowHelper.dasm - TestCases:TwoIdenticalThrowHelpers_IfTwoTail(int)
           4 (20.00% of base) : ThrowHelper.dasm - TestCases:TwoIdenticalThrowHelpers_GotoTwoTail(int)
           4 (20.00% of base) : ThrowHelper.dasm - TestCases:TwoIdenticalThrowHelpers_SwitchOneTail(int)
           4 (20.00% of base) : ThrowHelper.dasm - TestCases:TwoIdenticalThrowHelpers_SwitchTwoTail(int)
Top method improvements by size (percentage):
         -18 (-38.30% of base) : ThrowHelper.dasm - TestCases:ThreeIdenticalThrowHelpers_IfOneTail(int)
         -12 (-27.91% of base) : ThrowHelper.dasm - TestCases:ThreeIdenticalThrowHelpers_If(int):int
          -9 (-27.27% of base) : ThrowHelper.dasm - TestCases:TwoIdenticalThrowHelpers_GotoOneTail(int)
          -9 (-27.27% of base) : ThrowHelper.dasm - TestCases:TwoIdenticalThrowHelpers_SwitchGotoOneTail(int)
         -15 (-26.79% of base) : ThrowHelper.dasm - TestCases:TwoIdenticalThrowHelpersSameArgTrees(int,ref):int
17 total methods with size differences (11 improved, 6 regressed), 21 unchanged.

AndyAyersMS · 2019-10-25T02:26:32Z

@BruceForstall want to take another look?

There is still some concern about the potential for bugs in tree comparison (mainly false matches from insufficiently detailed compares). If we think this is a deal breaker I'll probably shelve this and try and address that separately.

BruceForstall

A few nits, but LGTM

src/jit/compphases.h

src/jit/flowgraph.cpp

BruceForstall · 2019-10-26T00:05:17Z

src/jit/flowgraph.cpp

+    if (updateCount > 0)
+    {
+        assert(fgModified);
+        fgModified = false;


This seems a little worrisome, but ok... Wouldn't (Shouldn't) the first code to generate "flow-dependent side data" reset this?

AndyAyersMS · 2019-10-29

This is similar to the logic in fgCreateFuncletPrologBlocks.

The fgModified concept should probably be updated to just directly invalidate the dependent analyses, if any (in this case and the funclet case, there are none).

AndyAyersMS · 2020-04-21T18:56:41Z
