This repository has been archived by the owner on Jan 23, 2023. It is now read-only.

JIT: add pass to merge common throw helper calls #27113

Merged: 9 commits, Oct 30, 2019


AndyAyersMS
Member

Look for blocks with single-statement noreturn calls, and try to reroute
flow so that all predecessors target just one such block.

Resolves #14770.

Note this impairs debuggability of optimized code a bit, as it can change which
line of code apparently invokes a throw helper in a backtrace. But since we're
already commoning jit-inserted throw helpers (like array index OOB) this is not
breaking any new ground.

We could also handle commoning BBJ_THROW blocks, with some extra effort,
but prototyping indicates duplicate throws are pretty rare.
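
To make the idea concrete, here is a minimal sketch of the merge on a toy flow graph. Everything in it is an assumption made for illustration: `ToyBlock`, `callKey`, and `mergeThrowHelpers` are hypothetical names, the string key stands in for "same callee and same argument trees", and fall-through edges are not modeled at all. It is not the code in this PR.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Toy model only: hypothetical types, far simpler than the JIT's BasicBlock.
struct ToyBlock
{
    int         ehRegion;      // blocks in different EH regions are never merged
    bool        isThrowHelper; // block holds a single noreturn helper call
    std::string callKey;       // stand-in for "same callee, same arguments"
    int         jumpTarget;    // index of the block this one jumps to, or -1
};

// Redirect every jump aimed at a duplicate throw helper block to the first
// equivalent ("canonical") block. Returns the number of jumps rewritten.
inline int mergeThrowHelpers(std::vector<ToyBlock>& blocks)
{
    std::map<std::pair<int, std::string>, int> canonical; // key -> canonical index
    std::vector<int>                           remap(blocks.size());

    for (std::size_t i = 0; i < blocks.size(); i++)
    {
        remap[i] = static_cast<int>(i);
        if (!blocks[i].isThrowHelper)
        {
            continue;
        }
        const auto key = std::make_pair(blocks[i].ehRegion, blocks[i].callKey);
        const auto it  = canonical.find(key);
        if (it == canonical.end())
        {
            canonical.emplace(key, static_cast<int>(i)); // first one seen wins
        }
        else
        {
            remap[i] = it->second; // duplicate: point at the canonical block
        }
    }

    int updates = 0;
    for (ToyBlock& b : blocks)
    {
        if (b.jumpTarget < 0)
        {
            continue;
        }
        assert(static_cast<std::size_t>(b.jumpTarget) < blocks.size());
        const int newTarget = remap[static_cast<std::size_t>(b.jumpTarget)];
        if (newTarget != b.jumpTarget)
        {
            b.jumpTarget = newTarget;
            updates++;
        }
    }
    return updates;
}
```

In this sketch a duplicate block is left in place with no remaining jumps to it, on the assumption that a later cleanup removes unreferenced blocks.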

@AndyAyersMS
Member Author

@dotnet/jit-contrib PTAL

jit-diffs shows this hits modestly often...

Total bytes of diff: -43903 (-0.11% of base)
    diff is an improvement.

Top file improvements by size (bytes):
       -7399 : System.Private.CoreLib.dasm (-0.17% of base)
       -4746 : Microsoft.CodeAnalysis.VisualBasic.dasm (-0.08% of base)
       -4541 : System.Memory.dasm (-1.95% of base)
       -2464 : CommandLine.dasm (-0.55% of base)
       -2131 : Microsoft.CodeAnalysis.dasm (-0.12% of base)

82 total files with size differences (82 improved, 0 regressed), 47 unchanged.

Top method regressions by size (bytes):
          29 ( 1.67% of base) : Microsoft.CodeAnalysis.dasm - <DescendantTriviaIntoTrivia>d__161:MoveNext():bool:this
          17 ( 0.86% of base) : System.Security.Cryptography.Algorithms.dasm - PasswordBasedEncryption:Encrypt(struct,struct,ref,bool,struct,ref,struct,ref,struct):int
          17 ( 0.86% of base) : System.Security.Cryptography.Cng.dasm - PasswordBasedEncryption:Encrypt(struct,struct,ref,bool,struct,ref,struct,ref,struct):int
          13 ( 0.07% of base) : Microsoft.CodeAnalysis.VisualBasic.dasm - Binder:ReportOverloadResolutionFailureForASingleCandidate(ref,ref,int,byref,struct,struct,bool,bool,bool,bool,ref,ref,bool,ref,ref):this
           4 ( 1.04% of base) : System.Private.CoreLib.dasm - TextInfo:AddTitlecaseLetter(byref,byref,int,int):int:this

Top method improvements by size (bytes):
        -438 (-4.46% of base) : System.Memory.dasm - ReadOnlySequenceDebugView`1:.ctor(struct):this (7 methods)
        -396 (-5.97% of base) : Microsoft.CodeAnalysis.dasm - ArrayBuilderExtensions:SelectAsArray(ref,ref):struct (7 methods)
        -386 (-6.88% of base) : System.Memory.dasm - SequenceReader`1:TryCopyMultisegment(struct):bool:this (6 methods)
        -294 (-1.13% of base) : Microsoft.CodeAnalysis.dasm - AsyncQueue`1:WithCancellation(ref,struct):ref (49 methods)
        -294 (-1.80% of base) : System.Linq.Parallel.dasm - ParallelQuery`1:OfType():ref:this (49 methods)

Top method regressions by size (percentage):
          29 ( 1.67% of base) : Microsoft.CodeAnalysis.dasm - <DescendantTriviaIntoTrivia>d__161:MoveNext():bool:this
           4 ( 1.04% of base) : System.Private.CoreLib.dasm - TextInfo:AddTitlecaseLetter(byref,byref,int,int):int:this
          17 ( 0.86% of base) : System.Security.Cryptography.Algorithms.dasm - PasswordBasedEncryption:Encrypt(struct,struct,ref,bool,struct,ref,struct,ref,struct):int
          17 ( 0.86% of base) : System.Security.Cryptography.Cng.dasm - PasswordBasedEncryption:Encrypt(struct,struct,ref,bool,struct,ref,struct,ref,struct):int
           3 ( 0.33% of base) : xunit.core.dasm - MemberDataAttributeBase:GetData(ref):ref:this

Top method improvements by size (percentage):
         -56 (-17.50% of base) : Microsoft.CodeAnalysis.VisualBasic.dasm - LookupResult:CompactSymbols(int):this
        -278 (-16.27% of base) : System.Reflection.Metadata.dasm - MetadataBuilder:GetRowCounts():struct:this
         -56 (-16.18% of base) : Microsoft.CodeAnalysis.dasm - ExceptionHandlerContainerScope:CloseScope(ref):this
         -11 (-14.86% of base) : System.Diagnostics.Process.dasm - NtProcessManager:ReadCounterValue(int,struct):long
         -41 (-14.80% of base) : System.Private.CoreLib.dasm - Decimal:ToDecimal(struct):struct

1787 total methods with size differences (1781 improved, 6 regressed), 201157 unchanged.

Member

@erozenfeld erozenfeld left a comment

LGTM
What's the throughput impact of this optimization?

Member

@BruceForstall BruceForstall left a comment

Why do you see regressions?

public:
    static bool Equals(const ThrowHelperKey& x, const ThrowHelperKey& y)
    {
        return BasicBlock::sameEHRegion(x.m_block, y.m_block) && GenTreeCall::Equals(x.m_call, y.m_call);

Beware that the GenTree equality check ignores gtCallMoreFlags, most of gtFlags and has at least one bug (I ran into it a few days ago - it ignores the size of GT_BLK nodes). I'm going to fix the bug soon but it also turns out that GenTree::Compare is not widely used in the JIT so it's possible to run into more issues when a new use is added.

And if you were perhaps considering using gtHashValue to deal with the more general case: that one also has some issues, and it's only used by debug functionality.

Member Author

Good point. Hadn't really looked closely at the compare code.

For the more general case where we try and merge throws (see linked issue) I wasn't relying on tree hashes.

@AndyAyersMS
Member Author

The flow updates in this PR aren't sufficiently general yet. We need to ensure that the throw helper blocks can't be reached both by fall through and jump or else handle that case properly. For instance in TextInfo:AddTitlecaseLetter(byref,byref,int,int):int:this we see flow like:

------------ BB16 [005..006) -> BB18 (cond), preds={} succs={BB17,BB18}

***** BB16
STMT00041 (IL 0x005...  ???)
               [000175] ------------              *  JTRUE     void  
               [000174] ------------              \--*  NE        int   
               [000172] ------------                 +--*  LCL_VAR   int    V03 arg3         
               [000173] ------------                 \--*  CNS_INT   int    0

------------ BB17 [005..006), preds={} succs={BB18}

------------ BB18 [005..006), preds={} succs={BB19}

***** BB18
STMT00042 (IL 0x005...  ???)
               [000177] I-C-G-------              *  CALL      void   ThrowHelper.ThrowArgumentOutOfRangeException (exactContextHnd=0x00000000D1FFAB1E)
               [000176] ------------ arg0         \--*  CNS_INT   int    33

where BB18 is non-canonical.

Need to think more about how to fix flow this early in the jit without requiring full pred list info (computing even cheap preds is probably more costly than this entire phase -- I think we can instead leverage the jump target bb flag we set up early).
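
The hazard described above can be shown on a toy layout: a block that is entered both by a jump and by fall-through cannot be retired just by retargeting jumps. The sketch below is an assumption-laden illustration; `ToyKind`, `ToyBB`, and the three helper functions are hypothetical and do not correspond to the JIT's `BBJ_*` kinds or to any code in this PR.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical block kinds for illustration; not the JIT's BBJ_* enumeration.
enum class ToyKind
{
    FallsThrough, // control continues into the next block in layout order
    CondJump,     // jumps to target, or falls through to the next block
    AlwaysJump,   // always jumps to target
    NoReturn      // ends in a noreturn call; control never leaves
};

struct ToyBB
{
    ToyKind kind;
    int     target; // jump target index for CondJump/AlwaysJump, else -1
};

// True if control can enter block i from the block laid out just before it.
inline bool enteredByFallThrough(const std::vector<ToyBB>& bbs, std::size_t i)
{
    assert(i < bbs.size());
    if (i == 0)
    {
        return false;
    }
    const ToyKind prev = bbs[i - 1].kind;
    return (prev == ToyKind::FallsThrough) || (prev == ToyKind::CondJump);
}

// True if some block explicitly jumps to block i.
inline bool enteredByJump(const std::vector<ToyBB>& bbs, std::size_t i)
{
    for (const ToyBB& b : bbs)
    {
        const bool jumps = (b.kind == ToyKind::CondJump) || (b.kind == ToyKind::AlwaysJump);
        if (jumps && (b.target == static_cast<int>(i)))
        {
            return true;
        }
    }
    return false;
}

// Retargeting jumps alone is only enough when nothing falls into the block.
inline bool canRerouteByJumpsOnly(const std::vector<ToyBB>& bbs, std::size_t i)
{
    return !enteredByFallThrough(bbs, i);
}
```

Modeling the dump above as a conditional jump (BB16), a fall-through block (BB17), and a noreturn block (BB18), the third block is entered both ways, so `canRerouteByJumpsOnly` is false for it.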

A couple other things to sort through:

  • We really should clean out the trees in non-canonical jumped-to throw helper blocks, otherwise morph still processes the calls, even though the blocks are unreachable.
  • No block can be cleaned unless we're sure all predecessors have been rerouted, so we probably need to handle more predecessor types, just in case someone gets creative with switches and gotos. Will need to work up some test cases.
  • Might as well flag when we introduce noreturn calls during importation, so we can avoid running this phase when there's nothing to find, which is most of the time (currently only ~1% or so of methods have these throw helper calls).
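
The last point, flagging noreturn calls at import time so the phase can bail out early, might look roughly like the following. This is a sketch under stated assumptions: `ToyMethod`, `importCalls`, and `runMergePhase` are hypothetical names, and the real importer and phase are far more involved.

```cpp
#include <cassert>
#include <vector>

// Hypothetical per-method state; not the JIT's Compiler object.
struct ToyMethod
{
    std::vector<bool> callIsNoReturn;          // one entry per imported call
    bool              sawNoReturnCall = false; // set once during "importation"
};

// Stand-in for importation: record whether any call is noreturn.
inline void importCalls(ToyMethod& m)
{
    for (bool isNoReturn : m.callIsNoReturn)
    {
        if (isNoReturn)
        {
            m.sawNoReturnCall = true;
        }
    }
}

// Stand-in for the merge phase. Returns true only if it did the full scan.
inline bool runMergePhase(const ToyMethod& m)
{
    if (!m.sawNoReturnCall)
    {
        return false; // early out: nothing to merge in this method
    }
    // ... a real phase would walk the blocks and look for duplicates here ...
    return true;
}
```

Since most methods contain no such calls, the common path in this sketch is a single flag test.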

Back to the drawing board, for now.

@AndyAyersMS
Member Author

Hmm, seemingly no good way to fix the flow graph that early. We don't propagate branch target bits to blocks during initial flow graph construction, we don't have good discipline maintaining these flags during early edits, ref counts are not reliable early (among other things we leave excess counts for partially imported methods), and so on.

So it seems like this transformation must be done later, once ref counts are reliable and we have pred lists; that cuts out some of the (admittedly small) potential TP win, but we should still get the code size wins.

@AndyAyersMS
Member Author

Moving it later in the phase list looks promising (now have it just before optOptimizeFlow). Flow updates are now viable. Code size savings are now 48659 bytes (up from 43903 above) as we can handle a wider variety of cases.

Speaking of variety, one odd case that's come up is that we may decide to tail call a throw helper. My merging code doesn't currently cope with this case.

When that happens we don't mark the block as rare and it doesn't get moved out of line like a called throw helper does. So we can end up with noreturn blocks in the middle of loops and such. We could probably fix that part easily enough.

But aside from that, it seems like we should either never tail call throw helpers (if we tail call, we lose info about which method may have really caused the throw) or almost always tail call them (when we can). Never tail calling them allows even more merging but also causes a number of regressions in methods where merging isn't possible. Hmmm...

@AndyAyersMS
Member Author

Running via pin on crossgen of SPC, throughput impact is basically zero:

baseline: 17,104,524,933
diff:     17,102,043,059

@AndyAyersMS
Member Author

Going to reopen this -- main outstanding issue is how much we can trust tree comparison. The framework throw helper calls tend to have very simple argument trees, but that may not be the case elsewhere.

Added a simple heuristic to not tail call throw helpers if there's more than one such call in a method, and instead hope merging kicks in.
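
A minimal sketch of that heuristic, assuming a per-method count of noreturn call sites is available when the tail call decision is made. `ToyCallSite`, `countNoReturnCalls`, and `shouldTailCall` are hypothetical names chosen for this example; the actual check in the JIT is not shown in this conversation.

```cpp
#include <cassert>
#include <vector>

// Hypothetical call-site summary; not a JIT data structure.
struct ToyCallSite
{
    bool isNoReturn;        // callee never returns (e.g. a throw helper)
    bool tailCallCandidate; // call is otherwise eligible to be a tail call
};

// Count the noreturn call sites in a method.
inline int countNoReturnCalls(const std::vector<ToyCallSite>& sites)
{
    int count = 0;
    for (const ToyCallSite& s : sites)
    {
        if (s.isNoReturn)
        {
            count++;
        }
    }
    return count;
}

// Suppress tail calls to noreturn callees when the method has more than one
// noreturn call site, in the hope that the calls can be merged instead.
inline bool shouldTailCall(const ToyCallSite& site, int noReturnCallCount)
{
    if (!site.tailCallCandidate)
    {
        return false;
    }
    if (site.isNoReturn && (noReturnCallCount > 1))
    {
        return false;
    }
    return true;
}
```

With a single throw helper call in the method the tail call is still allowed, which matches the trade-off discussed earlier: merging is impossible there, so suppressing the tail call would only cost code size.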

Still need to add some test cases for odd flow patterns.

Latest jit-diffs:

Total bytes of diff: -51919 (-0.13% of base)
    diff is an improvement.

Top file improvements by size (bytes):
       -7639 : System.Private.CoreLib.dasm (-0.17% of base)
       -4829 : System.Memory.dasm (-2.08% of base)
       -4749 : Microsoft.CodeAnalysis.VisualBasic.dasm (-0.08% of base)
       -4593 : System.Collections.Immutable.dasm (-0.42% of base)
       -2930 : System.Private.Xml.dasm (-0.08% of base)

83 total files with size differences (83 improved, 0 regressed), 46 unchanged.

Top method regressions by size (bytes):
          29 ( 1.67% of base) : Microsoft.CodeAnalysis.dasm - <DescendantTriviaIntoTrivia>d__161:MoveNext():bool:this
          17 ( 0.86% of base) : System.Security.Cryptography.Algorithms.dasm - PasswordBasedEncryption:Encrypt(struct,struct,ref,bool,struct,ref,struct,ref,struct):int
          17 ( 0.86% of base) : System.Security.Cryptography.Cng.dasm - PasswordBasedEncryption:Encrypt(struct,struct,ref,bool,struct,ref,struct,ref,struct):int
          13 ( 0.07% of base) : Microsoft.CodeAnalysis.VisualBasic.dasm - Binder:ReportOverloadResolutionFailureForASingleCandidate(ref,ref,int,byref,struct,struct,bool,bool,bool,bool,ref,ref,bool,ref,ref):this
           4 ( 3.48% of base) : Microsoft.CodeAnalysis.dasm - ObjectReader:CreateInstance(ref):ref:this

Top method improvements by size (bytes):
        -588 (-4.60% of base) : System.Collections.Immutable.dasm - Node:CopyTo(ref,int):this (28 methods)
        -438 (-4.46% of base) : System.Memory.dasm - ReadOnlySequenceDebugView`1:.ctor(struct):this (7 methods)
        -396 (-5.97% of base) : Microsoft.CodeAnalysis.dasm - ArrayBuilderExtensions:SelectAsArray(ref,ref):struct (7 methods)
        -386 (-6.88% of base) : System.Memory.dasm - SequenceReader`1:TryCopyMultisegment(struct):bool:this (6 methods)
        -294 (-1.13% of base) : Microsoft.CodeAnalysis.dasm - AsyncQueue`1:WithCancellation(ref,struct):ref (49 methods)

Top method regressions by size (percentage):
           4 ( 3.48% of base) : Microsoft.CodeAnalysis.dasm - ObjectReader:CreateInstance(ref):ref:this
          29 ( 1.67% of base) : Microsoft.CodeAnalysis.dasm - <DescendantTriviaIntoTrivia>d__161:MoveNext():bool:this
          17 ( 0.86% of base) : System.Security.Cryptography.Algorithms.dasm - PasswordBasedEncryption:Encrypt(struct,struct,ref,bool,struct,ref,struct,ref,struct):int
          17 ( 0.86% of base) : System.Security.Cryptography.Cng.dasm - PasswordBasedEncryption:Encrypt(struct,struct,ref,bool,struct,ref,struct,ref,struct):int
           3 ( 0.33% of base) : xunit.core.dasm - MemberDataAttributeBase:GetData(ref):ref:this

Top method improvements by size (percentage):
         -56 (-17.50% of base) : Microsoft.CodeAnalysis.VisualBasic.dasm - LookupResult:CompactSymbols(int):this
        -147 (-16.67% of base) : System.Collections.Immutable.dasm - ImmutableList`1:Sort(int,int,ref):ref:this (7 methods)
        -278 (-16.27% of base) : System.Reflection.Metadata.dasm - MetadataBuilder:GetRowCounts():struct:this
         -19 (-16.24% of base) : System.Net.Http.dasm - WinHttpHandler:SetSessionHandleConnectionOptions(ref):this
         -56 (-16.18% of base) : Microsoft.CodeAnalysis.dasm - ExceptionHandlerContainerScope:CloseScope(ref):this

2643 total methods with size differences (2637 improved, 6 regressed), 200301 unchanged.

@AndyAyersMS AndyAyersMS reopened this Oct 12, 2019
@AndyAyersMS
Member Author

Going to bounce this to trigger retest.

@AndyAyersMS AndyAyersMS reopened this Oct 15, 2019
@AndyAyersMS
Member Author

Ah, looks like I need to rebase...

* Unify helper classes
* Stack allocate map headers
* Phasify
* Implement an earlyout scheme
Revise this phase to run just before `optOptimizeFlow`, so that we can leverage
the ref counts and predecessor lists to ensure we make correct flow updates.

Don't bother trying to clean out IR; let that happen naturally as blocks
become unreferenced.
Suppress tail calling noreturn methods if there is more than one such
call site in the method, hoping that instead we can merge the calls.

@CarolEidt CarolEidt left a comment

I have mostly questions.

@CarolEidt CarolEidt left a comment

LGTM - thanks for the explanations

@AndyAyersMS
Member Author

Looking at GenTreeCall::Compare, there are quite a few fields that don't factor in:

fgArgInfo
regArgListCount
regArgList
callSig
gtReturnTypeDesc
gtOtherRegs
gtSpillFlags
gtFlags
gtCallMoreFlags
gtReturnType
... the big union ...
gtEntryPoint (partial)

Not sure what to do about this yet. Some of the information is partially redundant so current comparisons may be sufficient (eg return type will be determined by the call target).
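
The risk being weighed here is a false match: two calls that differ only in a field the comparison skips would be treated as identical and merged. The sketch below illustrates that with a toy call node. `ToyCall`, `looseEquals`, and `strictEquals` are hypothetical; they are not `GenTreeCall` or `GenTreeCall::Equals`, and the single `moreFlags` field merely stands in for the ignored fields listed above.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical, heavily simplified call node; not the JIT's GenTreeCall.
struct ToyCall
{
    int              target;    // identifies the callee
    std::vector<int> args;      // constant argument values
    std::uint32_t    moreFlags; // stand-in for state a compare might skip
};

// Loose compare: looks only at the callee and arguments.
inline bool looseEquals(const ToyCall& a, const ToyCall& b)
{
    return (a.target == b.target) && (a.args == b.args);
}

// Strict compare: also requires the extra flags to match.
inline bool strictEquals(const ToyCall& a, const ToyCall& b)
{
    return looseEquals(a, b) && (a.moreFlags == b.moreFlags);
}
```

Two calls with the same target and arguments but different `moreFlags` satisfy `looseEquals` and fail `strictEquals`. Whether a skipped field matters in practice depends on whether it is already implied by what is compared, as noted above for the return type.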

@AndyAyersMS
Member Author

Diff summary for the test case methods. Note not using tail calls for throw helpers hurts code size slightly, in some cases.

PMI Diffs for C:\repos\coreclr3\tests\src\JIT\opt\ThrowHelper\ThrowHelper.exe for x64 default jit
Summary:
(Lower is better)
Total bytes of diff: -65 (-1.82% of base)
    diff is an improvement.
Top file improvements by size (bytes):
         -65 : ThrowHelper.dasm (-1.82% of base)
1 total files with size differences (1 improved, 0 regressed), 0 unchanged.
Top method regressions by size (bytes):
           4 (20.00% of base) : ThrowHelper.dasm - TestCases:TwoIdenticalThrowHelpers_IfOneTail(int)
           4 (20.00% of base) : ThrowHelper.dasm - TestCases:TwoIdenticalThrowHelpers_IfTwoTail(int)
           4 (20.00% of base) : ThrowHelper.dasm - TestCases:TwoIdenticalThrowHelpers_GotoTwoTail(int)
           4 (20.00% of base) : ThrowHelper.dasm - TestCases:TwoIdenticalThrowHelpers_SwitchOneTail(int)
           4 (20.00% of base) : ThrowHelper.dasm - TestCases:TwoIdenticalThrowHelpers_SwitchTwoTail(int)
Top method improvements by size (bytes):
         -18 (-38.30% of base) : ThrowHelper.dasm - TestCases:ThreeIdenticalThrowHelpers_IfOneTail(int)
         -15 (-26.79% of base) : ThrowHelper.dasm - TestCases:TwoIdenticalThrowHelpersSameArgTrees(int,ref):int
         -12 (-27.91% of base) : ThrowHelper.dasm - TestCases:ThreeIdenticalThrowHelpers_If(int):int
          -9 (-27.27% of base) : ThrowHelper.dasm - TestCases:TwoIdenticalThrowHelpers_GotoOneTail(int)
          -9 (-27.27% of base) : ThrowHelper.dasm - TestCases:TwoIdenticalThrowHelpers_SwitchGotoOneTail(int)
Top method regressions by size (percentage):
           4 (20.00% of base) : ThrowHelper.dasm - TestCases:TwoIdenticalThrowHelpers_IfOneTail(int)
           4 (20.00% of base) : ThrowHelper.dasm - TestCases:TwoIdenticalThrowHelpers_IfTwoTail(int)
           4 (20.00% of base) : ThrowHelper.dasm - TestCases:TwoIdenticalThrowHelpers_GotoTwoTail(int)
           4 (20.00% of base) : ThrowHelper.dasm - TestCases:TwoIdenticalThrowHelpers_SwitchOneTail(int)
           4 (20.00% of base) : ThrowHelper.dasm - TestCases:TwoIdenticalThrowHelpers_SwitchTwoTail(int)
Top method improvements by size (percentage):
         -18 (-38.30% of base) : ThrowHelper.dasm - TestCases:ThreeIdenticalThrowHelpers_IfOneTail(int)
         -12 (-27.91% of base) : ThrowHelper.dasm - TestCases:ThreeIdenticalThrowHelpers_If(int):int
          -9 (-27.27% of base) : ThrowHelper.dasm - TestCases:TwoIdenticalThrowHelpers_GotoOneTail(int)
          -9 (-27.27% of base) : ThrowHelper.dasm - TestCases:TwoIdenticalThrowHelpers_SwitchGotoOneTail(int)
         -15 (-26.79% of base) : ThrowHelper.dasm - TestCases:TwoIdenticalThrowHelpersSameArgTrees(int,ref):int
17 total methods with size differences (11 improved, 6 regressed), 21 unchanged.

@AndyAyersMS
Member Author

@BruceForstall want to take another look?

There is still some concern about the potential for bugs in tree comparison (mainly false matches from insufficiently detailed compares). If we think this is a deal breaker I'll probably shelve this and try and address that separately.

Member

@BruceForstall BruceForstall left a comment

A few nits, but LGTM

if (updateCount > 0)
{
    assert(fgModified);
    fgModified = false;
Member

This seems a little worrisome, but ok... Wouldn't (Shouldn't) the first code to generate "flow-dependent side data" reset this?

Member Author

Similar to logic in fgCreateFuncletPrologBlocks.

The fgModified concept should probably be updated to just directly invalidate the dependent analyses, if any (in this case and the funclet case, there are none).

@AndyAyersMS AndyAyersMS merged commit b962c97 into dotnet:master Oct 30, 2019
@AndyAyersMS AndyAyersMS deleted the SimpleThrowTailMerge branch October 30, 2019 16:58
@AndyAyersMS
Member Author

See also dotnet/runtime#35135.

Development

Successfully merging this pull request may close these issues.

Coalescing calls to non-returning throw helpers?
7 participants