diff --git a/Documentation/design-docs/GuardedDevirtualization.md b/Documentation/design-docs/GuardedDevirtualization.md
new file mode 100644
index 000000000000..ba1f7bf9204f
--- /dev/null
+++ b/Documentation/design-docs/GuardedDevirtualization.md
@@ -0,0 +1,533 @@
+# Guarded Devirtualization
+
+## Overview
+
+Guarded devirtualization is a proposed new optimization for the JIT in .NET Core 3.0. This document describes the motivation and an initial design sketch, and highlights various issues needing further investigation.
+
+## Motivation
+
+The .NET Core JIT is able to do a limited amount of devirtualization for virtual and interface calls. This ability was added in .NET Core 2.0. To devirtualize, the JIT must be able to demonstrate one of two things: either that it knows the type of some reference exactly (say because it has seen a `newobj`) or that the declared type of the reference is a `final` class (aka `sealed`). For virtual calls the JIT can also devirtualize if it can prove the method is marked as `final`.
+
+However, most of the time the JIT is unable to determine exactness or finalness and so devirtualization fails. Statistics show that currently only around 15% of virtual call sites can be devirtualized. Results are even more pessimistic for interface calls, where success rates are around 5%.
+
+There are a variety of reasons for this. The JIT analysis is somewhat weak. Historically all the JIT cared about was whether some location held **a** reference type, not a specific reference type. So the current type propagation has been retrofitted and there are places where types just get lost. The JIT analysis happens quite early (during importation) and there is only minimal ability to do data flow analysis at this stage. So for current devirtualization the source of the type information and the consumption must be fairly close in the code. A more detailed accounting of some of the shortcomings can be found in [CoreCLR#9908](https://github.com/dotnet/coreclr/issues/9908).
+
+Resolution of these issues will improve the ability of the JIT to devirtualize, but even the best analysis possible will still miss out on many cases. Some call sites are truly polymorphic. Some others are truly monomorphic but proving this would require sophisticated interprocedural analyses that are not practical in the JIT or in a system as dynamic as the CLR. And some sites are monomorphic in practice but potentially polymorphic.
+
+As an alternative, when devirtualization fails, the JIT can perform *guarded devirtualization*. Here the JIT creates an `if-then-else` set of blocks in place of a virtual or interface call and inserts a runtime type test (or similar) into the `if` -- the "guard". If the guard test succeeds the JIT knows the type of the reference, so the `then` block can directly invoke the method corresponding to that type. If the test fails, the `else` block is executed, and this contains the original virtual or interface call.
+
+The upshot is that the JIT conditionally gains the benefit of devirtualization at the expense of increased code size, longer JIT times, and slightly longer code paths around the call. So long as the JIT's guess at the type is somewhat reasonable, this optimization can improve performance.
+
+## Opportunity
+
+One might imagine that the JIT's guess about the type of the reference has to be pretty good for devirtualization to pay off. Somewhat surprisingly, at least based on our initial results, that is not the case.
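+
+To make the shape of the transformation concrete, here is a source-level sketch of the expansion for a virtual call. This is only an illustration: the JIT performs the expansion on its own IR, the guard is really a method table comparison rather than a `GetType()` call, and the class names below are made up for the example.
+
+```C#
+class Base          { public virtual int M() { return 1; } }
+class Derived : Base { public override int M() { return 2; } }
+
+static int CallM(Base b)
+{
+    // Original code: return b.M();
+    //
+    // Guarded expansion, guessing that b is a Derived. The GetType()
+    // comparison stands in for the JIT's method table test.
+    if (b.GetType() == typeof(Derived))
+    {
+        return 2;       // direct call to Derived.M, inlined here
+    }
+    else
+    {
+        return b.M();   // residual virtual call
+    }
+}
+```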
+
+### Virtual Calls: The Two-Class Case
+
+Given these class declarations:
+```C#
+class B
+{
+    public virtual int F() { return 33; }
+}
+
+class D : B
+{
+    public override int F() { return 44; }
+}
+```
+Suppose we have an array `B[]` that is randomly filled with instances of `B` and `D`, where each element is class `B` with probability `p`. We time how long it takes to invoke `F` on each member of the array (note the JIT will never be able to devirtualize these calls), and plot the times as a function of `p`. The result is something like the following:
+
+![two classes baseline perf](TwoClassesBaseline.JPG)
+
+Modern hardware includes an indirect branch target predictor and we can see it in action here. When the array element type is predictable (`p` very close to zero or very close to 1) performance is better. When the element type is unpredictable (`p` near 0.5) performance is quite a bit worse.
+
+From this we can see that a correctly predicted virtual call requires about 19 time units and a worst-case incorrectly predicted call around 55 time units. There is some timing overhead here too, so the real costs are a bit lower.
+
+Now imagine we update the JIT to do guarded devirtualization and check if the element is indeed type `B`. If so the JIT can call `B.F` directly, and in our prototype the JIT will also inline the call. So we would expect that if the element types are mostly `B`s (that is, if `p` is near 1.0) we'd see very good performance, and if the element types are mostly `D` (that is, `p` near 0.0) performance should perhaps be slightly worse than the un-optimized case, as there is now an extra check to run before the call.
+
+![two classes devirt perf](TwoClassesDevirt.JPG)
+
+However, as you can see, the performance of the devirtualized case (blue line) is as good as or better than the un-optimized case for all values of `p`. This is perhaps unexpected and deserves some explanation.
+
+Recall that modern hardware also includes a branch predictor. For small or large values of `p` this predictor will correctly guess whether the test added by the JIT will resolve to the `then` or `else` case. For small values of `p` the JIT's guess will be wrong and control will flow to the `else` block. But unlike the original example, the indirect call here will only see instances of type `D`, and so the indirect branch predictor will work extremely well. So the overhead for the small `p` case is similar to the well-predicted indirect case without guarded devirtualization. As `p` increases the branch predictor starts to mispredict and that costs some cycles. But when it mispredicts, control reaches the `then` block which executes the inlined call. So the cost of misprediction is offset by the faster execution and the cost stays relatively flat.
+
+As `p` passes 0.5 the branch predictor flips its prediction to prefer the `then` case. As before, mispredicts are costly and send us down the `else` path, but there we still execute a correctly predicted indirect call.
+
+And as `p` approaches 1.0 the cost falls, as the branch predictor is almost always correct and so the cost is simply that of the inlined call.
+
+So oddly enough the guarded devirtualization case shown here does not require any sort of perf tradeoff. The JIT is better off guessing the more likely case, but even guessing the less likely case can pay off and doesn't hurt performance.
+
+One might suspect at this point that the two-class case is a special case and that the results do not hold up in more complex cases.
+More on that shortly.
+
+Before moving on, we should point out that virtual calls in the current CLR are a bit more expensive than in C++, because the CLR uses a two-level method table. That is, the indirect call sequence is something like:
+```asm
+000095 mov rax, qword ptr [rcx]         ; fetch method table
+000098 mov rax, qword ptr [rax+72]      ; fetch proper chunk
+00009C call qword ptr [rax+32]B:F():int:this ; call indirect
+```
+This is a chain of 3 dependent loads and so, best case, will require at least 3x the best cache latency (plus any indirect prediction overhead).
+
+So the virtual call costs for the CLR are high. The chunked method table design was adopted to save space (chunks can be shared by different classes) at the expense of some performance. And this apparently makes guarded devirtualization pay off over a wider range of class distributions than one might expect.
+
+And for completeness, the full guarded `if-then-else` sequence measured above is:
+```asm
+00007A mov rcx, gword ptr [rsi+8*rcx+16] ; fetch array element
+00007F mov rax, 0x7FFC9CFB4A90          ; B's method table
+000089 cmp qword ptr [rcx], rax         ; method table test
+00008C jne SHORT G_M30756_IG06          ; jump if class is not B
+
+00008E mov eax, 33                      ; inlined B.F
+000093 jmp SHORT G_M30756_IG07
+
+G_M30756_IG06:
+
+000095 mov rax, qword ptr [rcx]         ; fetch method table
+000098 mov rax, qword ptr [rax+72]      ; fetch proper chunk
+00009C call qword ptr [rax+32]B:F():int:this ; call indirect
+
+G_M30756_IG07:
+```
+Note there is a redundant load of the method table (hidden in the `cmp`) that could be eliminated with a bit more work on the prototype. So guarded devirtualization perf could potentially be even better than is shown above, especially for smaller values of `p`.
+
+### Virtual Calls: The Three-Class Case
+
+Now to return to the question we asked above: is there something about the two-class case that made guarded devirtualization especially attractive? Read on.
+
+Suppose we introduce a third class into the mix and repeat the above measurement. There are now two probabilities in play: `p`, the probability that the element has class `B`, and `p1`, the probability that the element has class `D`; otherwise the element has the third class `E`. To avoid introducing a 3D plot we'll first simply average the results over the various values of `p1` and plot performance as a function of `p`:
+
+![three classes devirt perf](ThreeClassesDevirt.JPG)
+
+The right-hand side (`p` near 1.0) looks a lot like the previous chart. This is not surprising as there are relatively few instances of that third class. But the middle and left-hand side differ and are more costly.
+
+For the un-optimized case (orange) the difference is directly attributable to the performance of the indirect branch predictor. Even when `p` is small there are still two viable branch targets (on average) and so some degree of indirect misprediction.
+
+For the optimized case we now see that guarded devirtualization performs worse than no optimization if the JIT's guess is completely wrong. The penalty is not that bad because the JIT-introduced branch is predictable. But even at very modest values of `p` guarded devirtualization starts to win out.
+
+Because we've averaged over `p1` you might suspect that we're hiding something. The following chart shows the min and max values as well as the average, and also shows the two-class result (dashed lines).
+
+![three classes devirt perf ranges](ThreeClassesDevirtFull.JPG)
+
+You can see the minimum values are very similar to the two-class case; these are cases where `p1` is close to 0 or close to 1. And that makes sense, because if there really are only two classes, despite the potential of there being three, then we'd expect to see results similar to the case where there can only be two classes.
+
+And as noted above, if `p` is high enough then the curves also converge to the two-class case, as the relative mixture of `D` and `E` doesn't matter: the predominance of `B` wins out.
+
+For low values of `p` the actual class at the call site is some mixture of `D` and `E`. Here's some detail (the x axis now shows `p1` and `p` as upper and lower values respectively).
+
+![three classes devirt perf detail](ThreeClassesDevirtDetail.JPG)
+
+The worst case for both is when the mixture of `D` and `E` is unpredictably 50-50 and there are no `B`s. Once we mix in just 10% of `B`, guarded devirtualization performs better no matter what distribution we have for the other two classes. Worst-case overhead -- where the JIT guesses a class that never appears, and the other classes are evenly distributed -- is around 20%.
+
+So it seems reasonable to say that so long as the JIT can make a credible guess about the possible class -- say a guess that is right at least 10% of the time -- then there is quite likely a performance benefit to guarded devirtualization for virtual calls.
+
+We'll need to verify this with more scenarios, but these initial results are certainly encouraging.
+
+### Virtual Calls: Testing for Multiple Cases
+
+One might deduce from the above that if there are two likely candidates the JIT should test for each. This is certainly a possibility, and in C++ compilers that do indirect call profiling there are cases where multiple tests are considered a good idea. But there's also additional code size and another branch.
+
+This is something we'll look into further.
+
+### Interface Calls: The Two-Class Case
+
+Interface calls on the CLR are implemented via [Virtual Stub Dispatch](https://github.com/dotnet/coreclr/blob/master/Documentation/botr/virtual-stub-dispatch.md) (aka VSD). Calls are made through an indirection cell that initially points at a lookup stub. On the first call, the interface target is identified from the object's method table and the lookup stub is replaced with a dispatch stub that checks for that specific method table in a manner quite similar to guarded devirtualization.
+
+If the method table check fails a counter is incremented, and once the counter reaches a threshold the dispatch stub is replaced with a resolve stub that looks up the right target in a process-wide hash table.
+
+For interface call sites that are monomorphic, the VSD mechanism (via the dispatch stub) executes the following code sequence (here for x64):
+```asm
+; JIT-produced code
+;
+; set up R11 with interface target info
+mov R11, ...        ; additional VSD info for call
+mov RCX, ...        ; dispatch target object
+cmp [rcx], rcx      ; null check (unnecessary)
+call [addr]         ; call indirect through indir cell
+
+; dispatch stub
+cmp [RCX], targetMT ; check for right method table
+jne DISPATCH-FAIL   ; bail to resolve stub if check fails (uses R11 info)
+jmp targetCode      ; else "tail call" the right method
+```
+
+At first glance it might appear that adding guarded devirtualization on top of VSD may not provide much benefit for monomorphic sites.
+However, the guarded devirtualization test doesn't use an indirection cell and doesn't require R11 setup, may be able to optimize away the null check, and opens the door for inlining. So it should be slightly cheaper on average and significantly cheaper in some cases.
+
+(Note [CoreCLR#14222](https://github.com/dotnet/coreclr/issues/14222) indicates we should be able to optimize away the null check in any case.)
+
+If the guard test fails we've filtered out one method table, so the dispatch cell now works well even if a call site alternates between two classes. So we'd expect the combination of guarded devirtualization and VSD to perform well on the two-class test and only show limitations when faced with mixtures of three or more classes.
+
+If the guard test always fails we have the up-front cost of the vtable fetch (which should amortize pretty well with the subsequent fetch in the stub) plus the predicted-not-taken branch. So we'd expect the cost for the two-class cases where the JIT's prediction is always wrong to be a bit higher.
+
+The graph below shows the measured results. To make sure we're not overly impacted by residual VSD state we use a fresh call site for each value of `p`. The solid orange line is the current cost. The dashed orange line is the corresponding cost for a virtual call with the same value of `p`. The solid blue line is the cost with an up-front guarded test. As noted, there is some slowdown when the JIT always guesses the wrong class, but the break-even point (not shown) is at a relatively small probability of a correct guess.
+
+![two classes interface devirt](TwoClassesInterface.JPG)
+
+### Interface Calls: The Three-Class Case
+
+As with virtual calls, you may strongly suspect the two-class case for interface calls is special. And you'd be right.
+
+If we mix a third class in as we did above, we see similar changes in the performance mix for interface calls, as seen below. But also as with virtual calls, the JIT's guess doesn't have to be all that good to see payoffs. At around 10% correct, guessing wins on average, and at around 30% correct, guessing is always a perf win.
+
+![three classes interface devirt](ThreeClassesInterface.JPG)
+
+### Delegate Speculation
+
+While we have been discussing this topic in the context of virtual calls, the method is general and can be applied to indirect calls as well. Here the guard test may just test for a particular function rather than a type.
+
+`Delegate.Invoke` is a special method that eventually turns into an indirect call. The JIT could speculate about the possible target of this call. Choosing a good target here would require some kind of indirect call profiling.
+
+### Calli Speculation
+
+Indirect calls also arise via the `calli` opcode. As with delegates, choosing a target here likely requires specialized profiling.
+
+### Costs
+
+Given the optimistic take on performance, it is important to remember that there are also costs involved in guarded devirtualization: increased code size and increased JIT time. There may also be some second-order effects on the local code generation as we've introduced control flow into the method where it didn't exist previously.
+
+A naive implementation that aggressively performs guarded devirtualization increases code size overall by about 5% as measured by PMI. The JIT time increase was not measured but should be in the same ballpark. Some assemblies see code size increasing by as much as 12%.
+
+However, guarded devirtualization only kicks in for about 15% of methods. So the average relative size increase in a method with virtual calls is probably more like 33%.
+
+There may be some inefficiencies in the current prototype that can be fixed to reduce the code size impact. Aside from the extra method table fetch noted above, the duplicated calls have the same sets of arguments, so we might be able to amortize argument evaluation costs better. And there are some complexities around handling return values (especially for implicit by-reference structures) that likewise might be able to be tightened up.
+
+Nevertheless, blindly optimizing all virtual calls with guarded devirtualization is not likely the right approach. Something more selective is almost certainly needed.
+
+However, we have done code-expanding optimizations somewhat blindly before, and we could contain the size growth risk by restricting this optimization to Tier1. Also, PMI can overstate the size impact seen in real scenarios as it may over-count the impact of changes in methods that are always inlined. So we should look at size increases from some actual scenarios.
+
+And perhaps we should look at the size impact of loop cloning as a precedent.
+
+## Implementation Considerations
+
+To get the data above and a better feel for the challenges involved we have implemented a prototype. It is currently located on this branch: [GuardedDevirtFoundations](https://github.com/AndyAyersMS/coreclr/tree/GuardedDevirtFoundations).
+
+The prototype can introduce guarded devirtualization for some virtual and interface calls. It supports inlining of the directly invoked method. It uses the JIT's "best known type" as the class to predict. It also anticipates being able to query the runtime for implementing classes of an interface.
+
+### Phase Ordering
+
+For the most part, devirtualization is done very early on in the JIT, during importation. This allows devirtualized calls to subsequently be inlined, and allows devirtualization of call sites in inlinees to take advantage of type information propagating down into the inlinee from inlined arguments.
+
+We want those same properties to hold for guarded devirtualization candidates. So conceptually the transformation should happen in the same place. However, it is not possible to introduce new control flow in the importer (ignoring for the moment the possibility of using question ops). So the actual transformation must be deferred until sometime after the importer runs and before the inliner runs.
+
+This deferral is a bit problematic, as some key bits of importer state are needed to query the runtime about the properties of a call target. So if we defer the transformation we need to somehow capture the data needed for these queries and make it available later. The current prototype uses (abuses?) the inline candidate information for this. As part of this we require that all speculative devirtualization sites be treated as inline candidates, at least initially. This has the side effect of hoisting the call to be a top-level (statement) expression and introduces a return value placeholder.
+
+We already have a similar transformation in the JIT, the "fat calli" transformation needed on CoreRT. This transformation runs at the right time -- after the importer and before the inliner -- and introduces the right kind of `if-then-else` control flow structure. So the thought is to generalize this to handle guarded devirtualization as well.
+
+### Recognition
+
+In the prototype, candidates are recognized during the initial importer-driven call to `impDevirtualizeCall`. If the only reason devirtualization fails is lack of exactness, then the call is marked as a guarded devirtualization candidate.
+
+### Devirtualization
+
+To produce the direct call, the prototype updates the `this` passed in the `then` version of the call so it has the exact predicted type. It then re-invokes `impDevirtualizeCall`, which should now succeed since the type is exactly known. The benefit of reuse here is that certain special cases of devirtualization are now more likely to be handled.
+
+### Inline Candidacy
+
+The prototype currently sets up all virtual and interface calls as potential inline candidates. One open question is whether it is worth doing guarded devirtualization simply to introduce a direct call. As an alternative we could insist that the directly called method also be something that is potentially inlineable. One can argue that call overhead matters much more for small methods that are also likely good inline candidates.
+
+The inline candidate info is based on the apparent method invoked at the virtual site. This is the base method, the one that introduces the virtual slot. So if we speculatively check for some class and that class overrides the base method, we need to somehow update the inline info. How best to do this is unclear.
+
+### Return Values
+
+Because the candidate calls are handled as inline candidates, the JIT hoists the call to a top-level expression (which is good) during importation and introduces a return value placeholder into the place the call occupied in its original tree. (Oddly, we introduce return value placeholders for some calls that don't return a value -- we should fix this.) The placeholder points back at the call.
+
+When we split the call into two calls we can't keep this structure intact, as there needs to be a 1-1 relationship between call and placeholder. So the prototype needs to save the return value in a new local and then update the placeholder to refer to that local. This can be tricky because in some cases we haven't yet settled on what the actual type of the return value is.
+
+The handling of return values in the early stages of the JIT (arguably, in the entire JIT) is quite messy. The ABI details bleed through quite early and do so somewhat unevenly. This mostly impacts methods that return structures, as different ABIs have quite different conventions, and the IR is transformed to reflect those conventions at different times for un-inlined calls, inlineable calls that end up not getting inlined, and calls that get inlined. In particular, structures that are small enough to be returned by value (in a register or set of registers) need careful handling.
+
+The prototype skips over such by-value-returning struct methods today. Some of the logic found in `fgUpdateInlineReturnExpressionPlaceHolder` needs to be pulled in to properly type the call return value so we can properly type the temp. Or perhaps we could leverage some of the importer-time transformations that are done for the fat calli cases.
+
+For larger structs we should arrange for the call(s) to write their return values directly into the new temp, instead of copying the value from wherever they return it into a temp, to avoid one level of struct copy. Doing so may require upstream zero-init of the return value struct, and this should only happen in one place.
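+
+As a concrete illustration of the by-value struct case the prototype currently skips, consider a virtual method returning a small struct. The types and names below are made up for this example; the point is that the returned value may travel in registers, so both calls in a guarded expansion would have to agree on a properly typed shared temp that the return value placeholder can refer to.
+
+```C#
+struct Pair
+{
+    public int X;
+    public int Y;
+}
+
+class Base
+{
+    // Pair is small enough that many ABIs return it in registers, so the
+    // IR for calls to GetPair gets rewritten to reflect that convention.
+    public virtual Pair GetPair() { return new Pair { X = 1, Y = 2 }; }
+}
+
+class Derived : Base
+{
+    public override Pair GetPair() { return new Pair { X = 3, Y = 4 }; }
+}
+
+static int Sum(Base b)
+{
+    // A guarded expansion of this call needs the direct call (then block)
+    // and the residual virtual call (else block) to both store their Pair
+    // results into one shared temp, with the return value placeholder
+    // retyped to read from that temp.
+    Pair p = b.GetPair();
+    return p.X + p.Y;
+}
+```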
+
+## Open Issues
+
+Here are some of the issues that need to be looked into more carefully.
+
+### Policy
+
+- What is the best mechanism for guessing which class to test for?
+  - instrument Tier0 code?
+  - look at types of arguments?
+  - ask the runtime for the set of known classes?
+  - harvest info from runtime caches (VSD)?
+  - add an instrumenting Tier1 to collect data and a Tier2 to optimize?
+- Is there some efficient way to test for class ranges? Currently the JIT is doing an exact type test. But we really care more about which method is going to be invoked. So if there is a range of types `D1...DN` that all will invoke some particular method, can we test for them all somehow?
+- Or should we test the method after the method lookup? (This is possibly a worse tradeoff because of the chunked method table arrangement, and is also tricky as a method can have multiple addresses over time.) Since many types can share a chunk this might allow devirtualization over a wider set of classes (good) but we'd lose knowledge of exact types (bad). Not clear how these tradeoffs play out.
+- Interaction of guarded devirtualization with VSD? For interface calls we are sort of inlining the first level of the VSD into the JITted code.
+- Revocation or reworking of the guard if the JIT's prediction turns out to be bad?
+- Improve regular devirtualization to reduce the need for guarded devirtualization.
+- Should we enable this for preJITted code? In preJITted code the target method table is not a JIT-time constant and must be looked up.
+- In the prototype, guarded devirtualization and late devirtualization sometimes conflict. Say we fail to devirtualize a site, and so expand via guarded devirtualization, guessing some class X. The residual virtual call may then be optimizable via late devirtualization, and this may discover the actual class. In that case the guarded devirtualization is not needed. But currently it can't be undone.
+- We probably don't want to bother with guarded devirtualization if we can't also inline. But it takes us several evaluation steps to determine if a call can be inlined, some of these happening *after* we've done the guarded expansion. Again, this expansion can't be undone.
+- So perhaps we need to build an undo capability for the cases where guarded devirtualization doesn't lead to inlining and/or where late devirtualization also applies.
+
+### Implementation
+
+- Avoid re-fetching the method table for the residual virtual call (should reduce code size and improve the overall perf win).
+- Look at how effectively we are sharing argument setup (might reduce code size and JIT time impact) -- perhaps implement head merging?
+- Handle return values in full generality.
+- IL offsets.
+- Flag residual calls as not needing null checks.
+- Properly establish inline candidacy.
+- Decide if the refactoring of `InlineCandidateInfo` is the right way to pass information from the importer to the indirect transform phase.
+
+### Futures
+
+- Can we cover multiple calls with one test? This can happen already if the subsequent call is introduced via inlining of the directly called method, as we know the exact type along that path. But for back-to-back calls to virtual methods off of the same object it would be nice to do just one test.
+- Should we test for multiple types? Once we've peeled off the "most likely" case, if the conditional probability of the next most likely case is high, it is probably worth testing for it too. I believe the C++ compiler will test up to 3 candidates this way... but that's a lot of code expansion.
\ No newline at end of file diff --git a/Documentation/design-docs/ThreeClassesDevirt.JPG b/Documentation/design-docs/ThreeClassesDevirt.JPG new file mode 100644 index 000000000000..1ae19baab4a1 Binary files /dev/null and b/Documentation/design-docs/ThreeClassesDevirt.JPG differ diff --git a/Documentation/design-docs/ThreeClassesDevirtDetail.JPG b/Documentation/design-docs/ThreeClassesDevirtDetail.JPG new file mode 100644 index 000000000000..0edba7479e94 Binary files /dev/null and b/Documentation/design-docs/ThreeClassesDevirtDetail.JPG differ diff --git a/Documentation/design-docs/ThreeClassesDevirtFull.JPG b/Documentation/design-docs/ThreeClassesDevirtFull.JPG new file mode 100644 index 000000000000..2c09920a7cc4 Binary files /dev/null and b/Documentation/design-docs/ThreeClassesDevirtFull.JPG differ diff --git a/Documentation/design-docs/ThreeClassesInterface.JPG b/Documentation/design-docs/ThreeClassesInterface.JPG new file mode 100644 index 000000000000..cbd3551f7452 Binary files /dev/null and b/Documentation/design-docs/ThreeClassesInterface.JPG differ diff --git a/Documentation/design-docs/TwoClassesBaseline.JPG b/Documentation/design-docs/TwoClassesBaseline.JPG new file mode 100644 index 000000000000..3a8b4b21e860 Binary files /dev/null and b/Documentation/design-docs/TwoClassesBaseline.JPG differ diff --git a/Documentation/design-docs/TwoClassesDevirt.JPG b/Documentation/design-docs/TwoClassesDevirt.JPG new file mode 100644 index 000000000000..7c48264eef57 Binary files /dev/null and b/Documentation/design-docs/TwoClassesDevirt.JPG differ diff --git a/Documentation/design-docs/TwoClassesInterface.JPG b/Documentation/design-docs/TwoClassesInterface.JPG new file mode 100644 index 000000000000..69063dc28af3 Binary files /dev/null and b/Documentation/design-docs/TwoClassesInterface.JPG differ diff --git a/src/jit/compiler.cpp b/src/jit/compiler.cpp index 569f491b1608..65187d178b06 100644 --- a/src/jit/compiler.cpp +++ b/src/jit/compiler.cpp @@ -4543,10 +4543,8 @@ void Compiler::compCompile(void** methodCodePtr, ULONG* methodCodeSize, JitFlags fgRemovePreds(); } - if (IsTargetAbi(CORINFO_CORERT_ABI) && doesMethodHaveFatPointer()) - { - fgTransformFatCalli(); - } + // Transform indirect calls that require control flow expansion. + fgTransformIndirectCalls(); EndPhase(PHASE_IMPORTATION); diff --git a/src/jit/compiler.h b/src/jit/compiler.h index 986f0ab6ca0e..92ed0fd84869 100644 --- a/src/jit/compiler.h +++ b/src/jit/compiler.h @@ -3298,6 +3298,13 @@ class Compiler return IsTargetAbi(CORINFO_CORERT_ABI) ? 
TYP_I_IMPL : TYP_REF; } + void impDevirtualizeCall(GenTreeCall* call, + CORINFO_METHOD_HANDLE* method, + unsigned* methodFlags, + CORINFO_CONTEXT_HANDLE* contextHandle, + CORINFO_CONTEXT_HANDLE* exactContextHandle, + bool isLateDevirtualization); + //========================================================================= // PROTECTED //========================================================================= @@ -3355,12 +3362,6 @@ class Compiler CORINFO_CALL_INFO* callInfo, IL_OFFSET rawILOffset); - void impDevirtualizeCall(GenTreeCall* call, - CORINFO_METHOD_HANDLE* method, - unsigned* methodFlags, - CORINFO_CONTEXT_HANDLE* contextHandle, - CORINFO_CONTEXT_HANDLE* exactContextHandle); - CORINFO_CLASS_HANDLE impGetSpecialIntrinsicExactReturnType(CORINFO_METHOD_HANDLE specialIntrinsicHandle); bool impMethodInfo_hasRetBuffArg(CORINFO_METHOD_INFO* methInfo); @@ -3866,7 +3867,7 @@ class Compiler bool forceInline, InlineResult* inlineResult); - void impCheckCanInline(GenTree* call, + void impCheckCanInline(GenTreeCall* call, CORINFO_METHOD_HANDLE fncHandle, unsigned methAttr, CORINFO_CONTEXT_HANDLE exactContextHnd, @@ -3895,6 +3896,11 @@ class Compiler bool exactContextNeedsRuntimeLookup, CORINFO_CALL_INFO* callInfo); + void impMarkInlineCandidateHelper(GenTreeCall* call, + CORINFO_CONTEXT_HANDLE exactContextHnd, + bool exactContextNeedsRuntimeLookup, + CORINFO_CALL_INFO* callInfo); + bool impTailCallRetTypeCompatible(var_types callerRetType, CORINFO_CLASS_HANDLE callerRetTypeClass, var_types calleeRetType, @@ -4134,7 +4140,7 @@ class Compiler void fgImport(); - void fgTransformFatCalli(); + void fgTransformIndirectCalls(); void fgInline(); @@ -5355,8 +5361,8 @@ class Compiler #ifdef DEBUG static fgWalkPreFn fgDebugCheckInlineCandidates; - void CheckNoFatPointerCandidatesLeft(); - static fgWalkPreFn fgDebugCheckFatPointerCandidates; + void CheckNoTransformableIndirectCallsRemain(); + static fgWalkPreFn fgDebugCheckForTransformableIndirectCalls; #endif void fgPromoteStructs(); @@ -6139,6 +6145,7 @@ class Compiler #define OMF_HAS_NULLCHECK 0x00000010 // Method contains null check. #define OMF_HAS_FATPOINTER 0x00000020 // Method contains call, that needs fat pointer transformation. #define OMF_HAS_OBJSTACKALLOC 0x00000040 // Method contains an object allocated on the stack. +#define OMF_HAS_GUARDEDDEVIRT 0x00000080 // Method contains guarded devirtualization candidate bool doesMethodHaveFatPointer() { @@ -6157,6 +6164,27 @@ class Compiler void addFatPointerCandidate(GenTreeCall* call); + bool doesMethodHaveGuardedDevirtualization() + { + return (optMethodFlags & OMF_HAS_GUARDEDDEVIRT) != 0; + } + + void setMethodHasGuardedDevirtualization() + { + optMethodFlags |= OMF_HAS_GUARDEDDEVIRT; + } + + void clearMethodHasGuardedDevirtualization() + { + optMethodFlags &= ~OMF_HAS_GUARDEDDEVIRT; + } + + void addGuardedDevirtualizationCandidate(GenTreeCall* call, + CORINFO_METHOD_HANDLE methodHandle, + CORINFO_CLASS_HANDLE classHandle, + unsigned methodAttr, + unsigned classAttr); + unsigned optMethodFlags; // Recursion bound controls how far we can go backwards tracking for a SSA value. diff --git a/src/jit/flowgraph.cpp b/src/jit/flowgraph.cpp index dd184539fde2..ad893171593f 100644 --- a/src/jit/flowgraph.cpp +++ b/src/jit/flowgraph.cpp @@ -5892,6 +5892,21 @@ void Compiler::fgFindBasicBlocks() { // This temp should already have the type of the return value. 
JITDUMP("\nInliner: re-using pre-existing spill temp V%02u\n", lvaInlineeReturnSpillTemp); + + if (info.compRetType == TYP_REF) + { + // We may have co-opted an existing temp for the return spill. + // We likely assumed it was single-def at the time, but now + // we can see it has multiple definitions. + if ((retBlocks > 1) && (lvaTable[lvaInlineeReturnSpillTemp].lvSingleDef == 1)) + { + // Make sure it is no longer marked single def. This is only safe + // to do if we haven't ever updated the type. + assert(!lvaTable[lvaInlineeReturnSpillTemp].lvClassInfoUpdated); + JITDUMP("Marked return spill temp V%02u as NOT single def temp\n", lvaInlineeReturnSpillTemp); + lvaTable[lvaInlineeReturnSpillTemp].lvSingleDef = 0; + } + } } else { @@ -21859,43 +21874,58 @@ void Compiler::fgInline() do { - /* Make the current basic block address available globally */ - + // Make the current basic block address available globally compCurBB = block; - GenTreeStmt* stmt; - GenTree* expr; - - for (stmt = block->firstStmt(); stmt != nullptr; stmt = stmt->gtNextStmt) + for (GenTreeStmt* stmt = block->firstStmt(); stmt != nullptr; stmt = stmt->gtNextStmt) { - expr = stmt->gtStmtExpr; - // See if we can expand the inline candidate - if ((expr->gtOper == GT_CALL) && ((expr->gtFlags & GTF_CALL_INLINE_CANDIDATE) != 0)) +#ifdef DEBUG + // In debug builds we want the inline tree to show all failed + // inlines. Some inlines may fail very early and never make it to + // candidate stage. So scan the tree looking for those early failures. + fgWalkTreePre(&stmt->gtStmtExpr, fgFindNonInlineCandidate, stmt); +#endif + + GenTree* expr = stmt->gtStmtExpr; + + // The importer ensures that all inline candidates are + // statement expressions. So see if we have a call. + if (expr->IsCall()) { GenTreeCall* call = expr->AsCall(); - InlineResult inlineResult(this, call, stmt, "fgInline"); - fgMorphStmt = stmt; + // We do. Is it an inline candidate? + // + // Note we also process GuardeDevirtualizationCandidates here as we've + // split off GT_RET_EXPRs for them even when they are not inline candidates + // as we need similar processing to ensure they get patched back to where + // they belong. + if (call->IsInlineCandidate() || call->IsGuardedDevirtualizationCandidate()) + { + InlineResult inlineResult(this, call, stmt, "fgInline"); + + fgMorphStmt = stmt; - fgMorphCallInline(call, &inlineResult); + fgMorphCallInline(call, &inlineResult); - if (stmt->gtStmtExpr->IsNothingNode()) - { - fgRemoveStmt(block, stmt); - continue; + // fgMorphCallInline may have updated the + // statement expression to a GT_NOP if the + // call returned a value, regardless of + // whether the inline succeeded or failed. + // + // If so, remove the GT_NOP and continue + // on with the next statement. + if (stmt->gtStmtExpr->IsNothingNode()) + { + fgRemoveStmt(block, stmt); + continue; + } } } - else - { -#ifdef DEBUG - // Look for non-candidates. - fgWalkTreePre(&stmt->gtStmtExpr, fgFindNonInlineCandidate, stmt); -#endif - } - // See if we need to replace the return value place holder. - // Also, see if this update enables further devirtualization. + // See if we need to replace some return value place holders. + // Also, see if this replacement enables further devirtualization. // // Note we have both preorder and postorder callbacks here. 
// @@ -22008,6 +22038,11 @@ Compiler::fgWalkResult Compiler::fgFindNonInlineCandidate(GenTree** pTree, fgWal void Compiler::fgNoteNonInlineCandidate(GenTreeStmt* stmt, GenTreeCall* call) { + if (call->IsInlineCandidate() || call->IsGuardedDevirtualizationCandidate()) + { + return; + } + InlineResult inlineResult(this, call, nullptr, "fgNotInlineCandidate"); InlineObservation currentObservation = InlineObservation::CALLSITE_NOT_CANDIDATE; @@ -22471,10 +22506,11 @@ Compiler::fgWalkResult Compiler::fgLateDevirtualization(GenTree** pTree, fgWalkD } #endif // DEBUG - CORINFO_METHOD_HANDLE method = call->gtCallMethHnd; - unsigned methodFlags = 0; - CORINFO_CONTEXT_HANDLE context = nullptr; - comp->impDevirtualizeCall(call, &method, &methodFlags, &context, nullptr); + CORINFO_METHOD_HANDLE method = call->gtCallMethHnd; + unsigned methodFlags = 0; + CORINFO_CONTEXT_HANDLE context = nullptr; + const bool isLateDevirtualization = true; + comp->impDevirtualizeCall(call, &method, &methodFlags, &context, nullptr, isLateDevirtualization); } } else if (tree->OperGet() == GT_ASG) @@ -25507,8 +25543,10 @@ bool Compiler::fgRetargetBranchesToCanonicalCallFinally(BasicBlock* block, return true; } -// FatCalliTransformer transforms calli that can use fat function pointer. -// Fat function pointer is pointer with the second least significant bit set, +// IndirectCallTransformer transforms indirect calls that involve fat pointers +// or guarded devirtualization. +// +// A fat function pointer is pointer with the second least significant bit set, // if the bit is set, the pointer (after clearing the bit) actually points to // a tuple where // instantiationArgument is a hidden first argument required by method pointer. @@ -25557,38 +25595,60 @@ bool Compiler::fgRetargetBranchesToCanonicalCallFinally(BasicBlock* block, // subsequent statements // } // -class FatCalliTransformer +class IndirectCallTransformer { public: - FatCalliTransformer(Compiler* compiler) : compiler(compiler) + IndirectCallTransformer(Compiler* compiler) : compiler(compiler) { } //------------------------------------------------------------------------ // Run: run transformation for each block. // - void Run() + // Returns: + // Count of calls transformed. + int Run() { + int count = 0; + for (BasicBlock* block = compiler->fgFirstBB; block != nullptr; block = block->bbNext) { - TransformBlock(block); + count += TransformBlock(block); } + + return count; } private: //------------------------------------------------------------------------ - // TransformBlock: look through statements and transform statements with fat pointer calls. + // TransformBlock: look through statements and transform statements with + // particular indirect calls // - void TransformBlock(BasicBlock* block) + // Returns: + // Count of calls transformed. 
+ // + int TransformBlock(BasicBlock* block) { + int count = 0; + for (GenTreeStmt* stmt = block->firstStmt(); stmt != nullptr; stmt = stmt->gtNextStmt) { if (ContainsFatCalli(stmt)) { - StatementTransformer stmtTransformer(compiler, block, stmt); - stmtTransformer.Run(); + FatPointerCallTransformer transformer(compiler, block, stmt); + transformer.Run(); + count++; + } + + if (ContainsGuardedDevirtualizationCandidate(stmt)) + { + GuardedDevirtualizationTransformer transformer(compiler, block, stmt); + transformer.Run(); + count++; } } + + return count; } //------------------------------------------------------------------------ @@ -25609,39 +25669,158 @@ class FatCalliTransformer return fatPointerCandidate->IsCall() && fatPointerCandidate->AsCall()->IsFatPointerCandidate(); } - class StatementTransformer + //------------------------------------------------------------------------ + // ContainsGuardedDevirtualizationCandidate: check does this statement contain a virtual + // call that we'd like to guardedly devirtualize? + // + // Return Value: + // true if contains, false otherwise. + // + // Notes: + // calls are hoisted to top level ... (we hope) + bool ContainsGuardedDevirtualizationCandidate(GenTreeStmt* stmt) + { + GenTree* candidate = stmt->gtStmtExpr; + return candidate->IsCall() && candidate->AsCall()->IsGuardedDevirtualizationCandidate(); + } + + class Transformer { public: - StatementTransformer(Compiler* compiler, BasicBlock* block, GenTreeStmt* stmt) + Transformer(Compiler* compiler, BasicBlock* block, GenTreeStmt* stmt) : compiler(compiler), currBlock(block), stmt(stmt) { - remainderBlock = nullptr; - checkBlock = nullptr; - thenBlock = nullptr; - elseBlock = nullptr; - doesReturnValue = stmt->gtStmtExpr->OperIs(GT_ASG); - origCall = GetCall(stmt); - fptrAddress = origCall->gtCallAddr; - pointerType = fptrAddress->TypeGet(); + remainderBlock = nullptr; + checkBlock = nullptr; + thenBlock = nullptr; + elseBlock = nullptr; + origCall = nullptr; } //------------------------------------------------------------------------ // Run: transform the statement as described above. // - void Run() + virtual void Run() + { + origCall = GetCall(stmt); + Transform(); + } + + void Transform() { - ClearFatFlag(); + JITDUMP("*** %s: transforming [%06u]\n", Name(), compiler->dspTreeID(stmt)); + FixupRetExpr(); + ClearFlag(); CreateRemainder(); CreateCheck(); CreateThen(); CreateElse(); - RemoveOldStatement(); SetWeights(); ChainFlow(); } - private: + protected: + virtual const char* Name() = 0; + virtual void ClearFlag() = 0; + virtual GenTreeCall* GetCall(GenTreeStmt* callStmt) = 0; + virtual void FixupRetExpr() = 0; + + //------------------------------------------------------------------------ + // CreateRemainder: split current block at the call stmt and + // insert statements after the call into remainderBlock. + // + void CreateRemainder() + { + remainderBlock = compiler->fgSplitBlockAfterStatement(currBlock, stmt); + unsigned propagateFlags = currBlock->bbFlags & BBF_GC_SAFE_POINT; + remainderBlock->bbFlags |= BBF_JMP_TARGET | BBF_HAS_LABEL | propagateFlags; + } + + virtual void CreateCheck() = 0; + + //------------------------------------------------------------------------ + // CreateAndInsertBasicBlock: ask compiler to create new basic block. + // and insert in into the basic block list. + // + // Arguments: + // jumpKind - jump kind for the new basic block + // insertAfter - basic block, after which compiler has to insert the new one. + // + // Return Value: + // new basic block. 
+ BasicBlock* CreateAndInsertBasicBlock(BBjumpKinds jumpKind, BasicBlock* insertAfter) + { + BasicBlock* block = compiler->fgNewBBafter(jumpKind, insertAfter, true); + if ((insertAfter->bbFlags & BBF_INTERNAL) == 0) + { + block->bbFlags &= ~BBF_INTERNAL; + block->bbFlags |= BBF_IMPORTED; + } + return block; + } + + virtual void CreateThen() = 0; + virtual void CreateElse() = 0; + + //------------------------------------------------------------------------ + // RemoveOldStatement: remove original stmt from current block. + // + void RemoveOldStatement() + { + compiler->fgRemoveStmt(currBlock, stmt); + } + + //------------------------------------------------------------------------ + // SetWeights: set weights for new blocks. + // + void SetWeights() + { + remainderBlock->inheritWeight(currBlock); + checkBlock->inheritWeight(currBlock); + thenBlock->inheritWeightPercentage(currBlock, HIGH_PROBABILITY); + elseBlock->inheritWeightPercentage(currBlock, 100 - HIGH_PROBABILITY); + } + + //------------------------------------------------------------------------ + // ChainFlow: link new blocks into correct cfg. + // + void ChainFlow() + { + assert(!compiler->fgComputePredsDone); + checkBlock->bbJumpDest = elseBlock; + thenBlock->bbJumpDest = remainderBlock; + } + + Compiler* compiler; + BasicBlock* currBlock; + BasicBlock* remainderBlock; + BasicBlock* checkBlock; + BasicBlock* thenBlock; + BasicBlock* elseBlock; + GenTreeStmt* stmt; + GenTreeCall* origCall; + + const int HIGH_PROBABILITY = 80; + }; + + class FatPointerCallTransformer final : public Transformer + { + public: + FatPointerCallTransformer(Compiler* compiler, BasicBlock* block, GenTreeStmt* stmt) + : Transformer(compiler, block, stmt) + { + doesReturnValue = stmt->gtStmtExpr->OperIs(GT_ASG); + fptrAddress = origCall->gtCallAddr; + pointerType = fptrAddress->TypeGet(); + } + + protected: + virtual const char* Name() + { + return "FatPointerCall"; + } + //------------------------------------------------------------------------ // GetCall: find a call in a statement. // @@ -25650,7 +25829,7 @@ class FatCalliTransformer // // Return Value: // call tree node pointer. - GenTreeCall* GetCall(GenTreeStmt* callStmt) + virtual GenTreeCall* GetCall(GenTreeStmt* callStmt) { GenTree* tree = callStmt->gtStmtExpr; GenTreeCall* call = nullptr; @@ -25667,28 +25846,22 @@ class FatCalliTransformer } //------------------------------------------------------------------------ - // ClearFatFlag: clear fat pointer candidate flag from the original call. + // ClearFlag: clear fat pointer candidate flag from the original call. // - void ClearFatFlag() + virtual void ClearFlag() { origCall->ClearFatPointerCandidate(); } - //------------------------------------------------------------------------ - // CreateRemainder: split current block at the fat call stmt and - // insert statements after the call into remainderBlock. - // - void CreateRemainder() + // FixupRetExpr: no action needed as we handle this in the importer. + virtual void FixupRetExpr() { - remainderBlock = compiler->fgSplitBlockAfterStatement(currBlock, stmt); - unsigned propagateFlags = currBlock->bbFlags & BBF_GC_SAFE_POINT; - remainderBlock->bbFlags |= BBF_JMP_TARGET | BBF_HAS_LABEL | propagateFlags; } //------------------------------------------------------------------------ // CreateCheck: create check block, that checks fat pointer bit set. 
// - void CreateCheck() + virtual void CreateCheck() { checkBlock = CreateAndInsertBasicBlock(BBJ_COND, currBlock); GenTree* fatPointerMask = new (compiler, GT_CNS_INT) GenTreeIntCon(TYP_I_IMPL, FAT_POINTER_MASK); @@ -25702,19 +25875,20 @@ class FatCalliTransformer } //------------------------------------------------------------------------ - // CreateCheck: create then block, that is executed if call address is not fat pointer. + // CreateThen: create then block, that is executed if the check succeeds. + // This simply executes the original call. // - void CreateThen() + virtual void CreateThen() { - thenBlock = CreateAndInsertBasicBlock(BBJ_ALWAYS, checkBlock); - GenTree* nonFatCallStmt = compiler->gtCloneExpr(stmt)->AsStmt(); - compiler->fgInsertStmtAtEnd(thenBlock, nonFatCallStmt); + thenBlock = CreateAndInsertBasicBlock(BBJ_ALWAYS, checkBlock); + GenTree* copyOfOriginalStmt = compiler->gtCloneExpr(stmt)->AsStmt(); + compiler->fgInsertStmtAtEnd(thenBlock, copyOfOriginalStmt); } //------------------------------------------------------------------------ - // CreateCheck: create else block, that is executed if call address is fat pointer. + // CreateElse: create else block, that is executed if call address is fat pointer. // - void CreateElse() + virtual void CreateElse() { elseBlock = CreateAndInsertBasicBlock(BBJ_NONE, thenBlock); @@ -25726,27 +25900,6 @@ class FatCalliTransformer compiler->fgInsertStmtAtEnd(elseBlock, fatStmt); } - //------------------------------------------------------------------------ - // CreateAndInsertBasicBlock: ask compiler to create new basic block. - // and insert in into the basic block list. - // - // Arguments: - // jumpKind - jump kind for the new basic block - // insertAfter - basic block, after which compiler has to insert the new one. - // - // Return Value: - // new basic block. - BasicBlock* CreateAndInsertBasicBlock(BBjumpKinds jumpKind, BasicBlock* insertAfter) - { - BasicBlock* block = compiler->fgNewBBafter(jumpKind, insertAfter, true); - if ((insertAfter->bbFlags & BBF_INTERNAL) == 0) - { - block->bbFlags &= ~BBF_INTERNAL; - block->bbFlags |= BBF_IMPORTED; - } - return block; - } - //------------------------------------------------------------------------ // GetFixedFptrAddress: clear fat pointer bit from fat pointer address. // @@ -25843,49 +25996,306 @@ class FatCalliTransformer iterator->Rest() = compiler->gtNewArgList(hiddenArgument); } + private: + const int FAT_POINTER_MASK = 0x2; + + GenTree* fptrAddress; + var_types pointerType; + bool doesReturnValue; + }; + + class GuardedDevirtualizationTransformer final : public Transformer + { + public: + GuardedDevirtualizationTransformer(Compiler* compiler, BasicBlock* block, GenTreeStmt* stmt) + : Transformer(compiler, block, stmt), returnTemp(BAD_VAR_NUM) + { + } + //------------------------------------------------------------------------ - // RemoveOldStatement: remove original stmt from current block. + // Run: transform the statement as described above. // - void RemoveOldStatement() + virtual void Run() { - compiler->fgRemoveStmt(currBlock, stmt); + origCall = GetCall(stmt); + + JITDUMP("*** %s contemplating [%06u]\n", Name(), compiler->dspTreeID(origCall)); + + // We currently need inline candidate info to guarded devirt. 
+ if (!origCall->IsInlineCandidate()) + { + JITDUMP("*** %s Bailing on [%06u] -- not an inline candidate\n", Name(), compiler->dspTreeID(origCall)); + ClearFlag(); + return; + } + + // For now, bail on transforming calls that still appear + // to return structs by value as there is deferred work + // needed to fix up the return type. + // + // See for instance fgUpdateInlineReturnExpressionPlaceHolder. + if (origCall->TypeGet() == TYP_STRUCT) + { + JITDUMP("*** %s Bailing on [%06u] -- can't handle by-value struct returns yet\n", Name(), + compiler->dspTreeID(origCall)); + ClearFlag(); + + // For stub calls restore the stub address + if (origCall->IsVirtualStub()) + { + origCall->gtStubCallStubAddr = origCall->gtInlineCandidateInfo->stubAddr; + } + return; + } + + Transform(); + } + + protected: + virtual const char* Name() + { + return "GuardedDevirtualization"; } //------------------------------------------------------------------------ - // SetWeights: set weights for new blocks. + // GetCall: find a call in a statement. // - void SetWeights() + // Arguments: + // callStmt - the statement with the call inside. + // + // Return Value: + // call tree node pointer. + virtual GenTreeCall* GetCall(GenTreeStmt* callStmt) { - remainderBlock->inheritWeight(currBlock); - checkBlock->inheritWeight(currBlock); - thenBlock->inheritWeightPercentage(currBlock, HIGH_PROBABILITY); - elseBlock->inheritWeightPercentage(currBlock, 100 - HIGH_PROBABILITY); + GenTree* tree = callStmt->gtStmtExpr; + assert(tree->IsCall()); + GenTreeCall* call = tree->AsCall(); + return call; } //------------------------------------------------------------------------ - // ChainFlow: link new blocks into correct cfg. + // ClearFlag: clear guarded devirtualization candidate flag from the original call. // - void ChainFlow() + virtual void ClearFlag() { - assert(!compiler->fgComputePredsDone); - checkBlock->bbJumpDest = elseBlock; - thenBlock->bbJumpDest = remainderBlock; + origCall->ClearGuardedDevirtualizationCandidate(); } - Compiler* compiler; - BasicBlock* currBlock; - BasicBlock* remainderBlock; - BasicBlock* checkBlock; - BasicBlock* thenBlock; - BasicBlock* elseBlock; - GenTreeStmt* stmt; - GenTreeCall* origCall; - GenTree* fptrAddress; - var_types pointerType; - bool doesReturnValue; + //------------------------------------------------------------------------ + // CreateCheck: create check block and check method table + // + virtual void CreateCheck() + { + checkBlock = CreateAndInsertBasicBlock(BBJ_COND, currBlock); - const int FAT_POINTER_MASK = 0x2; - const int HIGH_PROBABILITY = 80; + // Fetch method table from object arg to call. + GenTree* thisTree = compiler->gtCloneExpr(origCall->gtCallObjp); + + // Create temp for this if the tree is costly. + if (!thisTree->IsLocal()) + { + const unsigned thisTempNum = compiler->lvaGrabTemp(true DEBUGARG("guarded devirt this temp")); + // lvaSetClass(thisTempNum, ...); + GenTree* asgTree = compiler->gtNewTempAssign(thisTempNum, thisTree); + GenTree* asgStmt = compiler->fgNewStmtFromTree(asgTree, stmt->gtStmt.gtStmtILoffsx); + compiler->fgInsertStmtAtEnd(checkBlock, asgStmt); + + thisTree = compiler->gtNewLclvNode(thisTempNum, TYP_REF); + + // Propagate the new this to the call. Must be a new expr as the call + // will live on in the else block and thisTree is used below. 
+ origCall->gtCallObjp = compiler->gtNewLclvNode(thisTempNum, TYP_REF); + } + + GenTree* methodTable = compiler->gtNewIndir(TYP_I_IMPL, thisTree); + methodTable->gtFlags |= GTF_IND_INVARIANT; + + // Find target method table + GuardedDevirtualizationCandidateInfo* guardedInfo = origCall->gtGuardedDevirtualizationCandidateInfo; + CORINFO_CLASS_HANDLE clsHnd = guardedInfo->guardedClassHandle; + GenTree* targetMethodTable = compiler->gtNewIconEmbClsHndNode(clsHnd); + + // Compare and jump to else (which does the indirect call) if NOT equal + GenTree* methodTableCompare = compiler->gtNewOperNode(GT_NE, TYP_INT, targetMethodTable, methodTable); + GenTree* jmpTree = compiler->gtNewOperNode(GT_JTRUE, TYP_VOID, methodTableCompare); + GenTree* jmpStmt = compiler->fgNewStmtFromTree(jmpTree, stmt->gtStmt.gtStmtILoffsx); + compiler->fgInsertStmtAtEnd(checkBlock, jmpStmt); + } + + //------------------------------------------------------------------------ + // FixupRetExpr: set up to repair return value placeholder from call + // + virtual void FixupRetExpr() + { + // If call returns a value, we need to copy it to a temp, and + // update the associated GT_RET_EXPR to refer to the temp instead + // of the call. + // + // Note implicit by-ref returns should have already been converted + // so any struct copy we induce here should be cheap. + // + // Todo: make sure we understand how this interacts with return type + // munging for small structs. + InlineCandidateInfo* inlineInfo = origCall->gtInlineCandidateInfo; + GenTree* retExpr = inlineInfo->retExpr; + + // Sanity check the ret expr if non-null: it should refer to the original call. + if (retExpr != nullptr) + { + assert(retExpr->gtRetExpr.gtInlineCandidate == origCall); + } + + if (origCall->TypeGet() != TYP_VOID) + { + returnTemp = compiler->lvaGrabTemp(false DEBUGARG("guarded devirt return temp")); + JITDUMP("Reworking call(s) to return value via a new temp V%02u\n", returnTemp); + + if (varTypeIsStruct(origCall)) + { + compiler->lvaSetStruct(returnTemp, origCall->gtRetClsHnd, false); + } + + GenTree* tempTree = compiler->gtNewLclvNode(returnTemp, origCall->TypeGet()); + + JITDUMP("Updating GT_RET_EXPR [%06u] to refer to temp V%02u\n", compiler->dspTreeID(retExpr), + returnTemp); + retExpr->gtRetExpr.gtInlineCandidate = tempTree; + } + else if (retExpr != nullptr) + { + // We still oddly produce GT_RET_EXPRs for some void + // returning calls. Just patch the ret expr to a NOP. + // + // Todo: consider bagging creation of these RET_EXPRs. The only possible + // benefit they provide is stitching back larger trees for failed inlines + // of void-returning methods. But then the calls likely sit in commas and + // the benefit of a larger tree is unclear. + JITDUMP("Updating GT_RET_EXPR [%06u] for VOID return to refer to a NOP\n", + compiler->dspTreeID(retExpr)); + GenTree* nopTree = compiler->gtNewNothingNode(); + retExpr->gtRetExpr.gtInlineCandidate = nopTree; + } + else + { + // We do not produce GT_RET_EXPRs for CTOR calls, so there is nothing to patch. + } + } + + //------------------------------------------------------------------------ + // CreateThen: create else block with direct call to method + // + virtual void CreateThen() + { + thenBlock = CreateAndInsertBasicBlock(BBJ_ALWAYS, checkBlock); + + InlineCandidateInfo* inlineInfo = origCall->gtInlineCandidateInfo; + CORINFO_CLASS_HANDLE clsHnd = inlineInfo->clsHandle; + + // copy 'this' to temp with exact type. 
+ const unsigned thisTemp = compiler->lvaGrabTemp(false DEBUGARG("guarded devirt this exact temp"));
+ GenTree* clonedObj = compiler->gtCloneExpr(origCall->gtCallObjp);
+ GenTree* assign = compiler->gtNewTempAssign(thisTemp, clonedObj);
+ compiler->lvaSetClass(thisTemp, clsHnd, true);
+ GenTreeStmt* assignStmt = compiler->gtNewStmt(assign);
+ compiler->fgInsertStmtAtEnd(thenBlock, assignStmt);
+
+ // Clone call
+ GenTreeCall* call = compiler->gtCloneExpr(origCall)->AsCall();
+ call->gtCallObjp = compiler->gtNewLclvNode(thisTemp, TYP_REF);
+ call->SetIsGuarded();
+
+ JITDUMP("Direct call [%06u] in block BB%02u\n", compiler->dspTreeID(call), thenBlock->bbNum);
+
+ // Then invoke impDevirtualizeCall to actually
+ // transform the call for us. It should succeed, as we have
+ // now provided an exactly typed 'this'.
+ CORINFO_METHOD_HANDLE methodHnd = inlineInfo->methInfo.ftn;
+ unsigned methodFlags = inlineInfo->methAttr;
+ CORINFO_CONTEXT_HANDLE context = inlineInfo->exactContextHnd;
+ const bool isLateDevirtualization = true;
+ compiler->impDevirtualizeCall(call, &methodHnd, &methodFlags, &context, nullptr, isLateDevirtualization);
+
+ // Devirtualization should succeed here (note the assert below). If it can
+ // fail, we should avoid making the call a guarded devirt candidate in the
+ // first place rather than end up here.
+ assert(!call->IsVirtual());
+
+ // Re-establish this call as an inline candidate.
+ GenTree* oldRetExpr = inlineInfo->retExpr;
+ inlineInfo->clsHandle = clsHnd;
+ inlineInfo->exactContextHnd = context;
+ call->gtInlineCandidateInfo = inlineInfo;
+
+ // Add the call
+ GenTreeStmt* callStmt = compiler->gtNewStmt(call);
+ compiler->fgInsertStmtAtEnd(thenBlock, callStmt);
+
+ // If there was a ret expr for this call, we need to create a new one
+ // and append it just after the call.
+ //
+ // Note the original GT_RET_EXPR is sitting at the join point of the
+ // guarded expansion and, for non-void calls, now refers to a temp local;
+ // we set all this up in FixupRetExpr().
+ if (oldRetExpr != nullptr)
+ {
+ GenTree* retExpr = compiler->gtNewInlineCandidateReturnExpr(call, call->TypeGet());
+ inlineInfo->retExpr = retExpr;
+
+ if (returnTemp != BAD_VAR_NUM)
+ {
+ retExpr = compiler->gtNewTempAssign(returnTemp, retExpr);
+ }
+ else
+ {
+ // We should always have a return temp if we return results by value
+ assert(origCall->TypeGet() == TYP_VOID);
+ }
+
+ GenTreeStmt* resultStmt = compiler->gtNewStmt(retExpr);
+ compiler->fgInsertStmtAtEnd(thenBlock, resultStmt);
+ }
+ }
+
+ //------------------------------------------------------------------------
+ // CreateElse: create else block. This executes the unaltered indirect call.
+ //
+ virtual void CreateElse()
+ {
+ elseBlock = CreateAndInsertBasicBlock(BBJ_NONE, thenBlock);
+ GenTreeCall* call = origCall;
+ GenTreeStmt* newStmt = compiler->gtNewStmt(call);
+
+ call->gtFlags &= ~GTF_CALL_INLINE_CANDIDATE;
+ call->SetIsGuarded();
+
+ JITDUMP("Residual call [%06u] moved to block BB%02u\n", compiler->dspTreeID(call), elseBlock->bbNum);
+
+ if (returnTemp != BAD_VAR_NUM)
+ {
+ GenTree* assign = compiler->gtNewTempAssign(returnTemp, call);
+ newStmt->gtStmtExpr = assign;
+ }
+
+ // For stub calls, restore the stub address. For everything else,
+ // null out the candidate info field.
+ if (call->IsVirtualStub()) + { + JITDUMP("Restoring stub addr %p from candidate info\n", call->gtInlineCandidateInfo->stubAddr); + call->gtStubCallStubAddr = call->gtInlineCandidateInfo->stubAddr; + } + else + { + call->gtInlineCandidateInfo = nullptr; + } + + compiler->fgInsertStmtAtEnd(elseBlock, newStmt); + + // Set the original statement to a nop. + stmt->gtStmtExpr = compiler->gtNewNothingNode(); + } + + private: + unsigned returnTemp; }; Compiler* compiler; @@ -25894,46 +26304,74 @@ class FatCalliTransformer #ifdef DEBUG //------------------------------------------------------------------------ -// fgDebugCheckFatPointerCandidates: callback to make sure there are no more GTF_CALL_M_FAT_POINTER_CHECK calls. +// fgDebugCheckForTransformableIndirectCalls: callback to make sure there +// are no more GTF_CALL_M_FAT_POINTER_CHECK or GTF_CALL_M_GUARDED_DEVIRT +// calls remaining // -Compiler::fgWalkResult Compiler::fgDebugCheckFatPointerCandidates(GenTree** pTree, fgWalkData* data) +Compiler::fgWalkResult Compiler::fgDebugCheckForTransformableIndirectCalls(GenTree** pTree, fgWalkData* data) { GenTree* tree = *pTree; if (tree->IsCall()) { assert(!tree->AsCall()->IsFatPointerCandidate()); + assert(!tree->AsCall()->IsGuardedDevirtualizationCandidate()); } return WALK_CONTINUE; } //------------------------------------------------------------------------ -// CheckNoFatPointerCandidatesLeft: walk through blocks and check that there are no fat pointer candidates left. +// CheckNoTransformableIndirectCallsRemain: walk through blocks and check +// that there are no indirect call candidates left to transform. // -void Compiler::CheckNoFatPointerCandidatesLeft() +void Compiler::CheckNoTransformableIndirectCallsRemain() { assert(!doesMethodHaveFatPointer()); + assert(!doesMethodHaveGuardedDevirtualization()); + for (BasicBlock* block = fgFirstBB; block != nullptr; block = block->bbNext) { for (GenTreeStmt* stmt = fgFirstBB->firstStmt(); stmt != nullptr; stmt = stmt->gtNextStmt) { - fgWalkTreePre(&stmt->gtStmtExpr, fgDebugCheckFatPointerCandidates); + fgWalkTreePre(&stmt->gtStmtExpr, fgDebugCheckForTransformableIndirectCalls); } } } #endif //------------------------------------------------------------------------ -// fgTransformFatCalli: find and transform fat calls. +// fgTransformIndirectCalls: find and transform various indirect calls // -void Compiler::fgTransformFatCalli() +// These transformations happen post-import because they may introduce +// control flow. +// +void Compiler::fgTransformIndirectCalls() { - assert(IsTargetAbi(CORINFO_CORERT_ABI)); - FatCalliTransformer fatCalliTransformer(this); - fatCalliTransformer.Run(); - clearMethodHasFatPointer(); -#ifdef DEBUG - CheckNoFatPointerCandidatesLeft(); -#endif + JITDUMP("\n*************** in fgTransformIndirectCalls(%s)\n", compIsForInlining() ? 
"inlinee" : "root"); + + if (doesMethodHaveFatPointer() || doesMethodHaveGuardedDevirtualization()) + { + IndirectCallTransformer indirectCallTransformer(this); + int count = indirectCallTransformer.Run(); + + if (count > 0) + { + JITDUMP("\n*************** After fgTransformIndirectCalls() [%d calls transformed]\n", count); + INDEBUG(if (verbose) { fgDispBasicBlocks(true); }); + } + else + { + JITDUMP(" -- no transforms done (?)\n"); + } + + clearMethodHasFatPointer(); + clearMethodHasGuardedDevirtualization(); + } + else + { + JITDUMP(" -- no candidates to transform\n"); + } + + INDEBUG(CheckNoTransformableIndirectCallsRemain();); } //------------------------------------------------------------------------ diff --git a/src/jit/gentree.cpp b/src/jit/gentree.cpp index 56eea9d50935..b86bec75f6b3 100644 --- a/src/jit/gentree.cpp +++ b/src/jit/gentree.cpp @@ -7343,8 +7343,9 @@ GenTree* Compiler::gtCloneExpr( copy->gtCall.setEntryPoint(tree->gtCall.gtEntryPoint); #endif -#ifdef DEBUG +#if defined(DEBUG) || defined(INLINE_DATA) copy->gtCall.gtInlineObservation = tree->gtCall.gtInlineObservation; + copy->gtCall.gtRawILOffset = tree->gtCall.gtRawILOffset; #endif copy->AsCall()->CopyOtherRegFlags(tree->AsCall()); @@ -9334,9 +9335,22 @@ void Compiler::gtDispNode(GenTree* tree, IndentStack* indentStack, __in __in_z _ goto DASH; case GT_CALL: - if (tree->gtFlags & GTF_CALL_INLINE_CANDIDATE) + if (tree->gtCall.IsInlineCandidate()) { - printf("I"); + if (tree->gtCall.IsGuardedDevirtualizationCandidate()) + { + printf("&"); + } + else + { + printf("I"); + } + --msgLength; + break; + } + else if (tree->gtCall.IsGuardedDevirtualizationCandidate()) + { + printf("G"); --msgLength; break; } diff --git a/src/jit/gentree.h b/src/jit/gentree.h index c6c5841e982d..360fa1b43ca5 100644 --- a/src/jit/gentree.h +++ b/src/jit/gentree.h @@ -144,8 +144,8 @@ enum gtCallTypes : BYTE /*****************************************************************************/ struct BasicBlock; - struct InlineCandidateInfo; +struct GuardedDevirtualizationCandidateInfo; typedef unsigned short AssertionIndex; @@ -3547,6 +3547,8 @@ struct GenTreeCall final : public GenTree // the comma result is unused. 
#define GTF_CALL_M_DEVIRTUALIZED 0x00040000 // GT_CALL -- this call was devirtualized #define GTF_CALL_M_UNBOXED 0x00080000 // GT_CALL -- this call was optimized to use the unboxed entry point +#define GTF_CALL_M_GUARDED_DEVIRT 0x00100000 // GT_CALL -- this call is a candidate for guarded devirtualization +#define GTF_CALL_M_GUARDED 0x00200000 // GT_CALL -- this call was transformed by guarded devirtualization // clang-format on @@ -3739,6 +3741,11 @@ struct GenTreeCall final : public GenTree return (gtCallMoreFlags & GTF_CALL_M_FAT_POINTER_CHECK) != 0; } + bool IsGuardedDevirtualizationCandidate() const + { + return (gtCallMoreFlags & GTF_CALL_M_GUARDED_DEVIRT) != 0; + } + bool IsPure(Compiler* compiler) const; bool HasSideEffects(Compiler* compiler, bool ignoreExceptions = false, bool ignoreCctors = false) const; @@ -3758,11 +3765,31 @@ struct GenTreeCall final : public GenTree return (gtCallMoreFlags & GTF_CALL_M_DEVIRTUALIZED) != 0; } + bool IsGuarded() const + { + return (gtCallMoreFlags & GTF_CALL_M_GUARDED) != 0; + } + bool IsUnboxed() const { return (gtCallMoreFlags & GTF_CALL_M_UNBOXED) != 0; } + void ClearGuardedDevirtualizationCandidate() + { + gtCallMoreFlags &= ~GTF_CALL_M_GUARDED_DEVIRT; + } + + void SetGuardedDevirtualizationCandidate() + { + gtCallMoreFlags |= GTF_CALL_M_GUARDED_DEVIRT; + } + + void SetIsGuarded() + { + gtCallMoreFlags |= GTF_CALL_M_GUARDED; + } + unsigned gtCallMoreFlags; // in addition to gtFlags unsigned char gtCallType : 3; // value from the gtCallTypes enumeration @@ -3774,8 +3801,9 @@ struct GenTreeCall final : public GenTree // only used for CALLI unmanaged calls (CT_INDIRECT) GenTree* gtCallCookie; // gtInlineCandidateInfo is only used when inlining methods - InlineCandidateInfo* gtInlineCandidateInfo; - void* gtStubCallStubAddr; // GTF_CALL_VIRT_STUB - these are never inlined + InlineCandidateInfo* gtInlineCandidateInfo; + GuardedDevirtualizationCandidateInfo* gtGuardedDevirtualizationCandidateInfo; + void* gtStubCallStubAddr; // GTF_CALL_VIRT_STUB - these are never inlined CORINFO_GENERIC_HANDLE compileTimeHelperArgumentHandle; // Used to track type handle argument of dynamic helpers void* gtDirectCallAddress; // Used to pass direct call address between lower and codegen }; diff --git a/src/jit/importer.cpp b/src/jit/importer.cpp index e8ccdc28c921..2b38ba2d7e32 100644 --- a/src/jit/importer.cpp +++ b/src/jit/importer.cpp @@ -7738,8 +7738,7 @@ var_types Compiler::impImportCall(OPCODE opcode, } else { - // ok, the stub is available at compile type. - + // The stub address is known at compile time call = gtNewCallNode(CT_USER_FUNC, callInfo->hMethod, callRetTyp, nullptr, ilOffset); call->gtCall.gtStubCallStubAddr = callInfo->stubLookup.constLookup.addr; call->gtFlags |= GTF_CALL_VIRT_STUB; @@ -8386,8 +8385,9 @@ var_types Compiler::impImportCall(OPCODE opcode, assert(obj->gtType == TYP_REF); // See if we can devirtualize. 
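+ // This is the import-time (early) attempt; the guarded devirtualization
+ // expansion later re-runs impDevirtualizeCall on the cloned call with
+ // isLateDevirtualization set to true.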
+ const bool isLateDevirtualization = false; impDevirtualizeCall(call->AsCall(), &callInfo->hMethod, &callInfo->methodFlags, &callInfo->contextHandle, - &exactContextHnd); + &exactContextHnd, isLateDevirtualization); } if (impIsThis(obj)) @@ -8715,26 +8715,43 @@ var_types Compiler::impImportCall(OPCODE opcode, { // Sometimes "call" is not a GT_CALL (if we imported an intrinsic that didn't turn into a call) - bool fatPointerCandidate = call->AsCall()->IsFatPointerCandidate(); + GenTreeCall* origCall = call->AsCall(); + + const bool isFatPointerCandidate = origCall->IsFatPointerCandidate(); + const bool isInlineCandidate = origCall->IsInlineCandidate(); + const bool isGuardedDevirtualizationCandidate = origCall->IsGuardedDevirtualizationCandidate(); + if (varTypeIsStruct(callRetTyp)) { + // Need to treat all "split tree" cases here, not just inline candidates call = impFixupCallStructReturn(call->AsCall(), sig->retTypeClass); } - if ((call->gtFlags & GTF_CALL_INLINE_CANDIDATE) != 0) + // TODO: consider handling fatcalli cases this way too...? + if (isInlineCandidate || isGuardedDevirtualizationCandidate) { + // We should not have made any adjustments in impFixupCallStructReturn + // as we defer those until we know the fate of the call. + assert(call == origCall); + assert(opts.OptEnabled(CLFLG_INLINING)); - assert(!fatPointerCandidate); // We should not try to inline calli. + assert(!isFatPointerCandidate); // We should not try to inline calli. // Make the call its own tree (spill the stack if needed). impAppendTree(call, (unsigned)CHECK_SPILL_ALL, impCurStmtOffs); // TODO: Still using the widened type. - call = gtNewInlineCandidateReturnExpr(call, genActualType(callRetTyp)); + GenTree* retExpr = gtNewInlineCandidateReturnExpr(call, genActualType(callRetTyp)); + + // Link the retExpr to the call so if necessary we can manipulate it later. + origCall->gtInlineCandidateInfo->retExpr = retExpr; + + // Propagate retExpr as the placeholder for the call. + call = retExpr; } else { - if (fatPointerCandidate) + if (isFatPointerCandidate) { // fatPointer candidates should be in statements of the form call() or var = call(). // Such form allows to find statements with fat calls without walking through whole trees @@ -18376,7 +18393,7 @@ void Compiler::impCanInlineIL(CORINFO_METHOD_HANDLE fncHandle, /***************************************************************************** */ -void Compiler::impCheckCanInline(GenTree* call, +void Compiler::impCheckCanInline(GenTreeCall* call, CORINFO_METHOD_HANDLE fncHandle, unsigned methAttr, CORINFO_CONTEXT_HANDLE exactContextHnd, @@ -18389,7 +18406,7 @@ void Compiler::impCheckCanInline(GenTree* call, struct Param { Compiler* pThis; - GenTree* call; + GenTreeCall* call; CORINFO_METHOD_HANDLE fncHandle; unsigned methAttr; CORINFO_CONTEXT_HANDLE exactContextHnd; @@ -18521,22 +18538,42 @@ void Compiler::impCheckCanInline(GenTree* call, (varTypeIsStruct(fncRetType) && (fncRealRetType == TYP_STRUCT))); #endif + // Allocate an InlineCandidateInfo structure, // - // Allocate an InlineCandidateInfo structure + // Or, reuse the existing GuardedDevirtualizationCandidateInfo, + // which was pre-allocated to have extra room. 
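+ // Note InlineCandidateInfo derives from GuardedDevirtualizationCandidateInfo
+ // (see inline.h), so the same allocation can serve both purposes.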
// InlineCandidateInfo* pInfo; - pInfo = new (pParam->pThis, CMK_Inlining) InlineCandidateInfo; - - pInfo->dwRestrictions = dwRestrictions; - pInfo->methInfo = methInfo; - pInfo->methAttr = pParam->methAttr; - pInfo->clsHandle = clsHandle; - pInfo->clsAttr = clsAttr; - pInfo->fncRetType = fncRetType; - pInfo->exactContextHnd = pParam->exactContextHnd; - pInfo->ilCallerHandle = pParam->pThis->info.compMethodHnd; - pInfo->initClassResult = initClassResult; - pInfo->preexistingSpillTemp = BAD_VAR_NUM; + + if (pParam->call->IsGuardedDevirtualizationCandidate()) + { + pInfo = pParam->call->gtInlineCandidateInfo; + } + else + { + pInfo = new (pParam->pThis, CMK_Inlining) InlineCandidateInfo; + + // Null out bits we don't use when we're just inlining + pInfo->guardedClassHandle = nullptr; + pInfo->guardedMethodHandle = nullptr; + pInfo->stubAddr = nullptr; + } + + pInfo->methInfo = methInfo; + pInfo->ilCallerHandle = pParam->pThis->info.compMethodHnd; + pInfo->clsHandle = clsHandle; + pInfo->exactContextHnd = pParam->exactContextHnd; + pInfo->retExpr = nullptr; + pInfo->dwRestrictions = dwRestrictions; + pInfo->preexistingSpillTemp = BAD_VAR_NUM; + pInfo->clsAttr = clsAttr; + pInfo->methAttr = pParam->methAttr; + pInfo->initClassResult = initClassResult; + pInfo->fncRetType = fncRetType; + pInfo->exactContextNeedsRuntimeLookup = false; + + // Note exactContextNeedsRuntimeLookup is reset later on, + // over in impMarkInlineCandidate. *(pParam->ppInlineCandidateInfo) = pInfo; @@ -19486,6 +19523,60 @@ BOOL Compiler::impInlineIsGuaranteedThisDerefBeforeAnySideEffects(GenTree* ad // callInfo -- call info from VM // // Notes: +// Mostly a wrapper for impMarkInlineCandidateHelper that also undoes +// guarded devirtualization for virtual calls where the method we'd +// devirtualize to cannot be inlined. + +void Compiler::impMarkInlineCandidate(GenTree* callNode, + CORINFO_CONTEXT_HANDLE exactContextHnd, + bool exactContextNeedsRuntimeLookup, + CORINFO_CALL_INFO* callInfo) +{ + GenTreeCall* call = callNode->AsCall(); + + // Do the actual evaluation + impMarkInlineCandidateHelper(call, exactContextHnd, exactContextNeedsRuntimeLookup, callInfo); + + // If this call is an inline candidate or is not a guarded devirtualization + // candidate, we're done. + if (call->IsInlineCandidate() || !call->IsGuardedDevirtualizationCandidate()) + { + return; + } + + // If we can't inline the call we'd guardedly devirtualize to, + // we undo the guarded devirtualization, as the benefit from + // just guarded devirtualization alone is likely not worth the + // extra jit time and code size. + // + // TODO: it is possibly interesting to allow this, but requires + // fixes elsewhere too... + JITDUMP("Revoking guarded devirtualization candidacy for call [%06u]: target method can't be inlined\n", + dspTreeID(call)); + + call->ClearGuardedDevirtualizationCandidate(); + + // If we have a stub address, restore it back into the union that it shares + // with the candidate info. 
+ if (call->IsVirtualStub()) + { + JITDUMP("Restoring stub addr %p from guarded devirt candidate info\n", + call->gtGuardedDevirtualizationCandidateInfo->stubAddr); + call->gtStubCallStubAddr = call->gtGuardedDevirtualizationCandidateInfo->stubAddr; + } +} + +//------------------------------------------------------------------------ +// impMarkInlineCandidateHelper: determine if this call can be subsequently +// inlined +// +// Arguments: +// callNode -- call under scrutiny +// exactContextHnd -- context handle for inlining +// exactContextNeedsRuntimeLookup -- true if context required runtime lookup +// callInfo -- call info from VM +// +// Notes: // If callNode is an inline candidate, this method sets the flag // GTF_CALL_INLINE_CANDIDATE, and ensures that helper methods have // filled in the associated InlineCandidateInfo. @@ -19495,10 +19586,10 @@ BOOL Compiler::impInlineIsGuaranteedThisDerefBeforeAnySideEffects(GenTree* ad // method may be marked as "noinline" to short-circuit any // future assessments of calls to this method. -void Compiler::impMarkInlineCandidate(GenTree* callNode, - CORINFO_CONTEXT_HANDLE exactContextHnd, - bool exactContextNeedsRuntimeLookup, - CORINFO_CALL_INFO* callInfo) +void Compiler::impMarkInlineCandidateHelper(GenTreeCall* call, + CORINFO_CONTEXT_HANDLE exactContextHnd, + bool exactContextNeedsRuntimeLookup, + CORINFO_CALL_INFO* callInfo) { // Let the strategy know there's another call impInlineRoot()->m_inlineStrategy->NoteCall(); @@ -19523,7 +19614,6 @@ void Compiler::impMarkInlineCandidate(GenTree* callNode, return; } - GenTreeCall* call = callNode->AsCall(); InlineResult inlineResult(this, call, nullptr, "impMarkInlineCandidate"); // Don't inline if not optimizing root method @@ -19560,14 +19650,20 @@ void Compiler::impMarkInlineCandidate(GenTree* callNode, if (call->IsVirtual()) { - inlineResult.NoteFatal(InlineObservation::CALLSITE_IS_NOT_DIRECT); - return; + // Allow guarded devirt calls to be treated as inline candidates, + // but reject all other virtual calls. + if (!call->IsGuardedDevirtualizationCandidate()) + { + inlineResult.NoteFatal(InlineObservation::CALLSITE_IS_NOT_DIRECT); + return; + } } /* Ignore helper calls */ if (call->gtCallType == CT_HELPER) { + assert(!call->IsGuardedDevirtualizationCandidate()); inlineResult.NoteFatal(InlineObservation::CALLSITE_IS_CALL_TO_HELPER); return; } @@ -19583,17 +19679,27 @@ void Compiler::impMarkInlineCandidate(GenTree* callNode, * restricts the inliner to non-expanding inlines. I removed the check to allow for non-expanding * inlining in throw blocks. I should consider the same thing for catch and filter regions. 
 */
- CORINFO_METHOD_HANDLE fncHandle = call->gtCallMethHnd;
+ CORINFO_METHOD_HANDLE fncHandle;
 unsigned methAttr;

- // Reuse method flags from the original callInfo if possible
- if (fncHandle == callInfo->hMethod)
+ if (call->IsGuardedDevirtualizationCandidate())
 {
- methAttr = callInfo->methodFlags;
+ fncHandle = call->gtGuardedDevirtualizationCandidateInfo->guardedMethodHandle;
+ methAttr = info.compCompHnd->getMethodAttribs(fncHandle);
 }
 else
 {
- methAttr = info.compCompHnd->getMethodAttribs(fncHandle);
+ fncHandle = call->gtCallMethHnd;
+
+ // Reuse method flags from the original callInfo if possible
+ if (fncHandle == callInfo->hMethod)
+ {
+ methAttr = callInfo->methodFlags;
+ }
+ else
+ {
+ methAttr = info.compCompHnd->getMethodAttribs(fncHandle);
+ }
 }

 #ifdef DEBUG
@@ -19693,14 +19799,13 @@ void Compiler::impMarkInlineCandidate(GenTree* callNode,
 return;
 }

- // The old value should be NULL
- assert(call->gtInlineCandidateInfo == nullptr);
+ // The old value should be null OR this call should be a guarded devirtualization candidate.
+ assert((call->gtInlineCandidateInfo == nullptr) || call->IsGuardedDevirtualizationCandidate());

- // The new value should not be NULL.
+ // The new value should not be null.
 assert(inlineCandidateInfo != nullptr);

 inlineCandidateInfo->exactContextNeedsRuntimeLookup = exactContextNeedsRuntimeLookup;
-
- call->gtInlineCandidateInfo = inlineCandidateInfo;
+ call->gtInlineCandidateInfo = inlineCandidateInfo;

 // Mark the call node as inline candidate.
 call->gtFlags |= GTF_CALL_INLINE_CANDIDATE;
@@ -19835,6 +19940,7 @@ bool Compiler::IsMathIntrinsic(GenTree* tree)
 // methodFlags -- [IN/OUT] flags for the method to call. Updated iff call devirtualized.
 // contextHandle -- [IN/OUT] context handle for the call. Updated iff call devirtualized.
 // exactContextHnd -- [OUT] updated context handle iff call devirtualized
+// isLateDevirtualization -- true if devirtualization is happening after importation
 //
 // Notes:
 // Virtual calls in IL will always "invoke" the base class method.
@@ -19859,11 +19965,16 @@ bool Compiler::IsMathIntrinsic(GenTree* tree)
 // to instead make a local copy. If that is doable, the call is
 // updated to invoke the unboxed entry on the local copy.
 //
+// When guarded devirtualization is enabled, this method will mark
+// calls as guarded devirtualization candidates if the type of `this`
+// is not exactly known and there is a plausible guess for the type.
+
 void Compiler::impDevirtualizeCall(GenTreeCall* call,
 CORINFO_METHOD_HANDLE* method,
 unsigned* methodFlags,
 CORINFO_CONTEXT_HANDLE* contextHandle,
- CORINFO_CONTEXT_HANDLE* exactContextHandle)
+ CORINFO_CONTEXT_HANDLE* exactContextHandle,
+ bool isLateDevirtualization)
 {
 assert(call != nullptr);
 assert(method != nullptr);
@@ -20029,16 +20140,67 @@ void Compiler::impDevirtualizeCall(GenTreeCall* call,
 }
 #endif // defined(DEBUG)

- // Bail if obj class is an interface.
+ // See if the jit's best type for `obj` is an interface.
 // See for instance System.ValueTuple`8::GetHashCode, where lcl 0 is System.IValueTupleInternal
 //   IL_021d: ldloc.0
 //   IL_021e: callvirt instance int32 System.Object::GetHashCode()
+ //
+ // If so, we can't devirtualize, but we may be able to do guarded devirtualization.
 if ((objClassAttribs & CORINFO_FLG_INTERFACE) != 0)
 {
- JITDUMP("--- obj class is interface, sorry\n");
+ // If we're called during early devirtualization, attempt guarded devirtualization
+ // if there's currently just one implementing class.
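+ // (A null exactContextHandle means we were not called from the importer; for
+ // instance, the guarded devirtualization expansion itself passes nullptr, so
+ // we don't create another candidate in that case.)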
+ if (exactContextHandle == nullptr)
+ {
+ JITDUMP("--- obj class is interface...unable to devirtualize, sorry\n");
+ return;
+ }
+
+ CORINFO_CLASS_HANDLE uniqueImplementingClass = NO_CLASS_HANDLE;
+
+ // info.compCompHnd->getUniqueImplementingClass(objClass);
+
+ if (uniqueImplementingClass == NO_CLASS_HANDLE)
+ {
+ JITDUMP("No unique implementor of interface %p (%s), sorry\n", objClass, objClassName);
+ return;
+ }
+
+ JITDUMP("Only known implementor of interface %p (%s) is %p (%s)!\n", objClass, objClassName,
+ uniqueImplementingClass, eeGetClassName(uniqueImplementingClass));
+
+ bool guessUniqueInterface = true;
+
+ INDEBUG(guessUniqueInterface = (JitConfig.JitGuardedDevirtualizationGuessUniqueInterface() > 0););
+
+ if (!guessUniqueInterface)
+ {
+ JITDUMP("Guarded devirt for unique interface implementor is not enabled, sorry\n");
+ return;
+ }
+
+ // Ask the runtime to determine the method that would be called based on the guessed-for type.
+ CORINFO_CONTEXT_HANDLE ownerType = *contextHandle;
+ CORINFO_METHOD_HANDLE uniqueImplementingMethod =
+ info.compCompHnd->resolveVirtualMethod(baseMethod, uniqueImplementingClass, ownerType);
+
+ if (uniqueImplementingMethod == nullptr)
+ {
+ JITDUMP("Can't figure out which method would be invoked, sorry\n");
+ return;
+ }
+
+ JITDUMP("Interface call would invoke method %s\n", eeGetMethodName(uniqueImplementingMethod, nullptr));
+ DWORD uniqueMethodAttribs = info.compCompHnd->getMethodAttribs(uniqueImplementingMethod);
+ DWORD uniqueClassAttribs = info.compCompHnd->getClassAttribs(uniqueImplementingClass);
+
+ addGuardedDevirtualizationCandidate(call, uniqueImplementingMethod, uniqueImplementingClass,
+ uniqueMethodAttribs, uniqueClassAttribs);
 return;
 }

+ // If we get this far, the jit has a lower bound class type for the `this` object being used for dispatch.
+ // It may or may not know enough to devirtualize...
 if (isInterface)
 {
 assert(call->IsVirtualStub());
@@ -20052,6 +20214,9 @@ void Compiler::impDevirtualizeCall(GenTreeCall* call,
 // If we failed to get a handle, we can't devirtualize. This can
 // happen when prejitting, if the devirtualization crosses
 // servicing bubble boundaries.
+ //
+ // Note if we have some way of guessing a better and more likely type, we can do something similar to the code
+ // above for the case where the best jit type is an interface type.
 if (derivedMethod == nullptr)
 {
 JITDUMP("--- no derived method, sorry\n");
@@ -20066,7 +20231,7 @@ void Compiler::impDevirtualizeCall(GenTreeCall* call,
 const char* derivedClassName = "?derivedClass";
 const char* derivedMethodName = "?derivedMethod";

- const char* note = "speculative";
+ const char* note = "inexact or not final";
 if (isExact)
 {
 note = "exact";
@@ -20091,29 +20256,41 @@ void Compiler::impDevirtualizeCall(GenTreeCall* call,
 }
 #endif // defined(DEBUG)

- if (!isExact && !objClassIsFinal && !derivedMethodIsFinal)
- {
- // Type is not exact, and neither class or method is final.
- //
- // We could speculatively devirtualize, but there's no
- // reason to believe the derived method is the one that
- // is likely to be invoked.
- //
- // If there's currently no further overriding (that is, at
- // the time of jitting, objClass has no subclasses that
- // override this method), then perhaps we'd be willing to
- // make a bet...?
- JITDUMP(" Class not final or exact, method not final, no devirtualization\n");
- return;
- }
+ const bool canDevirtualize = isExact || objClassIsFinal || (!isInterface && derivedMethodIsFinal);

- // For interface calls we must have an exact type or final class.
- if (isInterface && !isExact && !objClassIsFinal)
+ if (!canDevirtualize)
 {
- JITDUMP(" Class not final or exact for interface, no devirtualization\n");
+ JITDUMP(" Class not final or exact%s\n", isInterface ? "" : ", and method not final");
+
+ // Have we enabled guarded devirtualization by guessing the jit's best class?
+ bool guessJitBestClass = true;
+ INDEBUG(guessJitBestClass = (JitConfig.JitGuardedDevirtualizationGuessBestClass() > 0););
+
+ if (!guessJitBestClass)
+ {
+ JITDUMP("No guarded devirt: guessing for jit best class disabled\n");
+ return;
+ }
+
+ // Don't try guarded devirtualization when we're doing late devirtualization.
+ if (isLateDevirtualization)
+ {
+ JITDUMP("No guarded devirt during late devirtualization\n");
+ return;
+ }
+
+ // We will use the class that introduced the method as our guess
+ // for the runtime class of the object.
+ CORINFO_CLASS_HANDLE derivedClass = info.compCompHnd->getMethodClass(derivedMethod);
+
+ // Try guarded devirtualization.
+ addGuardedDevirtualizationCandidate(call, derivedMethod, derivedClass, derivedMethodAttribs, objClassAttribs);
 return;
 }

+ // All checks done. Time to transform the call.
+ assert(canDevirtualize);
+
 JITDUMP(" %s; can devirtualize\n", note);

 // Make the updates.
@@ -20427,10 +20604,25 @@ class SpillRetExprHelper
 {
 GenTree* retExpr = *pRetExpr;
 assert(retExpr->OperGet() == GT_RET_EXPR);
- JITDUMP("Store return expression %u as a local var.\n", retExpr->gtTreeID);
- unsigned tmp = comp->lvaGrabTemp(true DEBUGARG("spilling ret_expr"));
+ const unsigned tmp = comp->lvaGrabTemp(true DEBUGARG("spilling ret_expr"));
+ JITDUMP("Storing return expression [%06u] to a local var V%02u.\n", comp->dspTreeID(retExpr), tmp);
 comp->impAssignTempGen(tmp, retExpr, (unsigned)Compiler::CHECK_SPILL_NONE);
 *pRetExpr = comp->gtNewLclvNode(tmp, retExpr->TypeGet());
+
+ if (retExpr->TypeGet() == TYP_REF)
+ {
+ assert(comp->lvaTable[tmp].lvSingleDef == 0);
+ comp->lvaTable[tmp].lvSingleDef = 1;
+ JITDUMP("Marked V%02u as a single def temp\n", tmp);
+
+ bool isExact = false;
+ bool isNonNull = false;
+ CORINFO_CLASS_HANDLE retClsHnd = comp->gtGetClassHandle(retExpr, &isExact, &isNonNull);
+ if (retClsHnd != nullptr)
+ {
+ comp->lvaSetClass(tmp, retClsHnd, isExact);
+ }
+ }
 }

 private:
@@ -20446,8 +20638,108 @@ class SpillRetExprHelper
 //
 void Compiler::addFatPointerCandidate(GenTreeCall* call)
 {
+ JITDUMP("Marking call [%06u] as fat pointer candidate\n", dspTreeID(call));
 setMethodHasFatPointer();
 call->SetFatPointerCandidate();
 SpillRetExprHelper helper(this);
 helper.StoreRetExprResultsInArgs(call);
 }
+
+//------------------------------------------------------------------------
+// addGuardedDevirtualizationCandidate: potentially mark the call as a guarded
+// devirtualization candidate
+//
+// Notes:
+//
+// We currently do not mark calls as candidates when prejitting. This was done
+// to simplify bringing up the associated transformation. It is worth revisiting
+// if we think we can come up with a good guess for the class when prejitting.
+//
+// Call sites in rare or unoptimized code, and calls that require cookies are
+// also not marked as candidates.
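+//
+// Candidates marked here are expanded later by the indirect call transformer into
+// an explicit method table check with a direct (and ideally inlined) call on the
+// hit path and the original indirect call on the miss path.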
+//
+// As part of marking the candidate, the code spills GT_RET_EXPRs anywhere in any
+// child tree, because we need to clone all these trees when we clone the call
+// as part of guarded devirtualization, and these IR nodes can't be cloned.
+//
+// Arguments:
+// call - potential guarded devirtualization candidate
+// methodHandle - method that will be invoked if the class test succeeds
+// classHandle - class that will be tested for at runtime
+// methodAttr - attributes of the method
+// classAttr - attributes of the class
+//
+void Compiler::addGuardedDevirtualizationCandidate(GenTreeCall* call,
+ CORINFO_METHOD_HANDLE methodHandle,
+ CORINFO_CLASS_HANDLE classHandle,
+ unsigned methodAttr,
+ unsigned classAttr)
+{
+ // This transformation only makes sense for virtual calls
+ assert(call->IsVirtual());
+
+ // Only mark calls if the feature is enabled.
+ const bool isEnabled = JitConfig.JitEnableGuardedDevirtualization() > 0;
+
+ if (!isEnabled)
+ {
+ JITDUMP("NOT Marking call [%06u] as guarded devirtualization candidate -- disabled by jit config\n",
+ dspTreeID(call));
+ return;
+ }
+
+ // Bail when prejitting. We only do this for jitted code.
+ // We should revisit this if we think we can come up with good class guesses when prejitting.
+ if (opts.jitFlags->IsSet(JitFlags::JIT_FLAG_PREJIT))
+ {
+ JITDUMP("NOT Marking call [%06u] as guarded devirtualization candidate -- prejitting\n", dspTreeID(call));
+ return;
+ }
+
+ // Bail if not optimizing or the call site is very likely cold
+ if (compCurBB->isRunRarely() || opts.compDbgCode || opts.MinOpts())
+ {
+ JITDUMP("NOT Marking call [%06u] as guarded devirtualization candidate -- rare / dbg / minopts\n",
+ dspTreeID(call));
+ return;
+ }
+
+ // CT_INDIRECT calls may use the cookie, bail if so...
+ //
+ // If transforming these provides a benefit, we could save this off in the same way
+ // we save the stub address below.
+ if ((call->gtCallType == CT_INDIRECT) && (call->gtCall.gtCallCookie != nullptr))
+ {
+ return;
+ }
+
+ // We're all set, proceed with candidate creation.
+ JITDUMP("Marking call [%06u] as guarded devirtualization candidate; will guess for class %s\n", dspTreeID(call),
+ eeGetClassName(classHandle));
+ setMethodHasGuardedDevirtualization();
+ call->SetGuardedDevirtualizationCandidate();
+
+ // Spill off any GT_RET_EXPR subtrees so we can clone the call.
+ SpillRetExprHelper helper(this);
+ helper.StoreRetExprResultsInArgs(call);
+
+ // Gather some information for later. Note we actually allocate InlineCandidateInfo
+ // here, as the devirtualized half of this call will likely become an inline candidate.
+ GuardedDevirtualizationCandidateInfo* pInfo = new (this, CMK_Inlining) InlineCandidateInfo;
+
+ pInfo->guardedMethodHandle = methodHandle;
+ pInfo->guardedClassHandle = classHandle;
+
+ // Save off the stub address since it shares a union with the candidate info.
+ if (call->IsVirtualStub()) + { + JITDUMP("Saving stub addr %p in candidate info\n", call->gtStubCallStubAddr); + pInfo->stubAddr = call->gtStubCallStubAddr; + } + else + { + pInfo->stubAddr = nullptr; + } + + call->gtGuardedDevirtualizationCandidateInfo = pInfo; +} diff --git a/src/jit/inline.cpp b/src/jit/inline.cpp index a1064762ce7a..f523d2abbea1 100644 --- a/src/jit/inline.cpp +++ b/src/jit/inline.cpp @@ -337,6 +337,7 @@ InlineContext::InlineContext(InlineStrategy* strategy) , m_CodeSizeEstimate(0) , m_Success(true) , m_Devirtualized(false) + , m_Guarded(false) , m_Unboxed(false) #if defined(DEBUG) || defined(INLINE_DATA) , m_Policy(nullptr) @@ -397,18 +398,19 @@ void InlineContext::Dump(unsigned indent) const char* inlineReason = InlGetObservationString(m_Observation); const char* inlineResult = m_Success ? "" : "FAILED: "; const char* devirtualized = m_Devirtualized ? " devirt" : ""; + const char* guarded = m_Guarded ? " guarded" : ""; const char* unboxed = m_Unboxed ? " unboxed" : ""; if (m_Offset == BAD_IL_OFFSET) { - printf("%*s[%u IL=???? TR=%06u %08X] [%s%s%s%s] %s\n", indent, "", m_Ordinal, m_TreeID, calleeToken, - inlineResult, inlineReason, devirtualized, unboxed, calleeName); + printf("%*s[%u IL=???? TR=%06u %08X] [%s%s%s%s%s] %s\n", indent, "", m_Ordinal, m_TreeID, calleeToken, + inlineResult, inlineReason, guarded, devirtualized, unboxed, calleeName); } else { IL_OFFSET offset = jitGetILoffs(m_Offset); - printf("%*s[%u IL=%04d TR=%06u %08X] [%s%s%s%s] %s\n", indent, "", m_Ordinal, offset, m_TreeID, calleeToken, - inlineResult, inlineReason, devirtualized, unboxed, calleeName); + printf("%*s[%u IL=%04d TR=%06u %08X] [%s%s%s%s%s] %s\n", indent, "", m_Ordinal, offset, m_TreeID, + calleeToken, inlineResult, inlineReason, guarded, devirtualized, unboxed, calleeName); } } @@ -1205,6 +1207,7 @@ InlineContext* InlineStrategy::NewSuccess(InlineInfo* inlineInfo) calleeContext->m_Observation = inlineInfo->inlineResult->GetObservation(); calleeContext->m_Success = true; calleeContext->m_Devirtualized = originalCall->IsDevirtualized(); + calleeContext->m_Guarded = originalCall->IsGuarded(); calleeContext->m_Unboxed = originalCall->IsUnboxed(); #if defined(DEBUG) || defined(INLINE_DATA) @@ -1266,6 +1269,7 @@ InlineContext* InlineStrategy::NewFailure(GenTreeStmt* stmt, InlineResult* inlin failedContext->m_Callee = inlineResult->GetCallee(); failedContext->m_Success = false; failedContext->m_Devirtualized = originalCall->IsDevirtualized(); + failedContext->m_Guarded = originalCall->IsGuarded(); failedContext->m_Unboxed = originalCall->IsUnboxed(); assert(InlIsValidObservation(failedContext->m_Observation)); diff --git a/src/jit/inline.h b/src/jit/inline.h index 551341e2e667..e94765a74c67 100644 --- a/src/jit/inline.h +++ b/src/jit/inline.h @@ -494,22 +494,36 @@ class InlineResult bool m_Reported; }; +// GuardedDevirtualizationCandidateInfo provides information about +// a potential target of a virtual call. + +struct GuardedDevirtualizationCandidateInfo +{ + CORINFO_CLASS_HANDLE guardedClassHandle; + CORINFO_METHOD_HANDLE guardedMethodHandle; + void* stubAddr; +}; + // InlineCandidateInfo provides basic information about a particular // inline candidate. 
+// +// It is a superset of GuardedDevirtualizationCandidateInfo: calls +// can start out as GDv candidates and turn into inline candidates -struct InlineCandidateInfo +struct InlineCandidateInfo : public GuardedDevirtualizationCandidateInfo { - DWORD dwRestrictions; CORINFO_METHOD_INFO methInfo; - unsigned methAttr; + CORINFO_METHOD_HANDLE ilCallerHandle; // the logical IL caller of this inlinee. CORINFO_CLASS_HANDLE clsHandle; + CORINFO_CONTEXT_HANDLE exactContextHnd; + GenTree* retExpr; + DWORD dwRestrictions; + unsigned preexistingSpillTemp; unsigned clsAttr; + unsigned methAttr; + CorInfoInitClassResult initClassResult; var_types fncRetType; - CORINFO_METHOD_HANDLE ilCallerHandle; // the logical IL caller of this inlinee. - CORINFO_CONTEXT_HANDLE exactContextHnd; bool exactContextNeedsRuntimeLookup; - CorInfoInitClassResult initClassResult; - unsigned preexistingSpillTemp; }; // InlArgInfo describes inline candidate argument properties. @@ -683,6 +697,11 @@ class InlineContext return m_Devirtualized; } + bool IsGuarded() const + { + return m_Guarded; + } + bool IsUnboxed() const { return m_Unboxed; @@ -703,6 +722,7 @@ class InlineContext int m_CodeSizeEstimate; // in bytes * 10 bool m_Success : 1; // true if this was a successful inline bool m_Devirtualized : 1; // true if this was a devirtualized call + bool m_Guarded : 1; // true if this was a guarded call bool m_Unboxed : 1; // true if this call now invokes the unboxed entry #if defined(DEBUG) || defined(INLINE_DATA) diff --git a/src/jit/jitconfigvalues.h b/src/jit/jitconfigvalues.h index 383264e6a349..649db4ebe659 100644 --- a/src/jit/jitconfigvalues.h +++ b/src/jit/jitconfigvalues.h @@ -363,6 +363,15 @@ CONFIG_INTEGER(JitEnableRemoveEmptyTry, W("JitEnableRemoveEmptyTry"), 0) #endif // defined(FEATURE_CORECLR) #endif // DEBUG +// Overall master enable for Guarded Devirtualization. Currently not enabled by default. +CONFIG_INTEGER(JitEnableGuardedDevirtualization, W("JitEnableGuardedDevirtualization"), 0) + +#if defined(DEBUG) +// Various policies for GuardedDevirtualization +CONFIG_INTEGER(JitGuardedDevirtualizationGuessUniqueInterface, W("JitGuardedDevirtualizationGuessUniqueInterface"), 1) +CONFIG_INTEGER(JitGuardedDevirtualizationGuessBestClass, W("JitGuardedDevirtualizationGuessBestClass"), 1) +#endif // DEBUG + #undef CONFIG_INTEGER #undef CONFIG_STRING #undef CONFIG_METHODSET diff --git a/src/jit/lclvars.cpp b/src/jit/lclvars.cpp index 8ce2b5956b20..339f7a5ff71d 100644 --- a/src/jit/lclvars.cpp +++ b/src/jit/lclvars.cpp @@ -2746,6 +2746,11 @@ void Compiler::lvaUpdateClass(unsigned varNum, CORINFO_CLASS_HANDLE clsHnd, bool { varDsc->lvClassHnd = clsHnd; varDsc->lvClassIsExact = isExact; + +#if DEBUG + // Note we've modified the type... + varDsc->lvClassInfoUpdated = true; +#endif // DEBUG } return; diff --git a/src/jit/morph.cpp b/src/jit/morph.cpp index 20e378ed166b..566714f2765d 100644 --- a/src/jit/morph.cpp +++ b/src/jit/morph.cpp @@ -6614,42 +6614,59 @@ GenTree* Compiler::fgMorphField(GenTree* tree, MorphAddrContext* mac) void Compiler::fgMorphCallInline(GenTreeCall* call, InlineResult* inlineResult) { - // The call must be a candiate for inlining. - assert((call->gtFlags & GTF_CALL_INLINE_CANDIDATE) != 0); + bool inliningFailed = false; - // Attempt the inline - fgMorphCallInlineHelper(call, inlineResult); + // Is this call an inline candidate? + if (call->IsInlineCandidate()) + { + // Attempt the inline + fgMorphCallInlineHelper(call, inlineResult); - // We should have made up our minds one way or another.... 
- assert(inlineResult->IsDecided()); + // We should have made up our minds one way or another.... + assert(inlineResult->IsDecided()); - // If we failed to inline, we have a bit of work to do to cleanup - if (inlineResult->IsFailure()) - { + // If we failed to inline, we have a bit of work to do to cleanup + if (inlineResult->IsFailure()) + { #ifdef DEBUG - // Before we do any cleanup, create a failing InlineContext to - // capture details of the inlining attempt. - m_inlineStrategy->NewFailure(fgMorphStmt, inlineResult); + // Before we do any cleanup, create a failing InlineContext to + // capture details of the inlining attempt. + m_inlineStrategy->NewFailure(fgMorphStmt, inlineResult); #endif - // It was an inline candidate, but we haven't expanded it. - if (call->gtCall.gtReturnType != TYP_VOID) + inliningFailed = true; + + // Clear the Inline Candidate flag so we can ensure later we tried + // inlining all candidates. + // + call->gtFlags &= ~GTF_CALL_INLINE_CANDIDATE; + } + } + else + { + // This wasn't an inline candidate. So it must be a GDV candidate. + assert(call->IsGuardedDevirtualizationCandidate()); + + // We already know we can't inline this call, so don't even bother to try. + inliningFailed = true; + } + + // If we failed to inline (or didn't even try), do some cleanup. + if (inliningFailed) + { + if (call->gtReturnType != TYP_VOID) { + JITDUMP("Inlining [%06u] failed, so bashing [%06u] to NOP\n", dspTreeID(call), dspTreeID(fgMorphStmt)); + // Detach the GT_CALL tree from the original statement by // hanging a "nothing" node to it. Later the "nothing" node will be removed // and the original GT_CALL tree will be picked up by the GT_RET_EXPR node. - noway_assert(fgMorphStmt->gtStmtExpr == call); fgMorphStmt->gtStmtExpr = gtNewNothingNode(); } - - // Clear the Inline Candidate flag so we can ensure later we tried - // inlining all candidates. - // - call->gtFlags &= ~GTF_CALL_INLINE_CANDIDATE; } } @@ -6682,6 +6699,13 @@ void Compiler::fgMorphCallInlineHelper(GenTreeCall* call, InlineResult* result) return; } + // Re-check this because guarded devirtualization may allow these through. + if (gtIsRecursiveCall(call) && call->IsImplicitTailCall()) + { + result->NoteFatal(InlineObservation::CALLSITE_IMPLICIT_REC_TAIL_CALL); + return; + } + // impMarkInlineCandidate() is expected not to mark tail prefixed calls // and recursive tail calls as inline candidates. noway_assert(!call->IsTailPrefixedCall());