Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

optimization of indirect calls #44610

Open
AndyAyersMS opened this issue Nov 12, 2020 · 15 comments
Open

optimization of indirect calls #44610

AndyAyersMS opened this issue Nov 12, 2020 · 15 comments
Assignees
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
Milestone

Comments

@AndyAyersMS
Copy link
Member

With the advent of function pointers we will now see an increased number of indirect call cases that the jit can optimize to direct calls. This requires a transformation similar to the one we do for devirtualization. There may be additional wrinkles, say if someone takes the address of an intrinsic method we would want intrinsic recognition to kick in too. So in the importer perhaps this needs to sit just upstream from impImportCall.

Overall this should be structured in a similar way to devirtualization -- opportunistically transforming calls during importation to enable subsequent inlining, and then perhaps retry later after inlining to at least remove overhead. Would be nice to be able to change over in the optimizer too, but that may prove challenging as we start to bake in many details in morph.

In principle we could do something similar with locally created delegates but there are a number of missing pieces that prevent us from seeing through from delegate creation to invocation.

using System;

class X
{
    static int F() => 33;
   
    public unsafe static int Main()
    {
        delegate*<int> f = &X.F;

        return f() + 67;
    }
}

produces

 Assembly listing for method X:Main():int
; Emitting BLENDED_CODE for X64 CPU with AVX - Windows
; optimized code
; rsp based frame
; partially interruptible
; Final local variable assignments
;
;  V00 OutArgs      [V00    ] (  1,  1   )  lclBlk (32) [rsp+0x00]   "OutgoingArgSpace"
;* V01 tmp1         [V01    ] (  0,  0   )    long  ->  zero-ref    "impImportIndirectCall"
;
; Lcl frame size = 40

G_M46779_IG01:              ;; offset=0000H
       4883EC28             sub      rsp, 40
                                                ;; bbWeight=1    PerfScore 0.25
G_M46779_IG02:              ;; offset=0004H
       48B830F3FDD6FC7F0000 mov      rax, 0x7FFCD6FDF330
       FFD0                 call     rax
       83C043               add      eax, 67
                                                ;; bbWeight=1    PerfScore 3.50
G_M46779_IG03:              ;; offset=0013H
       4883C428             add      rsp, 40
       C3                   ret
                                                ;; bbWeight=1    PerfScore 1.25

Not sure of the priority of this just yet, I am looking at some apps that use calli fairly heavily and need to better understand how many of these can be optimized. So marking as future for now.

This optimization will also intersect with guarded devirtualization / PGO, provided we can profile indirect targets and see biased distributions.

category:cq
theme:devirtualization
skill-level:expert
cost:medium

@AndyAyersMS AndyAyersMS added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Nov 12, 2020
@AndyAyersMS AndyAyersMS added this to the Future milestone Nov 12, 2020
@Dotnet-GitSync-Bot Dotnet-GitSync-Bot added the untriaged New issue has not been triaged by the area owner label Nov 12, 2020
@AndyAyersMS AndyAyersMS removed the untriaged New issue has not been triaged by the area owner label Nov 12, 2020
@AndyAyersMS AndyAyersMS modified the milestones: Future, 7.0.0 Dec 17, 2021
@hez2010
Copy link
Contributor

hez2010 commented Dec 29, 2021

Since ASP.NET Core 6, delegates and lambdas are being heavily used in the new minimal-api design which uses delegates as endpoint handlers here and there. I believe if JIT can optimize delegates to direct calls, a huge performance improvement will be introduced.

jakobbotsch added a commit to jakobbotsch/runtime that referenced this issue Apr 12, 2022
Allow method handle histograms in .mibc files and in the PGO text
format.

Contributes to dotnet#44610.
jakobbotsch added a commit that referenced this issue May 3, 2022
Allow method handle histograms in .mibc files and in the PGO text
format.

Contributes to #44610.
@EgorBo EgorBo modified the milestones: 7.0.0, 8.0.0 Jul 6, 2022
@EgorBo EgorBo modified the milestones: 8.0.0, Future Mar 26, 2023
@MichalPetryka
Copy link
Contributor

MichalPetryka commented Apr 8, 2023

I'm working on this currently but I have a bit of an issue regarding invalid code handling with static readonly pointers, I'm using NonVirtualEntry2MethodDesc to resolve those to the method, this works fine with valid pointers and returns null for null, random pointer and pointer to unmanaged method correctly, but when I pass in validPtr + 1/validPtr - 1, it returns (from here) an invalid MethodDesc* which then AVs on read. On main this does not crash when JITting the code, but it crashes with my change. What's the best way to deal with this?
@jkotas @davidwrighton could you help here?

@jkotas
Copy link
Member

jkotas commented Apr 8, 2023

In general, runtime methods like NonVirtualEntry2MethodDesc does not expect bogus pointers.

You would need to create your own implementation of NonVirtualEntry2MethodDesc that does a precise validation of the function pointer. Also, once you get the pointer resolved, you would need to validate that the resolved method is a plausible target for the callsite. Consider cases like this:

static readonly void* pfn = ....;
static readonly PfnKind pfnKind = ....;

static void DoStuff()
{
    switch (pfnKind)
    {
    // pfn can be inlined only for the case that matches the actual pfn signature
    case MethodWithOneArgument: ((delegate* <int, void>)pfn)(arg1); break;
    case MethodWithTwoArguments: ((delegate* <int, void>)pfn)(arg1, arg2); break;
    case BogusMethod: return;
    }
}

@MichalPetryka
Copy link
Contributor

You would need to create your own implementation of NonVirtualEntry2MethodDesc that does a precise validation of the function pointer.

The issue is that it's not fully clear to me what's the proper way to validate those, do you have any hints with that?

Also, once you get the pointer resolved, you would need to validate that the resolved method is a plausible target for the callsite.

Yeah I do realise that, it's much less of an issue though since I have the signature of the calli and target method already, just have to compare those.

@jkotas
Copy link
Member

jkotas commented Apr 8, 2023

The issue is that it's not fully clear to me what's the proper way to validate those, do you have any hints with that?

Some examples:

  • (MethodDesc*)((FixupPrecode*)PCODEToPINSTR(entryPoint))->GetMethodDesc();: You would want to introduce new methods on class Precode that check whether entryPoint points to a valid spot withing the STUB_CODE_BLOCK_FIXUPPRECODE block.
  • if (pRS->_pjit->JitCodeToMethodInfo(pRS, entryPoint, &pMD, NULL)): Ask JitCodeToMethodInfo to compute EECodeInfo and use it to validate that the entryPoint actually points to start of the method and not into the middle of it.

@jkotas
Copy link
Member

jkotas commented Apr 10, 2023

Also, you will need to check whether the loader allocator of the resolved function pointer is in the dependency closure of the method being compiled. It is necessary to avoid introducing a dependency of non-collectible code on collectible code.

@MichalPetryka
Copy link
Contributor

MichalPetryka commented Apr 10, 2023

  • (MethodDesc*)((FixupPrecode*)PCODEToPINSTR(entryPoint))->GetMethodDesc();: You would want to introduce new methods on class Precode that check whether entryPoint points to a valid spot withing the STUB_CODE_BLOCK_FIXUPPRECODE block.

I've tried to do so by checking whether the pRS->_pRangeList->_starts contains the address but the array is apparently empty (edit, looking at the code, it seems like addresses are only added then in collectible assemblies; edit 2: even if I always add them, they're addresses to ranges, not starts of methods in them)? How am I supposed to check if the spot is valid then?

@jkotas
Copy link
Member

jkotas commented Apr 11, 2023

These data structures were not designed to allow validation of entry points. I think they would require redesign to allow it. It does not look easy.

@MichalPetryka
Copy link
Contributor

MichalPetryka commented Apr 11, 2023

These data structures were not designed to allow validation of entry points. I think they would require redesign to allow it. It does not look easy.

Is creating a global ftn ptr -> handle map for ldftns in cctors maybe a reasonable choice?

Or maybe we could just decide that storing invalid pointers to managed methods in static readonly fields is UB and can crash? Especially since the ECMA already says:

The ftn argument must be a method pointer to a method that can be legitimately called with the
arguments described by callsitedescr (a metadata token for a stand-alone signature). Such a
pointer can be created using the ldftn or ldvirtftn instructions, or could have been passed in from
native code. 

and since having a pointer that points to JIT code range but isn't valid already requires some weird invalid code, since null or pointers to unmanaged that are casted to managed will be fine here.

@jkotas
Copy link
Member

jkotas commented Apr 12, 2023

Is creating a global ftn ptr -> handle map for ldftns in cctors maybe a reasonable choice?

It does not sound very pay for play. This is niche optimization, vast majority of existing function pointer uses are not going to benefit from it, but they would still need to pay for it.

Or maybe we could just decide that storing invalid pointers to managed methods in static readonly fields is UB and can crash?

It would be a breaking change. Function pointers were invented first and foremost for managed C++. Does C++ define any of this as UB?

having a pointer that points to JIT code range but isn't valid already requires some weird invalid code

It is not that unusual as you may think. There is a defense-in-depth school of thought that encourages storing sensitive pointers (that includes code pointers) in statics in encrypted form. Here is a random code example that found after searching for 10 seconds on github: https://github.com/rituraj0/Old-Codes/blob/c2bc55bd94da84d404e6ed3be92b81ffcfbaa74f/Trace.cpp#L45

@MichalPetryka
Copy link
Contributor

MichalPetryka commented Apr 12, 2023

Does C++ define any of this as UB?

It's implementation defined:

6.7.5
Indirection through an invalid pointer value and passing an invalid pointer value to a deallocation function have undefined behavior. Any other use of an invalid pointer value has implementation-defined behavior.

There's also:

Some implementations might define that copying an invalid pointer value causes a system-generated runtime fault.

There is a defense-in-depth school of thought that encourages storing sensitive pointers (that includes code pointers) in statics in encrypted form.

This wouldn't be an issue here since such obfuscation decodes such pointers on use, this issue only manifests when directly calling an invalid pointer, it just moves the AV from call time to JIT time. (This is an issue ONLY with code that'd AV if the call actually happend, now the call doesn't need to happen, just to be possible).

@MichalPetryka
Copy link
Contributor

And I guess being able to check whether a MethodDesc* the function returns points to a valid MethodDesc would be a solution, but I don't see how that could be checked?

@MichalPetryka
Copy link
Contributor

It's implementation defined

I've checked GCC, Clang and MSVC, they all seem to detect this, inline the valid case and leave the invalid one as is so that it AVs on call (same as dotnet right now).

@MichalPetryka
Copy link
Contributor

It does not sound very pay for play. This is niche optimization, vast majority of existing function pointer uses are not going to benefit from it, but they would still need to pay for it.

Hmm, the only solution I could think of that'd have small runtime overhead would be introducing a hidden byte/ushort/uint field with a bitmask for all static readonly managed ftn ptrs in the type and store a handle instead of the actual value in the field then instead, set a bit in the mask and then check the mask when in the helper for static readonly fields and spawn FTN_ADDR instead of CNS_INT there. But I feel like this would be too much to handle for me (and reflection would need to be handled too).

@jkotas
Copy link
Member

jkotas commented Apr 13, 2023

introducing a hidden byte/ushort/uint field with a bitmask for all static readonly managed ftn ptrs in the type

This would be between impossible and very complicated. #44610 (comment) should be a lot easier than this.

@ghost ghost added the in-pr There is an active PR which will close this issue when it is merged label Apr 22, 2023
MichalPetryka added a commit to MichalPetryka/runtime that referenced this issue May 1, 2023
Imports indirect calls as direct ones when the target method is known.

Only handles addresses from ldftn as the VM has no way to verify pointers
from static readonly fields and crashes on invalid ones.

The helpers currently contain a small dead path that I'll soon use when
extending the code to also handle delegates.

Opening as a draft so that the code can be reviewed while I finish the tests.

Fixes dotnet#44610
@ghost ghost added in-pr There is an active PR which will close this issue when it is merged and removed in-pr There is an active PR which will close this issue when it is merged labels Jun 4, 2023
@ghost ghost removed the in-pr There is an active PR which will close this issue when it is merged label Jul 5, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
Projects
Development

Successfully merging a pull request may close this issue.

6 participants