Tail opcode being emitted for normal methods, destroys JIT optimizations #6329

NinoFloris · 2019-03-13T18:33:12Z

After a benchmarking discussion in the FSSF slack we found the main culprit for the C# / F# speed differences. tail. was being emitted for a non recursive method, that turns off JIT inlining and other optimizations.

Repro can be found here.

https://sharplab.io/#v2:DYLgZgzgNALiCWwA+B7ADgUwHYAIDKAnhDBgLYCwAUOtvkSaQHQBKArljPKRowMIqk0iDACc8ogG7wAxhghUqMAphwAZFAHMNo1RgkZgOALw4AgsZwAGHEhwAhHBYCMOBZSUr1W0QAoAlMZUjo7AGDA43BAQAIbaEBaExGR8KMCh0pwoWBCMAOLYojKMqvDEADzEIvBYGgB8Pk6WTZZ+ANxBwR2OANplALJhABYoACYAkoLAPgMww+OTAPJomdmMploichDw+mNYwNXVGn61ALpdOKHhWOg4/hb+F9ykAEaiOLOlxZo+ofqGIDUmm0Il0/ygEQgGhwgMqRwCRguwQiW1icjWIxGPlIUL8SOCNzQj0oTzIbxEH0GXy8vz0BhhQO8oLpwAhKBeACsGQByaII/GOSIxOIYrHsjmMAAqKDwMCqNX8eMoyORhOJimUGBwkuiiC8IPuiOVwSuKKiaPiJkSDBSaQwGXgWRy+SwhWkxVKMAqcqO9UazTaF1Jr3enxyNL+9MB+p0LIhOOhsJ9NX5xpVZuF6NMmOxuLcyOe5Mp1J+kYBjJBYIMbM5PL5gTTKqFFtFPnFUplyY0iqoQA===

Another smaller codegen issue can be seen in the IL for ObjLog<'a>

            .locals init (
                [0] !!a
            )

            IL_0000: ldarg.0
            IL_0001: ldfld class [mscorlib]System.Collections.Generic.List`1<string> _/TailLogger::messages
            IL_0006: ldarg.2
            IL_0007: stloc.0
            IL_0008: ldloca.s 0
            IL_000a: constrained. !!a
            [...]

The C# IL for the same method completely does away with the arg to local dance and just uses the arg directly.

These extraneous load/store patterns confuse the JITs dataflow analysis whether things are being modified or not, hampering enregistration.

https://sharplab.io/#v2:C4LglgNgNAJiDUAfAAgBgATIIwBYDcAsAFDHIDM6ApgHYCuAtugDID2A5m5QE5OUBulCOmIBvYunQBBKOPQAhYgF9ipCsgBM8gIYBnSqw7dRsiQAcuYPluCVmYHcAA82VAD509Sjp1bOO9AC86NSUAO52Ds5YbgAU0agJAJSERBISJphqOMzsMQacPPyC6BBF0JjRHjpsiRliqWmNnt6+XgB0kjAwMfTVyRnKJA0S5JjZBo4AKq557AW8AkKli1Dok+gsAEYAVrXD6PWNTV4+fh1dMVvbbZMsAMrAFtRsMYn9+4OKQA=

If anything the focus should be on making sure tail. is only ever emitted for real recursive methods/functions

The text was updated successfully, but these errors were encountered:

EBrown8534 · 2019-03-13T21:07:25Z

As a note, this is the performance benchmark that was being used:

https://github.com/EBrown8534/Logging-Benchmark

Given a workaround from @NinoFloris the results looked as follows:

// * Summary *

BenchmarkDotNet=v0.11.4, OS=Windows 10.0.17134.407 (1803/April2018Update/Redstone4)
Intel Xeon CPU E3-1505M v6 3.00GHz, 1 CPU, 8 logical and 4 physical cores
.NET Core SDK=3.0.100-preview-010184
  [Host]     : .NET Core 2.1.7 (CoreCLR 4.6.27129.04, CoreFX 4.6.27129.04), 64bit RyuJIT
  DefaultJob : .NET Core 2.1.7 (CoreCLR 4.6.27129.04, CoreFX 4.6.27129.04), 64bit RyuJIT


|                  Method |      Mean |      Error |     StdDev | Ratio | RatioSD | Gen 0/1k Op | Gen 1/1k Op | Gen 2/1k Op | Allocated Memory/Op |
|------------------------ |----------:|-----------:|-----------:|------:|--------:|------------:|------------:|------------:|--------------------:|
|               LogDirect |  33.14 us |  0.3343 us |  0.3127 us |  0.61 |    0.02 |     18.8599 |      3.7231 |           - |            78.21 KB |
|             LogDirectFs |  33.86 us |  0.6621 us |  0.9496 us |  0.62 |    0.02 |     18.8599 |      3.7231 |           - |            78.21 KB |
|         LogViaInjection |  54.51 us |  1.0658 us |  1.6276 us |  1.00 |    0.00 |     18.8599 |      3.7231 |           - |            78.21 KB |
|       LogViaInjectionFs |  59.06 us |  1.1748 us |  2.4781 us |  1.08 |    0.04 |     18.8599 |      3.7231 |           - |            78.21 KB |
|          LogViaCallback |  47.44 us |  0.3609 us |  0.3200 us |  0.87 |    0.03 |     18.8599 |      3.7231 |           - |            78.27 KB |
|        LogViaCallbackFs |  50.51 us |  0.9964 us |  0.9320 us |  0.93 |    0.04 |     18.8599 |      3.7231 |           - |            78.27 KB |
|            LogObjDirect | 650.95 us |  6.5345 us |  5.7927 us | 11.95 |    0.42 |    185.5469 |     92.7734 |           - |          1093.84 KB |
|          LogObjDirectFs | 682.77 us | 13.3395 us | 22.2873 us | 12.46 |    0.54 |    185.5469 |     92.7734 |           - |          1093.84 KB |
|      LogObjViaInjection | 694.41 us | 13.6506 us | 13.4067 us | 12.76 |    0.53 |    185.5469 |     92.7734 |           - |          1093.84 KB |
|    LogObjViaInjectionFs | 715.00 us | 15.6588 us | 18.6406 us | 13.11 |    0.53 |    185.5469 |     92.7734 |           - |          1093.84 KB |
|       LogObjViaCallback | 678.59 us | 13.1145 us | 13.4676 us | 12.47 |    0.56 |    185.5469 |     92.7734 |           - |           1093.9 KB |
|     LogObjViaCallbackFs | 689.77 us | 13.4700 us | 19.7442 us | 12.65 |    0.45 |    185.5469 |     92.7734 |           - |           1093.9 KB |
|         LogInlineDirect |  33.13 us |  0.3786 us |  0.3356 us |  0.61 |    0.02 |     18.8599 |      3.7231 |           - |            78.21 KB |
|       LogInlineDirectFs |  35.31 us |  0.7056 us |  1.2359 us |  0.65 |    0.04 |     18.8599 |      3.7231 |           - |            78.21 KB |
|   LogInlineViaInjection |  34.65 us |  0.6788 us |  1.3239 us |  0.64 |    0.03 |     18.8599 |      3.7231 |           - |            78.21 KB |
| LogInlineViaInjectionFs |  35.03 us |  0.6944 us |  1.3543 us |  0.64 |    0.03 |     18.8599 |      3.7231 |           - |            78.21 KB |
|    LogInlineViaCallback |  53.60 us |  1.0659 us |  2.0020 us |  0.98 |    0.04 |     18.8599 |      3.7231 |           - |            78.27 KB |
|  LogInlineViaCallbackFs |  57.05 us |  1.1371 us |  3.2443 us |  1.08 |    0.07 |     18.8599 |      3.7231 |           - |            78.27 KB |
|  LogInlineDynamicDirect |  81.92 us |  1.6557 us |  3.1099 us |  1.51 |    0.07 |     18.7988 |      3.0518 |           - |            78.21 KB |

However, pre-workaround we were seeing large disparities between C# and F# functions:

// * Summary *

BenchmarkDotNet=v0.11.4, OS=Windows 10.0.17134.407 (1803/April2018Update/Redstone4)
Intel Xeon CPU E3-1505M v6 3.00GHz, 1 CPU, 8 logical and 4 physical cores
.NET Core SDK=3.0.100-preview-010184
  [Host]     : .NET Core 2.1.7 (CoreCLR 4.6.27129.04, CoreFX 4.6.27129.04), 64bit RyuJIT
  DefaultJob : .NET Core 2.1.7 (CoreCLR 4.6.27129.04, CoreFX 4.6.27129.04), 64bit RyuJIT


|                  Method |      Mean |      Error |     StdDev |    Median | Ratio | RatioSD | Gen 0/1k Op | Gen 1/1k Op | Gen 2/1k Op | Allocated Memory/Op |
|------------------------ |----------:|-----------:|-----------:|----------:|------:|--------:|------------:|------------:|------------:|--------------------:|
|               LogDirect |  32.95 us |  0.4434 us |  0.4148 us |  32.86 us |  0.61 |    0.03 |     18.8599 |      3.7231 |           - |            78.21 KB |
|             LogDirectFs |  53.02 us |  0.4130 us |  0.3864 us |  52.95 us |  0.98 |    0.04 |     18.8599 |      3.7231 |           - |            78.21 KB |
|         LogViaInjection |  54.23 us |  1.0741 us |  1.8527 us |  53.39 us |  1.00 |    0.00 |     18.8599 |      3.7231 |           - |            78.21 KB |
|       LogViaInjectionFs |  59.85 us |  0.4193 us |  0.3502 us |  59.91 us |  1.11 |    0.04 |     18.8599 |      3.7231 |           - |            78.21 KB |
|          LogViaCallback |  46.98 us |  0.3562 us |  0.2974 us |  46.96 us |  0.87 |    0.03 |     18.8599 |      3.7231 |           - |            78.27 KB |
|        LogViaCallbackFs |  58.36 us |  0.5470 us |  0.5116 us |  58.20 us |  1.07 |    0.04 |     18.8599 |      3.7231 |           - |            78.27 KB |
|            LogObjDirect | 641.96 us |  4.9715 us |  4.4071 us | 640.42 us | 11.85 |    0.38 |    185.5469 |     92.7734 |           - |          1093.84 KB |
|          LogObjDirectFs | 688.81 us | 13.4879 us | 15.5327 us | 682.40 us | 12.60 |    0.60 |    185.5469 |     92.7734 |           - |          1093.84 KB |
|      LogObjViaInjection | 686.75 us |  6.9659 us |  6.5159 us | 685.58 us | 12.63 |    0.46 |    185.5469 |     92.7734 |           - |          1093.84 KB |
|    LogObjViaInjectionFs | 707.65 us |  8.0404 us |  7.5210 us | 705.40 us | 13.01 |    0.42 |    185.5469 |     92.7734 |           - |          1093.84 KB |
|       LogObjViaCallback | 676.84 us | 13.3201 us | 21.5095 us | 669.99 us | 12.45 |    0.47 |    185.5469 |     92.7734 |           - |           1093.9 KB |
|     LogObjViaCallbackFs | 676.51 us |  8.0330 us |  7.5141 us | 674.65 us | 12.44 |    0.42 |    185.5469 |     92.7734 |           - |           1093.9 KB |
|         LogInlineDirect |  32.70 us |  0.1734 us |  0.1353 us |  32.66 us |  0.61 |    0.02 |     18.8599 |      3.7231 |           - |            78.21 KB |
|       LogInlineDirectFs |  47.08 us |  0.4349 us |  0.4068 us |  47.03 us |  0.87 |    0.03 |     18.8599 |      3.7231 |           - |            78.21 KB |
|   LogInlineViaInjection |  32.95 us |  0.4549 us |  0.4255 us |  32.83 us |  0.61 |    0.02 |     18.8599 |      3.7231 |           - |            78.21 KB |
| LogInlineViaInjectionFs |  48.79 us |  0.9598 us |  1.6299 us |  48.11 us |  0.90 |    0.04 |     18.8599 |      3.7231 |           - |            78.21 KB |
|    LogInlineViaCallback |  47.75 us |  1.0404 us |  1.2385 us |  47.08 us |  0.87 |    0.04 |     18.8599 |      3.7231 |           - |            78.27 KB |
|  LogInlineViaCallbackFs |  53.19 us |  0.3635 us |  0.3223 us |  53.14 us |  0.98 |    0.03 |     18.8599 |      3.7231 |           - |            78.27 KB |
|  LogInlineDynamicDirect |  73.80 us |  0.8627 us |  0.7648 us |  73.76 us |  1.36 |    0.04 |     18.7988 |      3.0518 |           - |            78.21 KB |

You'll note, that in benchmark set 1, the LogDirect and LogDirectFs are almost identical: 33.14us vs 33.86us. The second benchmark is substantially different: 32.95us and 53.02us. The only difference was the following:

    [<MethodImpl(MethodImplOptions.AggressiveInlining)>]
    let nop () = ()
    member this.Log(level : LoggerLevel, msg : string) =
        messages.Add(msg)
        nop()

For the first, and:

    member this.Log(level : LoggerLevel, msg : string) =
        messages.Add(msg)

For the second. (Each method was modified to include or exclude the nop as appropriate, but the only change was whether that is included or not.)

Across all the benchmarks this one difference changed execution times dramatically.

manofstick · 2019-03-17T23:24:35Z

This has been a bugbear of mine for many years! Mix the tail instruction with arguments with value type > 64 bits and performance is terrible (the > 64 bit thing is because arguments can't be passed by registers and so the JIT goes to a lot of effort to set up a stack frame that can be reused)

I think it was a mistake to be default insert tail.. We turn it off for all projects. For simple tail calls the fsharp compiler converts them into loops anyway, so its only really of use for mutually recursive functions which I assume for many code bases are the exception rather than the norm... For those cases the use of an attribute to output tail. would be preferable.

Here's a mention back from 4 years ago...

EBrown8534 · 2019-03-18T14:31:21Z

@manofstick How would I go about turning that off for a project?

I never have mutually-recursive functions, so it'd be great if I could turn this off because we have some F# code along very hot paths that I'd like to take advantage of this with, without needing to add a manual nop.

7sharp9 · 2019-03-18T15:25:44Z

--tailcalls-

manofstick · 2019-03-18T19:34:44Z

As well as fscs --tailcalls- argument mentioned by @7sharp9, if you're using Visual Studio (or at least using fsproj files), in the project Properties under Build is a checkbox "Generate tail calls" which by default is off for Debug and on for Release builds (which has the perverse affect of, under some conditions, meaning that your debug build runs faster!). Turning it off in Release modifies the .fsproj file by adding the <Tailcalls>false</Tailcalls> in the release PropertyGroup.

<PropertyGroup Condition="'$(Configuration)|$(Platform)'=='Release|AnyCPU'">
    ....
    <Tailcalls>false</Tailcalls>
</PropertyGroup>

NinoFloris · 2019-03-18T19:39:27Z

@manofstick problem is that these also get output during debug (see sharplab example, just switch to release and check bottom IL methods).

In this case the compiler is just wrong, there is nothing to tailcall optimize as there is no recursion

KevinRansom · 2019-03-18T19:55:43Z

@NinoFloris
If the jit generates less well optimized code in the presence of a .tail instruction, it seems like the performance change is a Jit Bug. .tail is documented as a hint to the runtime to try for tail optimizations.

However, it also appears that the compiler can help the Jit out by not generating unnecessary .tails.

Kevin

manofstick · 2019-03-18T20:00:39Z

Check out @mrange's spiel on tail calls here...

forki · 2019-03-18T20:10:47Z

@KevinRansom the problem is that some implementations of jit or even native compilation don't really care that much about tail. The reason is probably that C# is not really emitting it and that means they don't have real pressure. @dsyme had tried to make them aware of the problem, but I assume it's always the first thing that gets punted. So (together with always having fable in mind) I think we need to do two things. A) continue to beg native compilation team and jit team to invest into tail B) try to emit it only if really needed, because success of A) might be still miles away Paul Westcott <notifications@github.com> schrieb am Mo., 18. März 2019, 21:00:

…

Check out @mrange <https://github.com/mrange>'s spiel on tail calls here <https://gist.github.com/mrange/1f26f3bd219da4d407ed6dde56c461dc>... — You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub <#6329 (comment)>, or mute the thread <https://github.com/notifications/unsubscribe-auth/AADgNDv5ul6oTRQSRjjg8BI59B07o_5dks5vX_BngaJpZM4bvO4q> .

NinoFloris · 2019-03-18T20:18:37Z

@KevinRansom the jit has never ever worked well in the presence of .tail... Sharplab runs netfx, but how I came to know of tail woes was through coreclr issues like https://github.com/dotnet/coreclr/issues/18406.

But searching for "tail" in coreclr shows a ton of other issues like
https://github.com/dotnet/coreclr/issues/4971
https://github.com/dotnet/coreclr/issues/2556

And quite some closed issues where many are only fixed in upcoming 3.0

Search link
https://github.com/dotnet/coreclr/issues?utf8=%E2%9C%93&q=is%3Aissue+tail+

Tail is a bugfarm and it'll take years before it'll be as fast as just doing callvirt nop instead of tail callvirt

mrange · 2019-03-18T20:45:03Z

It's not only that .tail turns off optimization, it also inserts a call to JIT_TailCall (or something like that) when calling a function on the tail. I suspect this function collapses the stackframe which is beneficial to not run out of stackspace but really poor for performance.

On rare occassions I have seen the jitter replace a call with jmp and I throw a party everytime! (I guess if all arguments can be passed in registers and there is no stack created).

I have upvoted issues, I have spammed chat on twitch during .NET community standups about giving .tail some love but I too have come to the conclusion that as long C# don't generate .tail we won't see any investment here. So:

Turn off tail call by default?
Make tail call generation suppressible in code for those performance sensitive cases?
Add .tail call generation to C# to create some demand to invest :)?
Change JIT to try to do tail call whenever it sees an opportunity creating a massive demand to invest in better tail calls ;)
Have a beer and forget about it?

PS: @manofstick now I am in AVX + Struct type performance hell. The jitter does lot of "funny" stuff :)

NinoFloris · 2019-03-18T20:48:39Z

@mrange yeah that helper is for "slowtailcall" and the jmp is when the jit decides "fasttailcall" is possible (which indeed is dependent on argument sizes but also Linux or arm architecture doesn't have any fast tail call support yet)

zpodlovics · 2019-03-19T08:20:13Z

@AndyAyersMS would it be possible to keep some/most of the optimization and inlining with .tail? Would AggressiveOptimization (dotnet/coreclr#20009) attribute help here?

AndyAyersMS · 2019-03-19T08:49:44Z

I'll have to read the notes above more carefully, but a few quick replies:

Jit_TailCall is generally only used when there isn't enough space on the caller's frame to accommodate the callee args. So you should only see it on x64 windows if callee has more than 4 args and also more args than caller. Not as sure about x86, it always has odd quirks. And yes x64 SysV support is missing, and there are limitations on arm64 too.
The jit is in a bit of a dilemma about deciding whether to honor explicit tail or inline -- if we inline we may blow the stack up unless we are careful. Today we never inline as we don't know how to be suitably careful. As dotnet/coreclr#18406 notes, this is something we can improve on in some cases. Aggressive opt is unrelated.
The jit will attempt to identify tail calls even without explicit tail, here inlining takes precedence.

7sharp9 · 2019-03-19T12:15:53Z

@forki Yeah the other implementations like Xamarin don't do anything with tailcalls, the arm compiler code thats generated by AOT just ignores it. When I worked for Xamarin we used to recommend that tailcalls were always disabled in a mobile solutions for that reason.

dsyme · 2019-03-19T12:54:12Z

@forki Yeah the other implementations like Xamarin don't do anything with tailcalls, the arm compiler code thats generated by AOT just ignores it.

My understanding this is inaccurate and out-of-date.

Please see mono/mono#6914 (comment), e.g. "pretty much everything is known to work". There were some subsequent patches/fixes but my understanding is that based on the work by @jaykrell the comment above is no longer accurate.

NinoFloris · 2019-03-19T13:04:30Z

@AndyAyersMS we did notice that even with adding noinline on the c# version it, the c# version, was still faster by quite a bit. It was only once we removed that explicit tail (and removed noinline on the c# method) that we got comparable results within noise range.

Are there more optimizations in the jit besides inlining that work less well or not at all in the presence of explicit tail?

AndyAyersMS · 2019-03-19T15:47:01Z

Not many, no -- there is a restriction on the unboxing optimization (see dotnet/coreclr#21890).

zpodlovics · 2019-03-19T16:08:18Z

@NinoFloris Check and compare the generated C# and F# IL. By default F# compiler will inline the small methods however this IL inlining is not always profitable due the more instruction, the JIT may able to make a better inlining decision. (#5178).

jaykrell · 2019-03-19T17:10:52Z

.tail is a dilema. On desktop, .tail can tailcall anything, sometimes a perf gain but sometimes a large perf loss. You sometimes have to chose stack or cycles.

NinoFloris · 2019-03-19T17:24:07Z

@zpodlovics I've obviously been looking at the IL ;) F# doesn't inline top level methods and functions AFAIK that only happens if they're marked inline. F# does do some optimizations for local functions and closures indeed.

dsyme · 2019-03-19T23:30:11Z

Something of interest is that the F# compiler does have logic to eliminate emit of "unnecessary" tailcall instructions even when --tailcalls is enabled. You can search for

MakesNoCriticalTailcalls
(for some reason this isn't showing the place where this information is consumed in IlxGen.fs
SetMakesNoCriticalTailcalls

In this particular case, the optimization is not triggered because the call to List.Add is a virtual call.

            IL_0007: tail.
            IL_0009: callvirt instance void class [mscorlib]System.Collections.Generic.List`1<string>::Add(!0)

There are cases where we want to make tailcalls through virtual calls to .NET libraries - for example when implementing long deep calling control structures like IStructuralComparison. However this is not such a case (adding something to a list is never going to result in unbounded stack usage) and in the vast majority of cases we really don't need to make tailcalls to .NET virtual methods.

I think it would be quite reasonable to essentially somehow whitelist pretty much every virtual call in .NET core libraries apart from the IComparable/IEqualityComparer/IStructuralComparable/... APIs.

AndyAyersMS · 2019-03-21T16:11:07Z

I think we can improve how RyuJit handles the interaction of .tail calls and inlining, see dotnet/coreclr#23370. It is too late to get any changes from this into .Net Core 3.0, but addressing this in 3.1 seems quite possible, assuming we can develop reasonable heuristics. I also expect us to fix some of the tail call limitations we see on some ABIs in .Net Core 3.1.

And perhaps there are ways to reduce the need for cases that use the Jit_TailCall helper. The jit may be able to artificially increase the size of the root method's argument area via a special prolog so that fast tail calls to callees with more arg space than callers is possible. And, the inlining changes will be done so as not to increase the need for helper-assisted calls and inlining some tail calls may eliminate some cases that would have needed the helper.

dsyme · 2019-03-22T00:19:56Z

The jit may be able to artificially increase the size of the root method's argument area via a special prolog so that fast tail calls to callees with more arg space than callers is possible.

That sounds like a very important case, though I suppose to complete it would have to be transitive to some depth, e.g. A --> B --> C --> A --> B --> C ad infinitum, where A requires 10, B 20 and C 30.....

AndyAyersMS · 2019-03-23T07:44:39Z

it would have to be transitive to some depth

Since the jit is the one issuing the slow tail calls, it is not quite that bad. Once we're done inlining we know all the tail call candidate sites in the method, and we should be able to compute how much arg stack each needs, as well as what the method would normally provide. If there's a mismatch, we just need to boost the method's arg area size to the maximum needed by any of the calls, and make sure the prolog does the right magic.

If anyone's up for benchmarking a very early version of the jit changes, feel free to build from my fork on the InlineExplicitTailCalls branch. This doesn't have the right heuristic yet so it may be too aggressive. And it does not try and remove the need for helper assisted tail calls. Let me know if you run into issues with it, or have interesting results.

EBrown8534 · 2019-03-25T13:27:25Z

@AndyAyersMS How difficult would that be to setup? We have an extremely hot path of F# code in our website that I could spin another cluster node up for and load this version in. We already have timings between all nodes in the cluster, so at the very least I could give you a (relative) performance differential of real-world use-cases (at least, our real-world use-case). I.e. is it overall faster, slower, or no change.

AndyAyersMS · 2019-03-25T20:24:08Z

If you can run against the most recent 3.0 preview SDK, and can build the coreclr repo, then it should not be too bad. You can follow the Using Your Build instructions.

If you want something that works with 2.2 or 2.1 the changes in my branch should port over fairly cleanly. I can prop something up for this if you like but it might take a few days.

jdh30 · 2019-04-07T23:14:47Z

@NinoFloris
"tail. was being emitted for a non recursive method"
"If anything the focus should be on making sure tail. is only ever emitted for real recursive methods/functions"
"In this case the compiler is just wrong, there is nothing to tailcall optimize as there is no recursion"

Consider this code:

let f1 x = x()
let f2 x = f1 x
let rec f3() = f2 f3

Is that call from f2 to f1 a "recursive call"? I'm guessing not but if you don't honor that tail call then f3 will stack overflow.

NinoFloris · 2019-04-08T00:07:41Z

@jdh30 Yes it really only gets problematic when dynamic dispatch is involved.

#6329 (comment)

serefarikan · 2024-03-13T10:50:53Z

Hi, I arrived this issue via #13398 and even though most of the details go way over my head, I think the recommendation is to disable tail calls (optimisation, that is) . However, I was wondering if the [<TailCall>] introduced in .net 8 would allow me to disable tail call opt. while selectively enabling it, as @manofstick suggests above. Would this approach work?

vzarytovskii · 2024-03-13T10:56:04Z

Hi, I arrived this issue via #13398 and even though most of the details go way over my head, I think the recommendation is to disable tail calls (optimisation, that is) . However, I was wondering if the [<TailCall>] introduced in .net 8 would allow me to disable tail call opt. while selectively enabling it, as @manofstick suggests above. Would this approach work?

It won't, the attribute only enables tail recursion analysis, and has nothing to do with tail prefixes in IL.

cartermp added the Area-Compiler label Mar 18, 2019

cartermp added this to the Unknown milestone Mar 18, 2019

TIHan mentioned this issue Dec 9, 2019

Remove inline IL parsing method call #7947

Merged

AndyAyersMS mentioned this issue Jan 31, 2020

General inlining of tail calls dotnet/runtime#12304

Open

6 tasks

dsyme added Bug Tenet-Performance Feature Improvement and removed Bug labels Sep 1, 2020

cartermp modified the milestones: Feature requests and improvements, Backlog Oct 28, 2020

KevinRansom removed the Tenet-Performance label Jun 30, 2021

buybackoff mentioned this issue Jul 9, 2021

Attribute to prevent fsc inlining but allow JIT inlining fsharp/fslang-suggestions#838

Closed

5 tasks

dsyme added Area-Compiler-Optimization The F# optimizer, release code gen etc. and removed Area-Compiler labels Mar 31, 2022

vzarytovskii moved this to Not Planned in F# Compiler and Tooling Jun 17, 2022

vzarytovskii added this to F# Compiler and Tooling Jun 17, 2022

vzarytovskii mentioned this issue Oct 6, 2022

Tracking F# AOT, Trimmability, ILLink #13398

Open

9 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Tail opcode being emitted for normal methods, destroys JIT optimizations #6329

Tail opcode being emitted for normal methods, destroys JIT optimizations #6329

NinoFloris commented Mar 13, 2019 •

edited

Loading

EBrown8534 commented Mar 13, 2019

manofstick commented Mar 17, 2019

EBrown8534 commented Mar 18, 2019

7sharp9 commented Mar 18, 2019

manofstick commented Mar 18, 2019

NinoFloris commented Mar 18, 2019

KevinRansom commented Mar 18, 2019

manofstick commented Mar 18, 2019

forki commented Mar 18, 2019 via email

NinoFloris commented Mar 18, 2019

mrange commented Mar 18, 2019 •

edited

Loading

NinoFloris commented Mar 18, 2019 •

edited

Loading

zpodlovics commented Mar 19, 2019

AndyAyersMS commented Mar 19, 2019 •

edited

Loading

7sharp9 commented Mar 19, 2019 •

edited

Loading

dsyme commented Mar 19, 2019

NinoFloris commented Mar 19, 2019

AndyAyersMS commented Mar 19, 2019

zpodlovics commented Mar 19, 2019 •

edited

Loading

jaykrell commented Mar 19, 2019

NinoFloris commented Mar 19, 2019

dsyme commented Mar 19, 2019 •

edited

Loading

AndyAyersMS commented Mar 21, 2019

dsyme commented Mar 22, 2019

AndyAyersMS commented Mar 23, 2019

EBrown8534 commented Mar 25, 2019

AndyAyersMS commented Mar 25, 2019

jdh30 commented Apr 7, 2019

NinoFloris commented Apr 8, 2019

serefarikan commented Mar 13, 2024

vzarytovskii commented Mar 13, 2024

Tail opcode being emitted for normal methods, destroys JIT optimizations #6329

Tail opcode being emitted for normal methods, destroys JIT optimizations #6329

Comments

NinoFloris commented Mar 13, 2019 • edited Loading

EBrown8534 commented Mar 13, 2019

manofstick commented Mar 17, 2019

EBrown8534 commented Mar 18, 2019

7sharp9 commented Mar 18, 2019

manofstick commented Mar 18, 2019

NinoFloris commented Mar 18, 2019

KevinRansom commented Mar 18, 2019

manofstick commented Mar 18, 2019

forki commented Mar 18, 2019 via email

NinoFloris commented Mar 18, 2019

mrange commented Mar 18, 2019 • edited Loading

NinoFloris commented Mar 18, 2019 • edited Loading

zpodlovics commented Mar 19, 2019

AndyAyersMS commented Mar 19, 2019 • edited Loading

7sharp9 commented Mar 19, 2019 • edited Loading

dsyme commented Mar 19, 2019

NinoFloris commented Mar 19, 2019

AndyAyersMS commented Mar 19, 2019

zpodlovics commented Mar 19, 2019 • edited Loading

jaykrell commented Mar 19, 2019

NinoFloris commented Mar 19, 2019

dsyme commented Mar 19, 2019 • edited Loading

AndyAyersMS commented Mar 21, 2019

dsyme commented Mar 22, 2019

AndyAyersMS commented Mar 23, 2019

EBrown8534 commented Mar 25, 2019

AndyAyersMS commented Mar 25, 2019

jdh30 commented Apr 7, 2019

NinoFloris commented Apr 8, 2019

serefarikan commented Mar 13, 2024

vzarytovskii commented Mar 13, 2024

NinoFloris commented Mar 13, 2019 •

edited

Loading

mrange commented Mar 18, 2019 •

edited

Loading

NinoFloris commented Mar 18, 2019 •

edited

Loading

AndyAyersMS commented Mar 19, 2019 •

edited

Loading

7sharp9 commented Mar 19, 2019 •

edited

Loading

zpodlovics commented Mar 19, 2019 •

edited

Loading

dsyme commented Mar 19, 2019 •

edited

Loading