Make typedefof more efficient #5019
Comments
If you use I created a benchmark for that. If writing simple F# code like below it is 100+ times slower than C#, because the method equality uses F# GenericEquality, and when I just want to generate 10 million longs the profiler reminds me about my old SO question. Also the ULong64 case in F# generates many GC0 collections, which indicates that casting is not optimized away by the JIT.
main
Main
Then I changed the code to
and found that the method is not inlined for some reason even though it has the attribute (I believe the issue on this attribute was merged a long time ago). But adding
main
Main
But increasing the number of operations for more precise measurement reveals that F# does fewer operations per second in both cases, and the difference from C# is similar to the difference between the two cases (they differ by a ULong addition), so likely the if branch is not eliminated in F#. I do not want to disassemble and dig deeper for now; probably @adamsitnik could help to show how to use disassembly with BenchmarkDotNet.
main
Main
The C# code is as simple/idiomatic as one could write, no special tricks are needed.
Update:
|
Hi, perhaps I misunderstand your meaning with this statement:
But F#'s It's not a replacement for C#'s I should mention that, regardless of this being changed, I still probably wouldn't want to do reflection-based checks to return cached bool tasks in TaskBuilder.fs, just because it adds complexity and there would always be some overhead to it in the majority of cases, where bools are not being returned. But this issue is a separate thing, and I must admit there have been times when doing heavy reflection code in F# that I have (probably as an unnecessary premature optimization) chosen to save the result of |
@rspeele in the original issue my focus was on JIT compile-time constant and if branch elimination. If it worked then...
... there would be exactly zero overhead because JIT compiler just eliminates the wrong branches entirely (details here). I promised some benchmarks in the original issue and just posted it there. |
@buybackoff I guess this issue #526 still haunts us. I guess many of the performance-related issues come from unwanted heap allocations (or even stack, because of the excessive struct copying due to the inability to pass structs by reference, hopefully this will change with F# should have some kind of infrastructure (even better: constraints or even attributes, something like the Roslyn checker provides!) to check the heap / stack allocations, preferably by the compiler. Is it possible to calculate the heap / stack allocation based on the TAST and the generated IL? While for functional languages it's an active research area [1], even a basic tool, or one like the Roslyn HeapAllocationAnalyzer, would help a lot due to the quick feedback loop to fix existing F# problems like this issue (it also helps that we know it's possible to solve it, because C# already has these tricks): https://github.com/Microsoft/RoslynClrHeapAllocationAnalyzer Also take a look at this "constraint" for C#, as "The proposal is to declare a local function as non-capturing. The keyword to indicate that would be static": [1] http://www.raml.co/ |
@buybackoff What happens when you use
Note that |
@dsyme yep, this removes the difference (and actually F# becomes slightly faster). Haven't looked into the IL. But what is the deal with inlining here? Does the F# compiler or the JIT do the work when the |
The F# compiler does the work for |
@dsyme I formulated the question incorrectly. When F# inlines the method (I realize that this is literally like pasting IL code into the call site), the boxing disappears. Does the F# compiler eliminate the |
I think the JIT will. F# doesn't. The inlining will also type-specialise the code and that might help the JIT.
No idea I'm afraid. I'd just be guessing. |
FYI: "Also, there are a number of reasons why methods with AggressiveInlining might not get inlined. If you can drop a checked clrjit.dll into your dotnet install, you can set COMPlus_JitPrintInlinedMethods to get a log of inlining decisions with failure reasons (this is less voluminous than a full jit dump)." https://github.com/dotnet/coreclr/issues/5996#issuecomment-228467831 There is an optimization pass to detect box + isinst: (https://github.com/dotnet/coreclr/issues/12877 + https://github.com/dotnet/coreclr/issues/14472) https://github.com/dotnet/coreclr/blob/master/src/jit/compiler.h#L3153 However, I have not yet found a box + unbox optimization pass in compiler.h / compiler.cpp. |
@buybackoff last time I checked, the if typeof<'T> = typeof then else if pattern produced somewhat horrible code, and the slowest of the variants (fsharp/fslang-design#287 (comment)).
Generated IL code snippet (full namespace retracted) for typeof as match:
Generated IL code snippet (full namespace retracted) for benchmark:
|
@zpodlovics the point is that the same C# method is inlined by the JIT and they use the same JIT... which probably means the IL is different. But in cases like this I prefer an F# order, not a hint, to inline. There are not many reasons why a simple method is not inlined; the most common is exception throwing, like in this case: The method is F#-intrinsic, but not JIT-intrinsic, so it is a normal one and any |
"horrible and the slowest code" is this part: Try using what @dsyme showed above: |
@buybackoff I did not notice that System.Type.op_Equality comment earlier, I will try it out. It may be worth reconsidering the exception handling in the F# code base, especially in the standard library, and moving the slow path out into helpers (throw helpers). This could shrink the amount of IL on the fast path, which could also improve the JIT inlining decisions, but that's a different story / issue. |
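A minimal sketch of the kind of type test discussed in this thread, assuming the suggestion is to call `System.Type.op_Equality` directly instead of using F# `=` (which goes through generic equality). The function name `describe` is hypothetical and only for illustration. Whether the JIT then folds the comparison to a constant and drops the dead branches depends on the runtime version, so treat this as something to benchmark rather than a guarantee.

```fsharp
open System

// Hypothetical example function, not taken from the benchmark in this issue.
// 'inline' lets the F# compiler paste the body at each call site, so every
// call site sees a concrete 'T.
let inline describe<'T> () : string =
    // Direct call to the static Type equality operator; avoids F# generic equality.
    if Type.op_Equality (typeof<'T>, typeof<int64>) then "int64"
    elif Type.op_Equality (typeof<'T>, typeof<uint64>) then "uint64"
    else "other"

printfn "%s" (describe<int64> ())
printfn "%s" (describe<uint64> ())
printfn "%s" (describe<string> ())
```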
Adjacent I tried looking at inlining decisions in your linked repro case but it appears that inlining of The jit in .Net Core 2.1 has a number of enhancements to type equality opts and boxing opts so you might also give that a try, if you're still using 2.0. |
This is low-hanging fruit. Corefx uses this everywhere, with interesting comments on why there are two nested methods. Since @AndyAyersMS is here, he could give an authoritative answer on whether this pattern is worth changing many LOCs. Just looked into Array.fs and found that exceptions are inlined with F# |
@buybackoff Exactly, this why I mentioned. It's assumed to be a low-hanging fruit, but I am afraid it will be probably the most critical realization for F# performance related issues. It's not accidental that CoreCLR/CoreFX/Kestrel code refactored to that style. a) The most common execution path shrinked significantly (fastpath). The less code means lot more opportunity to inline (some slow code path could also prevent inlining due the complexity, size, etc) and the JIT will generate lot less code to execute that path. Inlining/NoInlining flags should be used wisely: https://github.com/dotnet/coreclr/issues/1270#issuecomment-180862050 The original reason why raise marked inline is ... to avoid warnings is code coverage tests (instead of fixing the code coverage tool). That was reasonably reasonable at that time, but the CoreCLR changed/improved significantly since then. Please see my late concerns about here: #3121 (comment) There are some legitimate case when inlining (at IL level by the compiler) exception throwing is required for example NoDynamicInvocation case here: https://github.com/dotnet/coreclr/issues/1270#issuecomment-181767435 |
@zpodlovics but that is tedious work without much fun. I try to use ThrowHelper by default after I created such helper for myself, but usually it makes sense when a profiler shows that e.g. method call takes more time than some numeric calculation. But at least in F# Core many throwing functions are reused from helper modules and it's easier to change code there than everywhere. |
@buybackoff I did the same in my code. Well, we can always introduce an attribute for forced inline exception throwing (on raise), e.g. ForceInlineThrow (or something else), and have the compiler do a transformation and generate helper static methods for the raise code. But I am afraid this automatic transformation may result in a non-equivalent program and/or be error prone (but the same could be true for inlining exception throwing). |
I would say in CoreCLR/CoreFX we're kind of in an ok but funny place. We have some number of methods that throw which were manually converted over to use throw helpers. But not all. We'd really prefer that this transformation be done automatically, behind the scenes (via say the ILLinker). But we don't have that yet. The pattern is:
The jit recognizes this pattern and does not inline the throw helper, and moves its call site to the "cold" section of the calling method's code (either end of method when jitting, or in a split off cold part when prejitting). Note that because of resource lookup, boxing, etc the formatting of the exception message can be a fairly large chunk of code (and the trend is to make these messages be more detailed). So we also get overall size wins when throw helpers can be reused across a number of different methods. Whether this is worth doing is hard to say -- if you have a number of frequently called methods that have conditional throws then it probably is helpful. You can be selective and only convert things gradually. |
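For F# readers, the throw-helper idea can be sketched roughly as follows. This is an illustration only: the module and function names (`ThrowHelper`, `throwIndexOutOfRange`, `getItem`) are hypothetical and are not taken from CoreFX or FSharp.Core, and marking the helper `NoInlining` is an assumption made here to keep it out of line regardless of what the JIT would decide by itself.

```fsharp
open System
open System.Runtime.CompilerServices

// Hypothetical helper module; names are illustrative only.
module ThrowHelper =
    // Kept out of line so the message formatting code lives in one place
    // and does not bloat every caller that has a conditional throw.
    [<MethodImpl(MethodImplOptions.NoInlining)>]
    let throwIndexOutOfRange (index: int) (length: int) : unit =
        raise (ArgumentOutOfRangeException("index", sprintf "Index %d is outside [0, %d)." index length))

// Fast path: a range check, a call to the cold helper, and the array read.
let getItem (items: int[]) (index: int) : int =
    if index < 0 || index >= items.Length then
        ThrowHelper.throwIndexOutOfRange index items.Length
    items.[index]

printfn "%d" (getItem [| 10; 20; 30 |] 1)
```

The caller's body stays small because the exception construction and string formatting are shared by every method that uses the helper.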
Not re: the original topic, but where it meandered to... #5307 affects this (sprinkling breadcrumbs...) (Oh, and BTW, not sure if it was ever changed, but I remember looking many moons ago at AggressiveInlining in F#... From memory the MethodImplAttributes were handled as special cases, and AggressiveInlining was not passed through to the IL (probably the enum value was created after the F# compiler did that bit...), which might explain why it didn't work. From memory again I think I had a crack at adding it, and it wasn't particularly hard - not sure why I never posted a PR though... Hmmm. Maybe there was some issue...) |
@manofstick Last time I checked, AggressiveInlining was fine in F#. The decompiled IL (with ildasm) has the
[<MethodImpl(MethodImplOptions.AggressiveInlining)>]
static member op_Equality(this : SOption<'a>, other : SOption<'a>) =
this.IsSome = other.IsSome && this.Value = other.Value
|
Any way to move this forward? @dsyme what are your thoughts on loading an open generic token like C# does by repurposing |
Couldn't we just add a specific optimization to generate the open |
That was the suggestion indeed! Without adding discards support it is a bit confusing you must fully specify the type yet you don't care about the arguments. Related to this is something I ran into a while back with a complex type; as typedefof is also just a normal generic function it must also respect the constraints on the type parameters involved. This can lead to awkward signatures where you must put a lot of emphasis on the arguments (think nonsense like |
I don't think it's actually a suggestion, so I'm filing it as a compiler issue.
As @rspeele mentions in a comment in the TaskBuilder.fs repo, currently typedefof looks rather inefficient. Compare the following C# and the following F# snippets:
C# compiles to the following CIL:
While F# compiles to the following:
which is an equivalent of
Could we improve that? Are there any issues I miss with plain ldtoken that require the current approach?
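A rough sketch of the run-time work being described, assuming `typedefof` behaves as this issue states: it first materialises a concrete instantiation of the generic type and then asks for its generic type definition. `typedefofSketch` is a hypothetical helper written for illustration and is not the FSharp.Core source.

```fsharp
open System

// Hypothetical stand-in for what typedefof is described as doing:
// load the token of an instantiated type, then call GetGenericTypeDefinition.
let typedefofSketch<'T> () : Type =
    let ty = typeof<'T>
    if ty.IsGenericType then ty.GetGenericTypeDefinition() else ty

// The wildcard is inferred to some concrete type (typically obj) before the call.
let viaTypedefof = typedefof<System.Collections.Generic.List<_>>
let viaSketch = typedefofSketch<System.Collections.Generic.List<obj>> ()

printfn "%b" (viaTypedefof = viaSketch)
printfn "%b" viaSketch.IsGenericTypeDefinition
```

By contrast, C# `typeof(List<>)` refers to the open generic type directly, which is the behaviour this issue asks F# to match.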