`not(isNull x)` leads to very odd and partially unreachable IL code that performs 5x slower than redefining `not` yourself #9433
Yeah, this is pretty rough. Another good reason to rethink some of this custom IL emission stuff in FSharp.Core.

    // Learn more about F# at http://docs.microsoft.com/dotnet/fsharp
    open System
    open BenchmarkDotNet.Attributes
    open BenchmarkDotNet.Running

    module Op =
        let inline not' x = match x with true -> false | false -> true

    [<MemoryDiagnoser>]
    type NotBench() =
        let s = "hello"

        [<Benchmark(Baseline=true)>]
        member _.Not() = not (isNull s)

        [<Benchmark>]
        member _.NotWithMatch() = Op.not' (isNull s)

    [<EntryPoint>]
    let main argv =
        let summary = BenchmarkRunner.Run<NotBench>()
        printfn "%A" summary
        0 // return an integer exit code

    BenchmarkDotNet=v0.12.1, OS=Windows 10.0.19041.329 (2004/?/20H1)
    Intel Core i7-6700K CPU 4.00GHz (Skylake), 1 CPU, 8 logical and 4 physical cores
    .NET Core SDK=5.0.100-preview.5.20279.10
      [Host]     : .NET Core 5.0.0 (CoreCLR 5.0.20.27801, CoreFX 5.0.20.27801), X64 RyuJIT DEBUG
      DefaultJob : .NET Core 5.0.0 (CoreCLR 5.0.20.27801, CoreFX 5.0.20.27801), X64 RyuJIT
|
Thanks for the quick response & fix! 0.0006ns? No machine is so fast ;). I think your benchmark ended up being optimized away (which by itself is good, it shows that the

I find that for very small measurements, to prevent this from happening, a little loop helps (while being cautious that the loop itself doesn't get erased). I got that tip from BDN w.r.t. small CPU-bound function performance.

This doesn't even help, they are now both erased:

    [<Benchmark(Baseline=true)>]
    member _.Not() =
        for i=0 to 10000 do not (isNull s) |> ignore

    [<Benchmark>]
    member _.NotWithMatch() =
        for i=0 to 10000 do Op.not' (isNull s) |> ignore

But this works, the optimizer doesn't erase it anymore:

    [<Benchmark(Baseline=true)>]
    member _.Not() =
        let mutable b = false
        for i=0 to 10000 do b <- not (isNull s)
        b

    [<Benchmark>]
    member _.NotWithMatch() =
        let mutable b = false
        for i=0 to 10000 do b <- Op.not' (isNull s)
        b

This way you won't get absolute measurements of the kind like "

If we add the following function, we can do a zero-measurement:

    [<Benchmark>]
    member _.OnlyNull() =
        let mutable b = false
        for i=0 to 10000 do b <- isNull s
        b

Which shows that adding the optimized version of

The overhead of the original

What is weird to me is the custom IL leads to the correct IL if compared to the |
Heh, good catch! Yeah, it gets optimized away. I wonder if this means that some user code will have things optimized away as well? |
Edit: corrected some info

Okay, it appears that you don't need the big loop (since that could serve to just amortize the cost to be pretty much the same) of the check depending on how it runs on your machine. With some dummy code that prevents an optimization:

    module Op =
        let inline not' x = if x then false else true
        let inline not'' x = match x with true -> false | false -> true

    [<MemoryDiagnoser>]
    type NotBench() =
        let s = "hello"

        [<Benchmark(Baseline=true)>]
        member _.Not() =
            let mutable b = false
            for i=0 to 1 do b <- not (isNull s)
            b

        [<Benchmark>]
        member _.NotWithIf() =
            let mutable b = false
            for i=0 to 1 do b <- Op.not' (isNull s)
            b

        [<Benchmark>]
        member _.NotWithMatch() =
            let mutable b = false
            for i=0 to 1 do b <- Op.not'' (isNull s)
            b

I get these timings against .NET 5:
And these for net48:
Some small variance from run to run, but the modified calls always seem better. More generally though, I think moving away from embedded inline IL when we can is always positive. |
This is really good analysis. Thank you for supplying all the numbers! Based on the info provided, we should consider just redefining `not` to be the match statement, or even an if: `let inline not (value: bool) = if value then false else true`. I haven't looked into the behavior of the inline IL to determine if it's a simple bug. To me, we don't need the inline IL now since a basic match or if version produces better IL and fixes the isNull perf. |
The rather cool thing is that the optimizer is now doing even more
|
But can someone please check that
not <| not <| not <| not <| true
will be rewritten to true in the IL
|
@forki, the IL doesn't matter in that case, the jitted assembly folds that in the presence of a constant. I found that out while doing measurements on multiple uses of |
it does matter in the context of further simplifications |
@forki If you have:

    let mutable b = true
    b <- not b
    b <- not b
    b <- not b
    b

It will appear in IL as written, but in the disassembler it's const-folded. That's because RyuJIT has the proper info that it's safe to do so. In some cases, appearance of

This issue points to those cases where that didn't happen, namely where one of the chained comparisons compares to zero (

I'll have to check your specific case, but certainly right now it's not optimized if you end the sequence with |
@forki, if I fix that code to be

And this:

    let mutable b = true
    b <- not b
    b <- not b
    b <- not b
    b

Gets more complicated IL: https://sharplab.io/#v2:DYLgZgzgNAJiDUAfA9gBwKYDsAEBlAnhAC7oC2AsAFBH4bYDCAFAJTYC8V2X2pZARugBO2APoA6ALIt22Tt3nB0RHgFciAQz6LsfGUUEr0c+V10AeALTZMyZX2MnzVm3Yfyn12zrfc+QA===

But the resulting JIT disassembly is just:

Which is interesting, because if I try the same from C# it isn't folded. Hmm 🤔 |
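As a side note on the chained-negation question in the comments above, here is a minimal sketch that checks only the semantics of repeated negation. It says nothing about the emitted IL or the JIT output, which is what the thread is actually about. The names `notIf`, `notMatch`, `viaBuiltin`, `viaIf`, `viaMatch` and `threeTimes` are hypothetical and chosen for illustration. Forward pipes are used because `<|` is left-associative in F#, so a chain of backward pipes would need parentheses to type-check.

```fsharp
// Hypothetical redefinitions mirroring the ones discussed in this thread.
let inline notIf (value: bool) = if value then false else true
let inline notMatch (value: bool) = match value with true -> false | false -> true

// Four chained negations cancel out, so each pipeline yields its input again.
let viaBuiltin = true |> not |> not |> not |> not
let viaIf = true |> notIf |> notIf |> notIf |> notIf
let viaMatch = true |> notMatch |> notMatch |> notMatch |> notMatch

// The mutable variant from the comment above: three negations flip the value once.
let threeTimes () =
    let mutable b = true
    b <- not b
    b <- not b
    b <- not b
    b

printfn "%b %b %b %b" viaBuiltin viaIf viaMatch (threeTimes ())
// prints: true true true false
```

Whether any of these are folded to a constant is decided by the F# optimizer and the JIT, so inspecting the IL or the disassembly (for example in SharpLab) is still needed to answer the original question.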
I noticed this while doing timings for #9390, where sometimes using `not` gave an unexpected performance degradation. This boiled down to `not` sometimes leading to very unexpected IL.

Repro steps

Take the following code snippet:

Because `not` is coded to return a single IL instruction with `ceq`, and `isNull`, while coded with `match`, also leads to basically a single `ceq` instruction, I expected that we'd end up with two instructions, or, after optimization, a single one. However, it blows up:

Which gets translated in C# as:

If you were to recreate the `not` function as follows:

The same code above would now be encoded in IL as:

And here is the real killer: if we encode `not` as itself, the problem also disappears, regardless of whether it is marked as `inline` (the original) or not:

Resulting IL:

Strangely, the `not` function itself looks exactly the same as the `justLikeNot` function above:

Though in one case (with `isNull`) it leads to strange opcodes. In most other cases, it leads to the expected folding of the `ceq` into a `brfalse` or `brtrue` respectively.

More examples of coding this and their surprising translations can be found in this SharpLab.io snippet.
Expected behavior
Actual behavior
See above for the actual behavior. In terms of performance, the different `not` versions in the code all perform as expected, since they are ultimately folded into optimized x64 assembly, except for the `not(isNull x)` version. The `notIsNull` below uses `not(isNull x)`, the others all use a different way of coding `not` than the default:

(These timings were made by ensuring the function returns and is not optimized away (hence the `str.Length` call) and repeated 10_000x in a tight for-loop to erase timing inefficiencies for micro-benchmarks with BDN.)

This is ultimately caused by the final assembly, which looks as follows (note the popping and extra call):

Compare that to using one of the `not` redefinitions, which, with the same code, gives:

That is: no push/pop of `rdi` and `rsi`, that is, no new stackframe.

Known workarounds
Redefine `not` yourself and the problem seems to disappear.

Related information

I've only tested this on the latest VS + FSC (with optimizations on, of course), but the Sharplab decoding showed the same results.

I discussed this with @baronfel yesterday and neither of us could come up with a reasonable explanation, even more so since re-defining `not` as itself leads to optimized code, so I'm not sure why the combination `not(isNull x)` leads to such IL. The Sharplab.io link shows that using something other than `isNull` in the brackets does not lead to the same weird IL opcodes.
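The workaround named under "Known workarounds" can be sketched roughly as follows. This is an illustration under stated assumptions, not the fix that went into FSharp.Core: `Op.not'` follows the `match`-based redefinition used in the benchmark comments, and `hasValue` is a hypothetical helper. Whether this changes the emitted IL depends on the compiler version in use.

```fsharp
module Op =
    // Hypothetical replacement for the built-in `not`, written as a plain match
    // instead of relying on FSharp.Core's inline IL.
    let inline not' (x: bool) = match x with true -> false | false -> true

// Hypothetical helper: true when the string reference is non-null.
let hasValue (s: string) = Op.not' (isNull s)

printfn "%b" (hasValue "hello") // prints: true
printfn "%b" (hasValue null)    // prints: false
```

The sketch only demonstrates that the redefinition behaves like `not` for the `isNull` case. Confirming the performance difference still requires a benchmark that keeps the result alive, as in the comments above.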