Map: Optimize away isinst check #10845

Merged
merged 3 commits into from
Jan 17, 2021

buybackoff
Contributor

@buybackoff buybackoff commented Jan 6, 2021

Following up on the discussion in #10768.

I found a simple performance gain at the cost of 2 bytes per Map item: a significant speedup in getItem, containsKey, and add, due to replacing an `isinst` type check with an int32 equality check. Benchmarked against current main.

The change stores the height in leaves as well, sharing the field with nodes, and uses it as an implicit tag (a tree element is a leaf when height = 1) instead of the type check. The earlier discussion proposed storing Left/Right in a single universal node type, which would add 16 bytes to every leaf, or 8 bytes per item on average. This change adds 4 bytes per leaf, or 2 bytes per item on average, since roughly half of the items in a balanced tree sit in leaves.
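
The per-item averages follow from the fact that about half of the items in a balanced binary tree are leaves. Below is a minimal Python sketch of that arithmetic. It assumes a perfectly balanced tree built by midpoint splitting; AVL trees built by insertion will differ somewhat. It also ignores object headers and the alignment padding that 64-bit .NET adds. `count_leaves` and `extra_bytes_per_item` are hypothetical helpers, not F# library code.

```python
def count_leaves(n: int) -> int:
    """Items with no children in a perfectly balanced BST of n sorted items.

    Assumption: the tree is built by always taking the middle item as the
    root, so this models the shape only, not F#'s AVL insertion order."""
    if n == 0:
        return 0
    if n == 1:
        return 1                      # a single item is a leaf
    left = (n - 1) // 2               # items in the left subtree
    right = n - 1 - left              # items in the right subtree
    return count_leaves(left) + count_leaves(right)


def extra_bytes_per_item(n: int, bytes_per_leaf: int) -> float:
    """Average extra bytes per item when every leaf grows by bytes_per_leaf
    (ignores object headers and alignment padding)."""
    return bytes_per_leaf * count_leaves(n) / n


n = 1023  # 2**10 - 1: a full, perfectly balanced tree
print(count_leaves(n))                          # 512
print(round(extra_bytes_per_item(n, 4), 2))     # 2.0  (one 4-byte int height)
print(round(extra_bytes_per_item(n, 16), 2))    # 8.01 (two 8-byte references)
```
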

The memory tradeoff is essentially the only thing to consider; the code changes are trivial.
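
Here is a minimal sketch of the height-as-tag idea. It is written in Python for illustration rather than F#. `Leaf` and `Node` are hypothetical stand-ins for the F# leaf and node classes. `Node` derives from `Leaf`, so the height field is shared. The lookup branches on `height == 1` instead of an `isinstance` check. The sketch only shows the shape of the data structure. The speedup itself comes from avoiding the CLR `isinst` instruction, which Python cannot demonstrate.

```python
from dataclasses import dataclass
from typing import Any, List, Optional, Tuple


@dataclass
class Leaf:
    """Hypothetical leaf: key, value, and a height that is always 1."""
    key: Any
    value: Any
    height: int = 1


@dataclass
class Node(Leaf):
    """Hypothetical node: inherits key/value/height and adds children.
    A node always has at least one child, so its height is >= 2."""
    left: Optional[Leaf] = None
    right: Optional[Leaf] = None


def mk(left: Optional[Leaf], key: Any, value: Any, right: Optional[Leaf]) -> Leaf:
    """Create a leaf when both children are empty, otherwise a node."""
    if left is None and right is None:
        return Leaf(key, value)
    lh = left.height if left is not None else 0
    rh = right.height if right is not None else 0
    return Node(key, value, 1 + max(lh, rh), left, right)


def build(items: List[Tuple[Any, Any]]) -> Optional[Leaf]:
    """Perfectly balanced tree from (key, value) pairs sorted by key."""
    if not items:
        return None
    mid = len(items) // 2
    key, value = items[mid]
    return mk(build(items[:mid]), key, value, build(items[mid + 1:]))


def find(tree: Optional[Leaf], key: Any) -> Any:
    """Lookup that branches on the height tag instead of isinstance()."""
    t = tree
    while t is not None:
        if t.height == 1:            # leaf: an int compare replaces the type test
            if t.key == key:
                return t.value
            break
        if key == t.key:             # height >= 2, so t is a Node here
            return t.value
        t = t.left if key < t.key else t.right
    raise KeyError(key)


tree = build([(i, str(i)) for i in range(100)])
print(find(tree, 42))
```

In the F# code, a cast to the node type is still needed after the height check to reach Left/Right. Only the type *test* goes away.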

BenchmarkDotNet=v0.12.1, OS=Windows 10.0.19042
Intel Core i7-8700 CPU 3.20GHz (Coffee Lake), 1 CPU, 12 logical and 6 physical cores
.NET Core SDK=5.0.200-preview.20601.7
  [Host] : .NET Core 5.0.1 (CoreCLR 5.0.120.57516, CoreFX 5.0.120.57516), X64 RyuJIT DEBUG
  After  : .NET Core 5.0.1 (CoreCLR 5.0.120.57516, CoreFX 5.0.120.57516), X64 RyuJIT
  Before : .NET Core 5.0.1 (CoreCLR 5.0.120.57516, CoreFX 5.0.120.57516), X64 RyuJIT

MaxRelativeError=0.01  Arguments=/p:Optimize=true  IterationCount=20  
IterationTime=250.0000 ms  WarmupCount=1  

| Method | Job | BuildConfiguration | Size | Mean | Error | StdDev | Rank | Gen 0 | Gen 1 | Gen 2 | Allocated | Code Size |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| getItem | After | LocalBuild | 100 | 28.34 ns | 0.291 ns | 0.324 ns | 1 | - | - | - | - | 93 B |
| getItem | Before | Default | 100 | 36.48 ns | 0.206 ns | 0.212 ns | 2 | - | - | - | - | 119 B |
| getItem | After | LocalBuild | 10000 | 60.40 ns | 0.365 ns | 0.421 ns | 3 | - | - | - | - | 93 B |
| getItem | Before | Default | 10000 | 82.42 ns | 0.446 ns | 0.513 ns | 4 | - | - | - | - | 119 B |
| containsKey | After | LocalBuild | 100 | 24.95 ns | 0.231 ns | 0.266 ns | 1 | - | - | - | - | 155 B |
| containsKey | Before | Default | 100 | 29.23 ns | 0.385 ns | 0.412 ns | 2 | - | - | - | - | 175 B |
| containsKey | After | LocalBuild | 10000 | 46.19 ns | 0.292 ns | 0.337 ns | 3 | - | - | - | - | 155 B |
| containsKey | Before | Default | 10000 | 53.00 ns | 0.177 ns | 0.196 ns | 4 | - | - | - | - | 175 B |
| itemCount | After | LocalBuild | 100 | 179.55 ns | 1.426 ns | 1.643 ns | 1 | - | - | - | - | 81 B |
| itemCount | Before | Default | 100 | 217.68 ns | 1.816 ns | 1.865 ns | 2 | - | - | - | - | 96 B |
| itemCount | After | LocalBuild | 10000 | 33,454.41 ns | 1,561.752 ns | 1,798.516 ns | 3 | - | - | - | - | 81 B |
| itemCount | Before | Default | 10000 | 34,357.14 ns | 374.999 ns | 431.850 ns | 3 | - | - | - | - | 96 B |
| iterForeach | After | LocalBuild | 100 | 3,223.95 ns | 23.071 ns | 23.692 ns | 2 | 1.0311 | - | - | 6520 B | 283 B |
| iterForeach | Before | Default | 100 | 3,082.44 ns | 28.212 ns | 30.186 ns | 1 | 0.9704 | - | - | 6120 B | 283 B |
| iterForeach | After | LocalBuild | 10000 | 336,049.08 ns | 1,375.830 ns | 1,472.123 ns | 4 | 101.0638 | - | - | 640120 B | 283 B |
| iterForeach | Before | Default | 10000 | 317,291.41 ns | 511.972 ns | 525.758 ns | 3 | 95.6633 | - | - | 600120 B | 283 B |
| addItem | After | LocalBuild | 100 | 145.75 ns | 0.811 ns | 0.934 ns | 1 | 0.0590 | - | - | 374 B | 601 B |
| addItem | Before | Default | 100 | 170.09 ns | 2.717 ns | 2.907 ns | 2 | 0.0584 | - | - | 369 B | 603 B |
| addItem | After | LocalBuild | 10000 | 36,414.43 ns | 415.002 ns | 477.917 ns | 3 | 11.0000 | 3.6250 | - | 69724 B | 601 B |
| addItem | Before | Default | 10000 | 37,677.58 ns | 103.857 ns | 111.126 ns | 4 | 11.0000 | 3.2500 | - | 69324 B | 603 B |
| removeItem | After | LocalBuild | 100 | 12.33 ns | 0.114 ns | 0.112 ns | 1 | 0.0064 | - | - | 40 B | 425 B |
| removeItem | Before | Default | 100 | 12.22 ns | 0.076 ns | 0.081 ns | 1 | 0.0064 | - | - | 40 B | 443 B |
| removeItem | After | LocalBuild | 10000 | 1,189.82 ns | 6.181 ns | 7.118 ns | 2 | 0.6345 | - | - | 4000 B | 425 B |
| removeItem | Before | Default | 10000 | 1,188.84 ns | 7.846 ns | 8.720 ns | 2 | 0.6345 | - | - | 4000 B | 443 B |

@buybackoff buybackoff changed the title Map: Optimize away ininst check Map: Optimize away isinst check Jan 6, 2021
Store height in leaves. Compared to the old discussion, when Left/Right were proposed to be stored in a universal node,
this adds 4 bytes to leaves or 2 bytes per item on average (vs 16/8).
@buybackoff
Contributor Author

buybackoff commented Jan 7, 2021

Initially (for the benchmarks table above), the asNode function used `(# "" value: MapTreeNode<'Key,'Value> #)` instead of `value :?> MapTreeNode<'Key,'Value>`. That inline-IL form is a substitute for Unsafe.As, discussed a long time ago here.

But that fails badly on Windows/.NET Framework with the error below. The updated benchmarks, using the regular cast, follow.

error FS0193(0,0): error : (NETCORE_ENGINEERING_TELEMETRY=Build) Operation could destabilize the runtime.

There is still an improvement, but a smaller one. Marking this as a draft.

Is there a way to suppress the error for `(# "" value: MapTreeNode<'Key,'Value> #)`, or to achieve the same thing without a cast and without a dependency on System.Runtime.CompilerServices.Unsafe?

BenchmarkDotNet=v0.12.1, OS=Windows 10.0.19042
Intel Core i7-8700 CPU 3.20GHz (Coffee Lake), 1 CPU, 12 logical and 6 physical cores
.NET Core SDK=5.0.200-preview.20601.7
  [Host] : .NET Core 5.0.1 (CoreCLR 5.0.120.57516, CoreFX 5.0.120.57516), X64 RyuJIT DEBUG
  After  : .NET Core 5.0.1 (CoreCLR 5.0.120.57516, CoreFX 5.0.120.57516), X64 RyuJIT
  Before : .NET Core 5.0.1 (CoreCLR 5.0.120.57516, CoreFX 5.0.120.57516), X64 RyuJIT

MaxRelativeError=0.01  Arguments=/p:Optimize=true  IterationCount=20  
IterationTime=250.0000 ms  WarmupCount=1  

| Method | Job | BuildConfiguration | Size | Mean | Error | StdDev | Rank | Gen 0 | Gen 1 | Gen 2 | Allocated | Code Size |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| getItem | After | LocalBuild | 100 | 34.16 ns | 0.258 ns | 0.297 ns | 1 | - | - | - | - | 93 B |
| getItem | Before | Default | 100 | 36.51 ns | 0.291 ns | 0.312 ns | 2 | - | - | - | - | 119 B |
| getItem | After | LocalBuild | 10000 | 75.90 ns | 0.839 ns | 0.933 ns | 3 | - | - | - | - | 93 B |
| getItem | Before | Default | 10000 | 82.41 ns | 0.306 ns | 0.352 ns | 4 | - | - | - | - | 119 B |
| containsKey | After | LocalBuild | 100 | 29.02 ns | 0.310 ns | 0.357 ns | 1 | - | - | - | - | 198 B |
| containsKey | Before | Default | 100 | 28.70 ns | 0.308 ns | 0.355 ns | 1 | - | - | - | - | 175 B |
| containsKey | After | LocalBuild | 10000 | 52.57 ns | 0.453 ns | 0.465 ns | 2 | - | - | - | - | 198 B |
| containsKey | Before | Default | 10000 | 53.28 ns | 0.423 ns | 0.487 ns | 2 | - | - | - | - | 175 B |
| itemCount | After | LocalBuild | 100 | 205.14 ns | 1.642 ns | 1.891 ns | 1 | - | - | - | - | 110 B |
| itemCount | Before | Default | 100 | 216.20 ns | 0.965 ns | 1.072 ns | 2 | - | - | - | - | 96 B |
| itemCount | After | LocalBuild | 10000 | 35,507.48 ns | 934.860 ns | 1,076.586 ns | 4 | - | - | - | - | 110 B |
| itemCount | Before | Default | 10000 | 34,428.44 ns | 534.973 ns | 616.076 ns | 3 | - | - | - | - | 96 B |
| iterForeach | After | LocalBuild | 100 | 3,139.77 ns | 15.790 ns | 18.184 ns | 2 | 1.0354 | - | - | 6520 B | 283 B |
| iterForeach | Before | Default | 100 | 3,065.72 ns | 19.452 ns | 22.401 ns | 1 | 0.9659 | - | - | 6120 B | 283 B |
| iterForeach | After | LocalBuild | 10000 | 324,242.95 ns | 1,222.667 ns | 1,408.025 ns | 3 | 101.5625 | - | - | 640120 B | 283 B |
| iterForeach | Before | Default | 10000 | 320,282.94 ns | 1,422.972 ns | 1,638.697 ns | 3 | 95.6633 | - | - | 600120 B | 283 B |
| addItem | After | LocalBuild | 100 | 151.75 ns | 0.757 ns | 0.810 ns | 1 | 0.0591 | - | - | 374 B | 655 B |
| addItem | Before | Default | 100 | 168.32 ns | 0.436 ns | 0.502 ns | 2 | 0.0583 | - | - | 369 B | 603 B |
| addItem | After | LocalBuild | 10000 | 37,558.66 ns | 509.422 ns | 566.221 ns | 3 | 11.0000 | 3.6250 | - | 69724 B | 655 B |
| addItem | Before | Default | 10000 | 39,931.69 ns | 349.741 ns | 388.736 ns | 4 | 10.9375 | 3.2813 | - | 69324 B | 603 B |
| removeItem | After | LocalBuild | 100 | 12.65 ns | 0.100 ns | 0.115 ns | 2 | 0.0064 | - | - | 40 B | 454 B |
| removeItem | Before | Default | 100 | 12.35 ns | 0.147 ns | 0.169 ns | 1 | 0.0064 | - | - | 40 B | 443 B |
| removeItem | After | LocalBuild | 10000 | 1,290.84 ns | 21.354 ns | 22.848 ns | 4 | 0.6351 | - | - | 4000 B | 454 B |
| removeItem | Before | Default | 10000 | 1,195.52 ns | 6.200 ns | 6.634 ns | 3 | 0.6346 | - | - | 4000 B | 443 B |

@buybackoff buybackoff marked this pull request as draft January 7, 2021 00:14
@buybackoff
Contributor Author

Results for count = 10:

BenchmarkDotNet=v0.12.1, OS=Windows 10.0.19042
Intel Core i7-8700 CPU 3.20GHz (Coffee Lake), 1 CPU, 12 logical and 6 physical cores
.NET Core SDK=5.0.200-preview.20601.7
  [Host] : .NET Core 5.0.1 (CoreCLR 5.0.120.57516, CoreFX 5.0.120.57516), X64 RyuJIT DEBUG
  After  : .NET Core 5.0.1 (CoreCLR 5.0.120.57516, CoreFX 5.0.120.57516), X64 RyuJIT
  Before : .NET Core 5.0.1 (CoreCLR 5.0.120.57516, CoreFX 5.0.120.57516), X64 RyuJIT

MaxRelativeError=0.01  Arguments=/p:Optimize=true  IterationCount=20  
IterationTime=250.0000 ms  WarmupCount=1  

| Method | Job | BuildConfiguration | Size | Mean | Error | StdDev | Rank | Gen 0 | Gen 1 | Gen 2 | Allocated | Code Size |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| getItem | After | LocalBuild | 10 | 17.92 ns | 0.055 ns | 0.057 ns | 1 | - | - | - | - | 93 B |
| getItem | Before | Default | 10 | 36.07 ns | 0.329 ns | 0.379 ns | 2 | - | - | - | - | 119 B |
| containsKey | After | LocalBuild | 10 | 16.03 ns | 0.117 ns | 0.134 ns | 1 | - | - | - | - | 198 B |
| containsKey | Before | Default | 10 | 28.26 ns | 0.241 ns | 0.268 ns | 2 | - | - | - | - | 175 B |
| itemCount | After | LocalBuild | 10 | 19.27 ns | 0.170 ns | 0.189 ns | 1 | - | - | - | - | 110 B |
| itemCount | Before | Default | 10 | 217.36 ns | 1.386 ns | 1.540 ns | 2 | - | - | - | - | 96 B |
| iterForeach | After | LocalBuild | 10 | 336.97 ns | 0.816 ns | 0.801 ns | 1 | 0.1201 | - | - | 760 B | 283 B |
| iterForeach | Before | Default | 10 | 3,045.81 ns | 7.461 ns | 8.293 ns | 2 | 0.9686 | - | - | 6120 B | 283 B |
| addItem | After | LocalBuild | 10 | 69.05 ns | 0.414 ns | 0.477 ns | 1 | 0.0313 | - | - | 198 B | 655 B |
| addItem | Before | Default | 10 | 172.04 ns | 0.954 ns | 1.061 ns | 2 | 0.0587 | - | - | 369 B | 603 B |
| removeItem | After | LocalBuild | 10 | 14.13 ns | 0.060 ns | 0.066 ns | 2 | 0.0070 | - | - | 44 B | 454 B |
| removeItem | Before | Default | 10 | 12.13 ns | 0.043 ns | 0.046 ns | 1 | 0.0064 | - | - | 40 B | 443 B |

@buybackoff
Contributor Author

Oh, the difference was not due to the cast. A `match` on an int field with just two cases, where the second case is the wildcard, compiles to a `switch` instruction plus an additional `sub` instruction.

Or at least its effect is similar to that of removing the cast. But I won't test that further if the unsafe cast is unverifiable and there is no workaround.

```fsharp
match m.Height with
| 1 -> ...
| _ -> ...
```

```
IL_0006: ldarg.0      // m
IL_0007: ldfld        int32 class Microsoft.FSharp.Collections.MapTree`2<!!0/*TKey*/, !!1/*TValue*/>::h
IL_000c: ldc.i4.1
IL_000d: sub
IL_000e: switch       (IL_007d)
```

`match` produces a `sub 1` and a `switch` instruction. Here, for any non-trivial count, nodes are far more frequent than leaves on the search path, so branch prediction should work well for a plain `if`.
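
To see why the leaf check is predictable, here is a rough Python illustration. It counts how often a lookup meets a node versus a leaf. It assumes a perfectly balanced tree over the keys 0..n-1, and `path_kinds` and `totals` are hypothetical helpers. For n = 1023, the `height == 1` test is false for about 94% of the elements visited across all successful lookups. This illustrates the shape of the branch; it is not a measurement of the JIT or the CPU predictor.

```python
from typing import Tuple


def path_kinds(n: int, target: int) -> Tuple[int, int]:
    """Walk a perfectly balanced tree over keys 0..n-1 toward `target`
    and count visited nodes (height >= 2) and leaves (height == 1).

    A subtree holding a single item is a leaf; anything larger is a node.
    The middle key of each range is the subtree root (midpoint build)."""
    lo, hi = 0, n              # current subtree covers keys lo..hi-1
    nodes = leaves = 0
    while lo < hi:
        if hi - lo == 1:
            leaves += 1        # the height == 1 check is true
            break
        nodes += 1             # the height == 1 check is false
        mid = (lo + hi) // 2
        if target == mid:
            break              # found at this node
        if target < mid:
            hi = mid
        else:
            lo = mid + 1
    return nodes, leaves


def totals(n: int) -> Tuple[int, int]:
    """Sum node and leaf visits over a lookup of every key."""
    nodes = leaves = 0
    for key in range(n):
        a, b = path_kinds(n, key)
        nodes += a
        leaves += b
    return nodes, leaves


nodes, leaves = totals(1023)
print(nodes, leaves, round(nodes / (nodes + leaves), 3))
```
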
@buybackoff
Contributor Author

buybackoff commented Jan 7, 2021

Updated benchmarks after changing `match` to `if`.

The numbers for count = 10 are roughly the same as above.

BenchmarkDotNet=v0.12.1, OS=Windows 10.0.19042
Intel Core i7-8700 CPU 3.20GHz (Coffee Lake), 1 CPU, 12 logical and 6 physical cores
.NET Core SDK=5.0.200-preview.20601.7
  [Host] : .NET Core 5.0.1 (CoreCLR 5.0.120.57516, CoreFX 5.0.120.57516), X64 RyuJIT DEBUG
  After  : .NET Core 5.0.1 (CoreCLR 5.0.120.57516, CoreFX 5.0.120.57516), X64 RyuJIT
  Before : .NET Core 5.0.1 (CoreCLR 5.0.120.57516, CoreFX 5.0.120.57516), X64 RyuJIT

MaxRelativeError=0.01  Arguments=/p:Optimize=true  IterationCount=20  
IterationTime=250.0000 ms  WarmupCount=1  

| Method | Job | BuildConfiguration | Size | Mean | Error | StdDev | Rank | Gen 0 | Gen 1 | Gen 2 | Allocated | Code Size |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| getItem | After | LocalBuild | 100 | 28.65 ns | 0.381 ns | 0.424 ns | 1 | - | - | - | - | 93 B |
| getItem | Before | Default | 100 | 36.39 ns | 0.226 ns | 0.251 ns | 2 | - | - | - | - | 119 B |
| getItem | After | LocalBuild | 10000 | 61.33 ns | 0.263 ns | 0.293 ns | 3 | - | - | - | - | 93 B |
| getItem | Before | Default | 10000 | 82.86 ns | 0.129 ns | 0.143 ns | 4 | - | - | - | - | 119 B |
| containsKey | After | LocalBuild | 100 | 27.50 ns | 0.256 ns | 0.285 ns | 1 | - | - | - | - | 200 B |
| containsKey | Before | Default | 100 | 29.29 ns | 0.342 ns | 0.380 ns | 2 | - | - | - | - | 175 B |
| containsKey | After | LocalBuild | 10000 | 53.51 ns | 0.289 ns | 0.309 ns | 3 | - | - | - | - | 200 B |
| containsKey | Before | Default | 10000 | 54.77 ns | 0.341 ns | 0.393 ns | 4 | - | - | - | - | 175 B |
| itemCount | After | LocalBuild | 100 | 197.96 ns | 1.155 ns | 1.284 ns | 1 | - | - | - | - | 112 B |
| itemCount | Before | Default | 100 | 217.04 ns | 1.008 ns | 1.079 ns | 2 | - | - | - | - | 96 B |
| itemCount | After | LocalBuild | 10000 | 34,628.61 ns | 651.610 ns | 750.396 ns | 3 | - | - | - | - | 112 B |
| itemCount | Before | Default | 10000 | 34,815.34 ns | 1,108.432 ns | 1,186.009 ns | 3 | - | - | - | - | 96 B |
| iterForeach | After | LocalBuild | 100 | 3,090.08 ns | 10.900 ns | 11.663 ns | 1 | 1.0373 | - | - | 6520 B | 283 B |
| iterForeach | Before | Default | 100 | 3,057.11 ns | 10.379 ns | 11.537 ns | 1 | 0.9728 | - | - | 6120 B | 283 B |
| iterForeach | After | LocalBuild | 10000 | 324,417.64 ns | 1,901.370 ns | 2,189.621 ns | 2 | 102.0408 | - | - | 640120 B | 283 B |
| iterForeach | Before | Default | 10000 | 322,203.22 ns | 1,951.415 ns | 2,247.253 ns | 2 | 95.6633 | - | - | 600120 B | 283 B |
| addItem | After | LocalBuild | 100 | 153.87 ns | 1.593 ns | 1.834 ns | 1 | 0.0593 | - | - | 374 B | 668 B |
| addItem | Before | Default | 100 | 168.03 ns | 0.663 ns | 0.763 ns | 2 | 0.0583 | - | - | 369 B | 603 B |
| addItem | After | LocalBuild | 10000 | 39,308.48 ns | 1,460.013 ns | 1,562.198 ns | 3 | 11.0938 | 3.7500 | - | 69724 B | 668 B |
| addItem | Before | Default | 10000 | 40,129.25 ns | 179.695 ns | 192.272 ns | 3 | 10.9375 | 3.2813 | - | 69324 B | 603 B |
| removeItem | After | LocalBuild | 100 | 12.99 ns | 0.067 ns | 0.077 ns | 2 | 0.0064 | - | - | 40 B | 461 B |
| removeItem | Before | Default | 100 | 12.20 ns | 0.074 ns | 0.085 ns | 1 | 0.0064 | - | - | 40 B | 443 B |
| removeItem | After | LocalBuild | 10000 | 1,292.43 ns | 16.367 ns | 17.512 ns | 4 | 0.6358 | - | - | 4000 B | 461 B |
| removeItem | Before | Default | 10000 | 1,208.58 ns | 6.424 ns | 6.597 ns | 3 | 0.6347 | - | - | 4000 B | 443 B |

@buybackoff buybackoff marked this pull request as ready for review January 7, 2021 13:00
@vzarytovskii vzarytovskii reopened this Jan 7, 2021
@cartermp
Contributor

cartermp commented Jan 7, 2021

This looks pretty nice! get/add/contains are also the most relevant operations for the F# compiler, and I suspect that's the case for most consumers as well. The one cause for concern is the error in the add operation: it is much higher now and basically makes the improvement a wash. Any thoughts on why that might be?

I expect the same technique could be applied to Set as well?

@buybackoff
Contributor Author

buybackoff commented Jan 7, 2021

This may be explained by the short benchmark run time while my machine was not idle, or by GC. I need to rerun it for longer. That said, the add numbers were stable across 3-4 runs.

@buybackoff
Contributor Author

For count = 100, the improvement in add is more than 7 sigma, isn't it?
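
One quick way to check the "7+ sigma" estimate is a crude z-score: the difference of the means divided by the combined StdDev. The sketch below uses the addItem rows from the table posted after the switch to `if`. `separation_in_sigmas` is a hypothetical helper, not a BenchmarkDotNet API. It gives about 7.1 sigma for count = 100 but only about 0.5 sigma for count = 10000, which is consistent with both comments above. It uses the reported StdDev rather than the standard error of the mean, so it understates significance; treat it as a rough check, not a formal test.

```python
import math


def separation_in_sigmas(mean_a: float, sd_a: float, mean_b: float, sd_b: float) -> float:
    """|mean_a - mean_b| divided by the combined standard deviation."""
    return abs(mean_a - mean_b) / math.sqrt(sd_a ** 2 + sd_b ** 2)


# addItem rows (Mean and StdDev in ns): After vs Before
z_100 = separation_in_sigmas(153.87, 1.834, 168.03, 0.763)
z_10000 = separation_in_sigmas(39_308.48, 1_562.198, 40_129.25, 192.272)
print(f"count = 100:   {z_100:.1f} sigma")
print(f"count = 10000: {z_10000:.1f} sigma")
```
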

@buybackoff
Contributor Author

buybackoff commented Jan 7, 2021

A 4x longer run (2x more iterations, each 2x longer):

BenchmarkDotNet=v0.12.1, OS=Windows 10.0.19042
Intel Core i7-8700 CPU 3.20GHz (Coffee Lake), 1 CPU, 12 logical and 6 physical cores
.NET Core SDK=5.0.200-preview.20601.7
  [Host] : .NET Core 5.0.1 (CoreCLR 5.0.120.57516, CoreFX 5.0.120.57516), X64 RyuJIT DEBUG
  After  : .NET Core 5.0.1 (CoreCLR 5.0.120.57516, CoreFX 5.0.120.57516), X64 RyuJIT
  Before : .NET Core 5.0.1 (CoreCLR 5.0.120.57516, CoreFX 5.0.120.57516), X64 RyuJIT

MaxRelativeError=0.01  Arguments=/p:Optimize=true  IterationCount=40  
WarmupCount=1  

| Method | Job | BuildConfiguration | Size | Mean | Error | StdDev | Rank | Gen 0 | Gen 1 | Gen 2 | Allocated | Code Size |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| getItem | After | LocalBuild | 100 | 28.99 ns | 0.128 ns | 0.228 ns | 1 | - | - | - | - | 93 B |
| getItem | Before | Default | 100 | 36.21 ns | 0.329 ns | 0.366 ns | 2 | - | - | - | - | 119 B |
| getItem | After | LocalBuild | 10000 | 64.13 ns | 0.217 ns | 0.374 ns | 3 | - | - | - | - | 93 B |
| getItem | Before | Default | 10000 | 82.35 ns | 0.223 ns | 0.239 ns | 4 | - | - | - | - | 119 B |
| containsKey | After | LocalBuild | 100 | 28.17 ns | 0.103 ns | 0.178 ns | 1 | - | - | - | - | 200 B |
| containsKey | Before | Default | 100 | 28.50 ns | 0.457 ns | 0.526 ns | 1 | - | - | - | - | 175 B |
| containsKey | After | LocalBuild | 10000 | 56.37 ns | 0.189 ns | 0.321 ns | 3 | - | - | - | - | 200 B |
| containsKey | Before | Default | 10000 | 54.79 ns | 0.646 ns | 0.744 ns | 2 | - | - | - | - | 175 B |
| itemCount | After | LocalBuild | 100 | 180.71 ns | 0.866 ns | 1.539 ns | 1 | - | - | - | - | 112 B |
| itemCount | Before | Default | 100 | 218.13 ns | 1.972 ns | 2.192 ns | 2 | - | - | - | - | 96 B |
| itemCount | After | LocalBuild | 10000 | 33,312.56 ns | 753.070 ns | 1,299.009 ns | 3 | - | - | - | - | 112 B |
| itemCount | Before | Default | 10000 | 34,219.59 ns | 459.480 ns | 529.138 ns | 4 | - | - | - | - | 96 B |
| iterForeach | After | LocalBuild | 100 | 3,087.15 ns | 10.305 ns | 17.217 ns | 2 | 1.0376 | - | - | 6520 B | 283 B |
| iterForeach | Before | Default | 100 | 3,046.73 ns | 6.480 ns | 7.202 ns | 1 | 0.9754 | - | - | 6120 B | 283 B |
| iterForeach | After | LocalBuild | 10000 | 325,384.57 ns | 574.842 ns | 1,006.789 ns | 4 | 101.5625 | - | - | 640120 B | 283 B |
| iterForeach | Before | Default | 10000 | 320,892.29 ns | 919.239 ns | 1,021.732 ns | 3 | 95.6633 | - | - | 600120 B | 283 B |
| addItem | After | LocalBuild | 100 | 154.49 ns | 0.265 ns | 0.470 ns | 1 | 0.0595 | 0.0003 | - | 374 B | 668 B |
| addItem | Before | Default | 100 | 170.07 ns | 0.702 ns | 0.780 ns | 2 | 0.0588 | - | - | 369 B | 603 B |
| addItem | After | LocalBuild | 10000 | 37,704.46 ns | 79.124 ns | 136.485 ns | 3 | 11.0938 | 3.4375 | - | 69724 B | 668 B |
| addItem | Before | Default | 10000 | 38,549.17 ns | 189.526 ns | 218.259 ns | 4 | 11.0000 | 3.2500 | - | 69324 B | 603 B |
| removeItem | After | LocalBuild | 100 | 11.59 ns | 0.045 ns | 0.072 ns | 1 | 0.0064 | - | - | 40 B | 461 B |
| removeItem | Before | Default | 100 | 12.21 ns | 0.113 ns | 0.120 ns | 2 | 0.0064 | - | - | 40 B | 443 B |
| removeItem | After | LocalBuild | 10000 | 1,249.23 ns | 4.061 ns | 7.218 ns | 4 | 0.6372 | - | - | 4000 B | 461 B |
| removeItem | Before | Default | 10000 | 1,214.33 ns | 6.080 ns | 6.758 ns | 3 | 0.6348 | - | - | 4000 B | 443 B |

@cartermp
Contributor

cartermp commented Jan 7, 2021

Thanks! I think that looks great.

Contributor

@cartermp cartermp left a comment


I think this is a good change, thanks, and thanks for all of the testing.

@buybackoff
Contributor Author

buybackoff commented Jan 7, 2021

Thanks!

I think there is still a "free lunch" in comparer devirtualization, at least for primitive types. It surprises me that static readonly fields and get-only `member val`s are not emitted with the readonly (`initonly`) modifier. From the F# perspective they are read-only, and JIT optimizations for static readonly fields matter a lot these days.

As for Set and the compiler's Map/Set counterparts, an issue with the up-for-grabs label would be nice. I may do this myself later, but I don't know when.

@cartermp
Contributor

cartermp commented Jan 7, 2021

No pressure :) - we can also apply your work to the other implementations if this gets merged soon

@buybackoff
Contributor Author

buybackoff commented Jan 8, 2021

I've seen #9348, #513, and related issues. Optimizing the comparison seems to be in a hopeless state :(

Less than an hour of tweaking the comparison gives results like the ones below. They are probably not 100% compatible with the current behavior, but they could be if we apply the logic from the #9348 snippet to exclude records and DUs (assuming that code is correct).

If we take the current NuGet 5.0 release, without any changes to Map, as the baseline (see the Ratio column), are we ready to:

  • improve Map performance for primitive keys by 3.65x (44% vs this PR), see getItem;
  • for struct T : IComparable<T> keys by 4.14x (3.64x vs this PR), see getItemIntLike;
  • for string keys by 52% (yet 10% slower than this PR), see getItemString;
  • but lose 15% for reference types and 17% for struct records, see getItemRefLike and getItemIntRecord?
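
The ratios in the list can be reproduced from the Mean column of the Size = 100 table below. Here is a small Python sketch. The means are copied from that table, and it treats the Main50 job as "this PR", following the list above. `speedup` is a hypothetical helper.

```python
# Mean times in ns for Size = 100, copied from the benchmark table.
means = {
    "getItem":          {"After": 20.48,  "Main50": 29.40,  "NuGet50": 74.83},
    "getItemIntLike":   {"After": 60.45,  "Main50": 220.00, "NuGet50": 250.34},
    "getItemString":    {"After": 76.03,  "Main50": 68.22,  "NuGet50": 115.77},
    "getItemRefLike":   {"After": 278.41, "Main50": 200.05, "NuGet50": 236.40},
    "getItemIntRecord": {"After": 282.93, "Main50": 200.34, "NuGet50": 235.25},
}


def speedup(method: str, job: str) -> float:
    """How many times faster the 'After' job is than `job`.
    Values below 1 mean 'After' is slower."""
    return means[method][job] / means[method]["After"]


for method in means:
    print(f"{method:17} vs NuGet 5.0: {speedup(method, 'NuGet50'):.2f}x"
          f"   vs this PR: {speedup(method, 'Main50'):.2f}x")
```
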

I would say that everything could be wrapped in a struct T : IComparable<T> with efficient comparison logic inside, and if someone uses reference types as keys, they do care about performance by definition. But I do understand that regressing existing code by 15-17% may be too much. Still, the tradeoff is very favorable for primitive types.

The types used in the benchmark are:

```fsharp
open System

[<StructuralEquality; CustomComparison>]
type IntLike =
    struct
        val Value: int
        new(v: int) = { Value = v }
        member x.CompareTo(y: IntLike) = x.Value.CompareTo(y.Value)

        interface IComparable<IntLike> with
            member x.CompareTo(y) = x.CompareTo(y)

        interface IComparable with
            member x.CompareTo(y) = x.CompareTo(y :?> IntLike)
    end

type RefLike =
    val Value: int
    new(v: int) = { Value = v }
    member x.CompareTo(y: RefLike) = x.Value.CompareTo(y.Value)

    interface IComparable<RefLike> with
        member x.CompareTo(y) = x.CompareTo(y)

    interface IComparable with
        member x.CompareTo(y) = x.CompareTo(y :?> RefLike)

[<Struct>]
type IntRecord =
    { Value1: int
      Value2: int }
```

BenchmarkDotNet=v0.12.1, OS=Windows 10.0.19042
Intel Core i7-8700 CPU 3.20GHz (Coffee Lake), 1 CPU, 12 logical and 6 physical cores
.NET Core SDK=5.0.200-preview.20601.7
  [Host]  : .NET Core 5.0.1 (CoreCLR 5.0.120.57516, CoreFX 5.0.120.57516), X64 RyuJIT DEBUG
  After   : .NET Core 5.0.1 (CoreCLR 5.0.120.57516, CoreFX 5.0.120.57516), X64 RyuJIT
  Main50  : .NET Core 5.0.1 (CoreCLR 5.0.120.57516, CoreFX 5.0.120.57516), X64 RyuJIT
  NuGet50 : .NET Core 5.0.1 (CoreCLR 5.0.120.57516, CoreFX 5.0.120.57516), X64 RyuJIT

MaxRelativeError=0.01  Arguments=/p:Optimize=true  IterationCount=5  
IterationTime=250.0000 ms  WarmupCount=1  

| Method | Job | BuildConfiguration | Size | Mean | Error | StdDev | Ratio | RatioSD | Rank | Gen 0 | Gen 1 | Gen 2 | Allocated |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| getItem | After | After | 100 | 20.48 ns | 0.935 ns | 0.243 ns | 1.00 | 0.00 | 1 | - | - | - | - |
| getItem | Main50 | Main50 | 100 | 29.40 ns | 0.728 ns | 0.189 ns | 1.44 | 0.02 | 2 | - | - | - | - |
| getItem | NuGet50 | NuGet50 | 100 | 74.83 ns | 6.092 ns | 0.943 ns | 3.65 | 0.08 | 3 | - | - | - | - |
| getItemIntLike | After | After | 100 | 60.45 ns | 0.460 ns | 0.119 ns | 1.00 | 0.00 | 1 | - | - | - | - |
| getItemIntLike | Main50 | Main50 | 100 | 220.00 ns | 4.159 ns | 0.644 ns | 3.64 | 0.02 | 2 | 0.0450 | - | - | 283 B |
| getItemIntLike | NuGet50 | NuGet50 | 100 | 250.34 ns | 9.453 ns | 1.463 ns | 4.14 | 0.03 | 3 | 0.0449 | - | - | 283 B |
| getItemString | After | After | 100 | 76.03 ns | 1.027 ns | 0.267 ns | 1.00 | 0.00 | 2 | - | - | - | - |
| getItemString | Main50 | Main50 | 100 | 68.22 ns | 0.538 ns | 0.083 ns | 0.90 | 0.00 | 1 | - | - | - | - |
| getItemString | NuGet50 | NuGet50 | 100 | 115.77 ns | 0.626 ns | 0.162 ns | 1.52 | 0.01 | 3 | - | - | - | - |
| getItemRefLike | After | After | 100 | 278.41 ns | 3.569 ns | 0.927 ns | 1.00 | 0.00 | 3 | - | - | - | - |
| getItemRefLike | Main50 | Main50 | 100 | 200.05 ns | 6.346 ns | 1.648 ns | 0.72 | 0.01 | 1 | - | - | - | - |
| getItemRefLike | NuGet50 | NuGet50 | 100 | 236.40 ns | 3.194 ns | 0.494 ns | 0.85 | 0.00 | 2 | - | - | - | - |
| getItemIntRecord | After | After | 100 | 282.93 ns | 12.166 ns | 3.160 ns | 1.00 | 0.00 | 3 | - | - | - | - |
| getItemIntRecord | Main50 | Main50 | 100 | 200.34 ns | 6.643 ns | 1.028 ns | 0.71 | 0.01 | 1 | - | - | - | - |
| getItemIntRecord | NuGet50 | NuGet50 | 100 | 235.25 ns | 1.148 ns | 0.178 ns | 0.83 | 0.01 | 2 | - | - | - | - |

This should probably be a separate issue. But I'm not sure I would dig deeper if such a tradeoff or breaking changes are not acceptable.

And again, static readonly fields would help. It's a total mess now: not only are such fields impossible, but beforefieldinit also feels random, and some weird init fields appear. I still can't understand what's going on with the codegen. I would like to embed multi-line IL directly if I could, beyond the `(# ... #)` constructs 🙄

buybackoff added a commit to buybackoff/fsharp that referenced this pull request Jan 10, 2021
@cartermp
Contributor

@dsyme it would be good to get your eyes on this as well.

Note that the last set of benchmarks does not reflect this change; it is part of a discussion that continues in #10855.

Member

@KevinRansom KevinRansom left a comment


Looks good,

thank you for this

@cartermp
Contributor

I will merge this in. Thanks @buybackoff

@cartermp cartermp merged commit 15654d2 into dotnet:main Jan 17, 2021
@buybackoff
Contributor Author

@cartermp

> I will merge this in. Thanks @buybackoff

Thanks!

It would be interesting to know whether the comparer changes, or even that general direction, have any chance. It looks like there are huge, easy gains for 95+% of cases, but they are blocked by the remaining <5%, mostly edge cases. I've noticed you are going to add a System.Collections.Immutable dependency for immutable arrays, but (hypothetically) replacing MapTree with the AVL implementation from there would require exactly the same transition from F#'s comparison constraint to System.Collections.Generic.Comparer<T>.Default for efficient inlined comparer calls.

@cartermp
Copy link
Contributor

@buybackoff we'll take a look and have a think about them :)
