Map: Optimize away `isinst` check #10845

buybackoff · 2021-01-06T23:46:55Z

Following a discussion here #10768

Found a simple perf gain at the cost of 2 bytes per Map item on average. getItem/contains/add gain significantly because the isinst type check is replaced by an int32 equality check. Benchmarked against current main.

The change stores the height in leaves as well, so leaves and nodes share one field. That field then serves as an implicit tag (height = 1 means a leaf) in place of the type check. Compared to the old discussion, where Left/Right were proposed to be stored in a universal node, this adds 4 bytes to each leaf, or about 2 bytes per item on average if roughly half of the items are leaves (vs 16/8).

The tradeoff is basically the only thing to consider. Code changes are trivial.
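A minimal sketch of the idea, assuming simplified hypothetical types (`Tree`, `TreeNode`, `tryFind`) rather than the actual FSharp.Core definitions: the leaf check becomes an integer comparison on the shared height field, and the downcast happens only on the node path.

```fsharp
open System.Collections.Generic

// Hypothetical, simplified types for illustration; not the FSharp.Core source.
[<AllowNullLiteral>]
type Tree<'K, 'V>(key: 'K, value: 'V, height: int) =
    member x.Key = key
    member x.Value = value
    // Leaves always carry Height = 1, so this field doubles as the leaf/node tag.
    member x.Height = height

[<Sealed; AllowNullLiteral>]
type TreeNode<'K, 'V>(key: 'K, value: 'V, left: Tree<'K, 'V>, right: Tree<'K, 'V>, height: int) =
    inherit Tree<'K, 'V>(key, value, height)
    member x.Left = left
    member x.Right = right

let rec tryFind (comparer: IComparer<'K>) (k: 'K) (m: Tree<'K, 'V>) : 'V option =
    if isNull m then None
    else
        let c = comparer.Compare(k, m.Key)
        if m.Height = 1 then
            // Leaf: decided by an int32 comparison, no type test needed.
            if c = 0 then Some m.Value else None
        else
            // Node: the height already told us the runtime type, so just cast.
            let n = m :?> TreeNode<'K, 'V>
            if c < 0 then tryFind comparer k n.Left
            elif c = 0 then Some n.Value
            else tryFind comparer k n.Right

// Usage: a three-item tree with two leaves under one node.
let leaf k v = Tree<int, string>(k, v, 1)
let tree = TreeNode<int, string>(2, "two", leaf 1 "one", leaf 3 "three", 2) :> Tree<int, string>
let cmp = Comparer<int>.Default :> IComparer<int>
let found = tryFind cmp 3 tree
let missing = tryFind cmp 4 tree
```

In this sketch two of the three items are leaves, which is where the "4 bytes per leaf, about 2 bytes per item" estimate comes from.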

BenchmarkDotNet=v0.12.1, OS=Windows 10.0.19042
Intel Core i7-8700 CPU 3.20GHz (Coffee Lake), 1 CPU, 12 logical and 6 physical cores
.NET Core SDK=5.0.200-preview.20601.7
  [Host] : .NET Core 5.0.1 (CoreCLR 5.0.120.57516, CoreFX 5.0.120.57516), X64 RyuJIT DEBUG
  After  : .NET Core 5.0.1 (CoreCLR 5.0.120.57516, CoreFX 5.0.120.57516), X64 RyuJIT
  Before : .NET Core 5.0.1 (CoreCLR 5.0.120.57516, CoreFX 5.0.120.57516), X64 RyuJIT

MaxRelativeError=0.01  Arguments=/p:Optimize=true  IterationCount=20  
IterationTime=250.0000 ms  WarmupCount=1

Method	Job	BuildConfiguration	Size	Mean	Error	StdDev	Rank	Gen 0	Gen 1	Gen 2	Allocated	Code Size
getItem	After	LocalBuild	100	28.34 ns	0.291 ns	0.324 ns	1	-	-	-	-	93 B
getItem	Before	Default	100	36.48 ns	0.206 ns	0.212 ns	2	-	-	-	-	119 B
getItem	After	LocalBuild	10000	60.40 ns	0.365 ns	0.421 ns	3	-	-	-	-	93 B
getItem	Before	Default	10000	82.42 ns	0.446 ns	0.513 ns	4	-	-	-	-	119 B

containsKey	After	LocalBuild	100	24.95 ns	0.231 ns	0.266 ns	1	-	-	-	-	155 B
containsKey	Before	Default	100	29.23 ns	0.385 ns	0.412 ns	2	-	-	-	-	175 B
containsKey	After	LocalBuild	10000	46.19 ns	0.292 ns	0.337 ns	3	-	-	-	-	155 B
containsKey	Before	Default	10000	53.00 ns	0.177 ns	0.196 ns	4	-	-	-	-	175 B

itemCount	After	LocalBuild	100	179.55 ns	1.426 ns	1.643 ns	1	-	-	-	-	81 B
itemCount	Before	Default	100	217.68 ns	1.816 ns	1.865 ns	2	-	-	-	-	96 B
itemCount	After	LocalBuild	10000	33,454.41 ns	1,561.752 ns	1,798.516 ns	3	-	-	-	-	81 B
itemCount	Before	Default	10000	34,357.14 ns	374.999 ns	431.850 ns	3	-	-	-	-	96 B

iterForeach	After	LocalBuild	100	3,223.95 ns	23.071 ns	23.692 ns	2	1.0311	-	-	6520 B	283 B
iterForeach	Before	Default	100	3,082.44 ns	28.212 ns	30.186 ns	1	0.9704	-	-	6120 B	283 B
iterForeach	After	LocalBuild	10000	336,049.08 ns	1,375.830 ns	1,472.123 ns	4	101.0638	-	-	640120 B	283 B
iterForeach	Before	Default	10000	317,291.41 ns	511.972 ns	525.758 ns	3	95.6633	-	-	600120 B	283 B

addItem	After	LocalBuild	100	145.75 ns	0.811 ns	0.934 ns	1	0.0590	-	-	374 B	601 B
addItem	Before	Default	100	170.09 ns	2.717 ns	2.907 ns	2	0.0584	-	-	369 B	603 B
addItem	After	LocalBuild	10000	36,414.43 ns	415.002 ns	477.917 ns	3	11.0000	3.6250	-	69724 B	601 B
addItem	Before	Default	10000	37,677.58 ns	103.857 ns	111.126 ns	4	11.0000	3.2500	-	69324 B	603 B

removeItem	After	LocalBuild	100	12.33 ns	0.114 ns	0.112 ns	1	0.0064	-	-	40 B	425 B
removeItem	Before	Default	100	12.22 ns	0.076 ns	0.081 ns	1	0.0064	-	-	40 B	443 B
removeItem	After	LocalBuild	10000	1,189.82 ns	6.181 ns	7.118 ns	2	0.6345	-	-	4000 B	425 B
removeItem	Before	Default	10000	1,188.84 ns	7.846 ns	8.720 ns	2	0.6345	-	-	4000 B	443 B


buybackoff · 2021-01-07T00:14:19Z

Initially (in the benchmarks table above) the asNode function used (# "" value: MapTreeNode<'Key,'Value> #) instead of value :?> MapTreeNode<'Key,'Value>. This is a replacement for Unsafe.As, discussed a long time ago here.

But that fails in a bad way on Windows/full framework. The updated benchmarks are below.

error FS0193#L0
error FS0193(0,0): error : (NETCORE_ENGINEERING_TELEMETRY=Build) Operation could destabilize the runtime.

There is still an improvement, but smaller. Making this a draft.

Is there a way to suppress the error for (# "" value: MapTreeNode<'Key,'Value> #) or achieve the same thing without a cast and without S.R.CS.Unsafe dependency?

BenchmarkDotNet=v0.12.1, OS=Windows 10.0.19042
Intel Core i7-8700 CPU 3.20GHz (Coffee Lake), 1 CPU, 12 logical and 6 physical cores
.NET Core SDK=5.0.200-preview.20601.7
  [Host] : .NET Core 5.0.1 (CoreCLR 5.0.120.57516, CoreFX 5.0.120.57516), X64 RyuJIT DEBUG
  After  : .NET Core 5.0.1 (CoreCLR 5.0.120.57516, CoreFX 5.0.120.57516), X64 RyuJIT
  Before : .NET Core 5.0.1 (CoreCLR 5.0.120.57516, CoreFX 5.0.120.57516), X64 RyuJIT

MaxRelativeError=0.01  Arguments=/p:Optimize=true  IterationCount=20  
IterationTime=250.0000 ms  WarmupCount=1

Method	Job	BuildConfiguration	Size	Mean	Error	StdDev	Rank	Gen 0	Gen 1	Gen 2	Allocated	Code Size
getItem	After	LocalBuild	100	34.16 ns	0.258 ns	0.297 ns	1	-	-	-	-	93 B
getItem	Before	Default	100	36.51 ns	0.291 ns	0.312 ns	2	-	-	-	-	119 B
getItem	After	LocalBuild	10000	75.90 ns	0.839 ns	0.933 ns	3	-	-	-	-	93 B
getItem	Before	Default	10000	82.41 ns	0.306 ns	0.352 ns	4	-	-	-	-	119 B

containsKey	After	LocalBuild	100	29.02 ns	0.310 ns	0.357 ns	1	-	-	-	-	198 B
containsKey	Before	Default	100	28.70 ns	0.308 ns	0.355 ns	1	-	-	-	-	175 B
containsKey	After	LocalBuild	10000	52.57 ns	0.453 ns	0.465 ns	2	-	-	-	-	198 B
containsKey	Before	Default	10000	53.28 ns	0.423 ns	0.487 ns	2	-	-	-	-	175 B

itemCount	After	LocalBuild	100	205.14 ns	1.642 ns	1.891 ns	1	-	-	-	-	110 B
itemCount	Before	Default	100	216.20 ns	0.965 ns	1.072 ns	2	-	-	-	-	96 B
itemCount	After	LocalBuild	10000	35,507.48 ns	934.860 ns	1,076.586 ns	4	-	-	-	-	110 B
itemCount	Before	Default	10000	34,428.44 ns	534.973 ns	616.076 ns	3	-	-	-	-	96 B

iterForeach	After	LocalBuild	100	3,139.77 ns	15.790 ns	18.184 ns	2	1.0354	-	-	6520 B	283 B
iterForeach	Before	Default	100	3,065.72 ns	19.452 ns	22.401 ns	1	0.9659	-	-	6120 B	283 B
iterForeach	After	LocalBuild	10000	324,242.95 ns	1,222.667 ns	1,408.025 ns	3	101.5625	-	-	640120 B	283 B
iterForeach	Before	Default	10000	320,282.94 ns	1,422.972 ns	1,638.697 ns	3	95.6633	-	-	600120 B	283 B

addItem	After	LocalBuild	100	151.75 ns	0.757 ns	0.810 ns	1	0.0591	-	-	374 B	655 B
addItem	Before	Default	100	168.32 ns	0.436 ns	0.502 ns	2	0.0583	-	-	369 B	603 B
addItem	After	LocalBuild	10000	37,558.66 ns	509.422 ns	566.221 ns	3	11.0000	3.6250	-	69724 B	655 B
addItem	Before	Default	10000	39,931.69 ns	349.741 ns	388.736 ns	4	10.9375	3.2813	-	69324 B	603 B

removeItem	After	LocalBuild	100	12.65 ns	0.100 ns	0.115 ns	2	0.0064	-	-	40 B	454 B
removeItem	Before	Default	100	12.35 ns	0.147 ns	0.169 ns	1	0.0064	-	-	40 B	443 B
removeItem	After	LocalBuild	10000	1,290.84 ns	21.354 ns	22.848 ns	4	0.6351	-	-	4000 B	454 B
removeItem	Before	Default	10000	1,195.52 ns	6.200 ns	6.634 ns	3	0.6346	-	-	4000 B	443 B

buybackoff · 2021-01-07T00:19:10Z

Results for count = 10:

BenchmarkDotNet=v0.12.1, OS=Windows 10.0.19042
Intel Core i7-8700 CPU 3.20GHz (Coffee Lake), 1 CPU, 12 logical and 6 physical cores
.NET Core SDK=5.0.200-preview.20601.7
  [Host] : .NET Core 5.0.1 (CoreCLR 5.0.120.57516, CoreFX 5.0.120.57516), X64 RyuJIT DEBUG
  After  : .NET Core 5.0.1 (CoreCLR 5.0.120.57516, CoreFX 5.0.120.57516), X64 RyuJIT
  Before : .NET Core 5.0.1 (CoreCLR 5.0.120.57516, CoreFX 5.0.120.57516), X64 RyuJIT

MaxRelativeError=0.01  Arguments=/p:Optimize=true  IterationCount=20  
IterationTime=250.0000 ms  WarmupCount=1

Method	Job	BuildConfiguration	Size	Mean	Error	StdDev	Rank	Gen 0	Gen 1	Gen 2	Allocated	Code Size
getItem	After	LocalBuild	10	17.92 ns	0.055 ns	0.057 ns	1	-	-	-	-	93 B
getItem	Before	Default	10	36.07 ns	0.329 ns	0.379 ns	2	-	-	-	-	119 B

containsKey	After	LocalBuild	10	16.03 ns	0.117 ns	0.134 ns	1	-	-	-	-	198 B
containsKey	Before	Default	10	28.26 ns	0.241 ns	0.268 ns	2	-	-	-	-	175 B

itemCount	After	LocalBuild	10	19.27 ns	0.170 ns	0.189 ns	1	-	-	-	-	110 B
itemCount	Before	Default	10	217.36 ns	1.386 ns	1.540 ns	2	-	-	-	-	96 B

iterForeach	After	LocalBuild	10	336.97 ns	0.816 ns	0.801 ns	1	0.1201	-	-	760 B	283 B
iterForeach	Before	Default	10	3,045.81 ns	7.461 ns	8.293 ns	2	0.9686	-	-	6120 B	283 B

addItem	After	LocalBuild	10	69.05 ns	0.414 ns	0.477 ns	1	0.0313	-	-	198 B	655 B
addItem	Before	Default	10	172.04 ns	0.954 ns	1.061 ns	2	0.0587	-	-	369 B	603 B

removeItem	After	LocalBuild	10	14.13 ns	0.060 ns	0.066 ns	2	0.0070	-	-	44 B	454 B
removeItem	Before	Default	10	12.13 ns	0.043 ns	0.046 ns	1	0.0064	-	-	40 B	443 B

buybackoff · 2021-01-07T12:14:45Z

Oh, that was not due to the cast. A match on an int field with just 2 cases, where the second case is the catch-all, generates a switch instruction and an additional sub instruction for it.

Or the effect of this is similar to that of removing the cast. But I will not test that if the unsafe cast is unverifiable and there is no workaround.

    match m.Height with
    | 1 -> ...
    | _ -> ...

    IL_0006: ldarg.0      // m
    IL_0007: ldfld        int32 class Microsoft.FSharp.Collections.MapTree`2<!!0/*TKey*/, !!1/*TValue*/>::h
    IL_000c: ldc.i4.1
    IL_000d: sub
    IL_000e: switch       (IL_007d)

`match` produces `sub 1` and `switch` instructions. Here, for any non-trivial count, nodes are more frequent than leaves on the path, so branch prediction should be beneficial.
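The two source shapes being compared look roughly like this. It is a sketch with hypothetical helper names (`kindViaMatch`, `kindViaIf`); both functions return the same result, and whether the compiler emits `switch` for this exact snippet has not been checked here.

```fsharp
// Hypothetical stand-ins for the height-based dispatch; 1 means "leaf" as in the PR.

// Two-case match on an int: the form reported above to compile to `sub` + `switch`.
let kindViaMatch (height: int) =
    match height with
    | 1 -> "leaf"
    | _ -> "node"

// Plain equality test: the form the PR switched to.
let kindViaIf (height: int) =
    if height = 1 then "leaf" else "node"

// Both forms agree on every height.
let agree = [ 1 .. 5 ] |> List.forall (fun h -> kindViaMatch h = kindViaIf h)
```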

buybackoff · 2021-01-07T12:57:09Z

Updated benchmarks after changing match to if.

The numbers for count = 10 are +/- same as above.

BenchmarkDotNet=v0.12.1, OS=Windows 10.0.19042
Intel Core i7-8700 CPU 3.20GHz (Coffee Lake), 1 CPU, 12 logical and 6 physical cores
.NET Core SDK=5.0.200-preview.20601.7
  [Host] : .NET Core 5.0.1 (CoreCLR 5.0.120.57516, CoreFX 5.0.120.57516), X64 RyuJIT DEBUG
  After  : .NET Core 5.0.1 (CoreCLR 5.0.120.57516, CoreFX 5.0.120.57516), X64 RyuJIT
  Before : .NET Core 5.0.1 (CoreCLR 5.0.120.57516, CoreFX 5.0.120.57516), X64 RyuJIT

MaxRelativeError=0.01  Arguments=/p:Optimize=true  IterationCount=20  
IterationTime=250.0000 ms  WarmupCount=1

Method	Job	BuildConfiguration	Size	Mean	Error	StdDev	Rank	Gen 0	Gen 1	Gen 2	Allocated	Code Size
getItem	After	LocalBuild	100	28.65 ns	0.381 ns	0.424 ns	1	-	-	-	-	93 B
getItem	Before	Default	100	36.39 ns	0.226 ns	0.251 ns	2	-	-	-	-	119 B
getItem	After	LocalBuild	10000	61.33 ns	0.263 ns	0.293 ns	3	-	-	-	-	93 B
getItem	Before	Default	10000	82.86 ns	0.129 ns	0.143 ns	4	-	-	-	-	119 B

containsKey	After	LocalBuild	100	27.50 ns	0.256 ns	0.285 ns	1	-	-	-	-	200 B
containsKey	Before	Default	100	29.29 ns	0.342 ns	0.380 ns	2	-	-	-	-	175 B
containsKey	After	LocalBuild	10000	53.51 ns	0.289 ns	0.309 ns	3	-	-	-	-	200 B
containsKey	Before	Default	10000	54.77 ns	0.341 ns	0.393 ns	4	-	-	-	-	175 B

itemCount	After	LocalBuild	100	197.96 ns	1.155 ns	1.284 ns	1	-	-	-	-	112 B
itemCount	Before	Default	100	217.04 ns	1.008 ns	1.079 ns	2	-	-	-	-	96 B
itemCount	After	LocalBuild	10000	34,628.61 ns	651.610 ns	750.396 ns	3	-	-	-	-	112 B
itemCount	Before	Default	10000	34,815.34 ns	1,108.432 ns	1,186.009 ns	3	-	-	-	-	96 B

iterForeach	After	LocalBuild	100	3,090.08 ns	10.900 ns	11.663 ns	1	1.0373	-	-	6520 B	283 B
iterForeach	Before	Default	100	3,057.11 ns	10.379 ns	11.537 ns	1	0.9728	-	-	6120 B	283 B
iterForeach	After	LocalBuild	10000	324,417.64 ns	1,901.370 ns	2,189.621 ns	2	102.0408	-	-	640120 B	283 B
iterForeach	Before	Default	10000	322,203.22 ns	1,951.415 ns	2,247.253 ns	2	95.6633	-	-	600120 B	283 B

addItem	After	LocalBuild	100	153.87 ns	1.593 ns	1.834 ns	1	0.0593	-	-	374 B	668 B
addItem	Before	Default	100	168.03 ns	0.663 ns	0.763 ns	2	0.0583	-	-	369 B	603 B
addItem	After	LocalBuild	10000	39,308.48 ns	1,460.013 ns	1,562.198 ns	3	11.0938	3.7500	-	69724 B	668 B
addItem	Before	Default	10000	40,129.25 ns	179.695 ns	192.272 ns	3	10.9375	3.2813	-	69324 B	603 B

removeItem	After	LocalBuild	100	12.99 ns	0.067 ns	0.077 ns	2	0.0064	-	-	40 B	461 B
removeItem	Before	Default	100	12.20 ns	0.074 ns	0.085 ns	1	0.0064	-	-	40 B	443 B
removeItem	After	LocalBuild	10000	1,292.43 ns	16.367 ns	17.512 ns	4	0.6358	-	-	4000 B	461 B
removeItem	Before	Default	10000	1,208.58 ns	6.424 ns	6.597 ns	3	0.6347	-	-	4000 B	443 B

cartermp · 2021-01-07T17:59:15Z

This looks pretty nice! get/add/contains are also the most relevant operations for the F# compiler, and I suspect that's the case for most consumers as well. The one cause for concern is the error in the add operation. Much higher now and basically makes the improvement a wash. Any thoughts as to why that might be?

I expect that the same technique could be applied to Sets as well?

buybackoff · 2021-01-07T18:47:14Z

This may be explained by the short running time of the bench while my machine was not idle, or by GC. I need to rerun longer. But the number for add was stable across 3-4 runs.

buybackoff · 2021-01-07T19:01:42Z

For the count 100, the improvement in add is 7+ sigma, isn't it?
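One way to check that figure, as a sketch: take the addItem rows for Size = 100 from the "if instead of match" run above and express the difference of means in units of the combined standard deviation. Treating the two runs as independent and using the StdDev column are assumptions of this sketch, not something stated in the thread.

```fsharp
// Figures copied from the addItem rows (Size = 100) of the "if instead of match" run.
let meanBefore, sdBefore = 168.03, 0.763
let meanAfter, sdAfter = 153.87, 1.834

// Combined standard deviation of the difference, assuming independent runs.
let combinedSd = sqrt (sdBefore * sdBefore + sdAfter * sdAfter)

// Difference of means measured in "sigmas".
let sigmas = (meanBefore - meanAfter) / combinedSd

printfn "%.2f" sigmas
```

With these inputs the result comes out a little above 7, consistent with the "7+ sigma" remark.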

buybackoff · 2021-01-07T19:52:45Z

4x longer run (2x more iterations each 2x longer)

BenchmarkDotNet=v0.12.1, OS=Windows 10.0.19042
Intel Core i7-8700 CPU 3.20GHz (Coffee Lake), 1 CPU, 12 logical and 6 physical cores
.NET Core SDK=5.0.200-preview.20601.7
  [Host] : .NET Core 5.0.1 (CoreCLR 5.0.120.57516, CoreFX 5.0.120.57516), X64 RyuJIT DEBUG
  After  : .NET Core 5.0.1 (CoreCLR 5.0.120.57516, CoreFX 5.0.120.57516), X64 RyuJIT
  Before : .NET Core 5.0.1 (CoreCLR 5.0.120.57516, CoreFX 5.0.120.57516), X64 RyuJIT

MaxRelativeError=0.01  Arguments=/p:Optimize=true  IterationCount=40  
WarmupCount=1

Method	Job	BuildConfiguration	Size	Mean	Error	StdDev	Rank	Gen 0	Gen 1	Gen 2	Allocated	Code Size
getItem	After	LocalBuild	100	28.99 ns	0.128 ns	0.228 ns	1	-	-	-	-	93 B
getItem	Before	Default	100	36.21 ns	0.329 ns	0.366 ns	2	-	-	-	-	119 B
getItem	After	LocalBuild	10000	64.13 ns	0.217 ns	0.374 ns	3	-	-	-	-	93 B
getItem	Before	Default	10000	82.35 ns	0.223 ns	0.239 ns	4	-	-	-	-	119 B

containsKey	After	LocalBuild	100	28.17 ns	0.103 ns	0.178 ns	1	-	-	-	-	200 B
containsKey	Before	Default	100	28.50 ns	0.457 ns	0.526 ns	1	-	-	-	-	175 B
containsKey	After	LocalBuild	10000	56.37 ns	0.189 ns	0.321 ns	3	-	-	-	-	200 B
containsKey	Before	Default	10000	54.79 ns	0.646 ns	0.744 ns	2	-	-	-	-	175 B

itemCount	After	LocalBuild	100	180.71 ns	0.866 ns	1.539 ns	1	-	-	-	-	112 B
itemCount	Before	Default	100	218.13 ns	1.972 ns	2.192 ns	2	-	-	-	-	96 B
itemCount	After	LocalBuild	10000	33,312.56 ns	753.070 ns	1,299.009 ns	3	-	-	-	-	112 B
itemCount	Before	Default	10000	34,219.59 ns	459.480 ns	529.138 ns	4	-	-	-	-	96 B

iterForeach	After	LocalBuild	100	3,087.15 ns	10.305 ns	17.217 ns	2	1.0376	-	-	6520 B	283 B
iterForeach	Before	Default	100	3,046.73 ns	6.480 ns	7.202 ns	1	0.9754	-	-	6120 B	283 B
iterForeach	After	LocalBuild	10000	325,384.57 ns	574.842 ns	1,006.789 ns	4	101.5625	-	-	640120 B	283 B
iterForeach	Before	Default	10000	320,892.29 ns	919.239 ns	1,021.732 ns	3	95.6633	-	-	600120 B	283 B

addItem	After	LocalBuild	100	154.49 ns	0.265 ns	0.470 ns	1	0.0595	0.0003	-	374 B	668 B
addItem	Before	Default	100	170.07 ns	0.702 ns	0.780 ns	2	0.0588	-	-	369 B	603 B
addItem	After	LocalBuild	10000	37,704.46 ns	79.124 ns	136.485 ns	3	11.0938	3.4375	-	69724 B	668 B
addItem	Before	Default	10000	38,549.17 ns	189.526 ns	218.259 ns	4	11.0000	3.2500	-	69324 B	603 B

removeItem	After	LocalBuild	100	11.59 ns	0.045 ns	0.072 ns	1	0.0064	-	-	40 B	461 B
removeItem	Before	Default	100	12.21 ns	0.113 ns	0.120 ns	2	0.0064	-	-	40 B	443 B
removeItem	After	LocalBuild	10000	1,249.23 ns	4.061 ns	7.218 ns	4	0.6372	-	-	4000 B	461 B
removeItem	Before	Default	10000	1,214.33 ns	6.080 ns	6.758 ns	3	0.6348	-	-	4000 B	443 B

cartermp · 2021-01-07T19:55:23Z

Thanks! I think that looks great.

cartermp

I think this is a good change, thanks, and thanks for all of the testing.

buybackoff · 2021-01-07T20:12:55Z

Thanks!

I think there is still some "free lunch" with comparer devirtualization for at least primitive types. It surprises me that static readonly fields or member vals with get-only are not generated with the readonly modifier. From F# perspective they are readonly, and JIT optimizations for static readonly fields are important these days.

As for Set and the compiler's Map/Set friends, an issue with the up-for-grabs label would be nice. I may do this myself later, but do not know when.

cartermp · 2021-01-07T20:22:41Z

No pressure :) - we can also apply your work to the other implementations if this gets merged soon

buybackoff · 2021-01-08T21:08:33Z

I've seen #9348, #513, and related. Such a hopeless state to optimize the comparison :(

Less than an hour of tweaking the comparison gives results like those below. But it's probably not 100% compatible with the current behavior. It could be if we apply the logic from the #9348 snippet to exclude records & DUs, if that code is correct.

If we take the current NuGet 5.0 release without any changes to Map as a baseline, are we ready to do the following (see the Ratio column)?

- Improve the map performance with primitive keys by 3.65x (44% vs this PR) - getItem,
- with struct T : IComparable<T> keys by 4.14x (3.64x vs this PR) - getItemIntLike,
- with string keys by 52% (yet 10% slower than this PR) - getItemString,
- but lose 15% for reference types and 17% for struct records - getItemRefLike and getItemIntRecord.

I would say that everything could be wrapped in a struct T : IComparable<T> with efficient logic there, and if someone uses ref-types as a key they do not care about performance by definition. But I do understand that regressing existing code by 15/17% may be too much. Yet the tradeoff is so great for primitive types.

The types in the bench are:

    [<StructuralEquality;CustomComparison>]
    type IntLike =
        struct
           val Value: int
           new(v:int) = {Value = v}
           member x.CompareTo(y:IntLike) = x.Value.CompareTo(y.Value)
        end
        
        interface IComparable<IntLike> with
            member x.CompareTo(y) = x.CompareTo(y)
            
        interface IComparable with
            member x.CompareTo(y) = x.CompareTo(y :?> IntLike) 


    type RefLike =
        val Value: int
        new(v:int) = {Value = v}
        member x.CompareTo(y:RefLike) = x.Value.CompareTo(y.Value)
        
        interface IComparable<RefLike> with
            member x.CompareTo(y) = x.CompareTo(y)
            
        interface IComparable with
            member x.CompareTo(y) = x.CompareTo(y :?> RefLike)
            
    [<Struct>]
    type IntRecord =
          { Value1 : int
            Value2 : int
          }

BenchmarkDotNet=v0.12.1, OS=Windows 10.0.19042
Intel Core i7-8700 CPU 3.20GHz (Coffee Lake), 1 CPU, 12 logical and 6 physical cores
.NET Core SDK=5.0.200-preview.20601.7
  [Host]  : .NET Core 5.0.1 (CoreCLR 5.0.120.57516, CoreFX 5.0.120.57516), X64 RyuJIT DEBUG
  After   : .NET Core 5.0.1 (CoreCLR 5.0.120.57516, CoreFX 5.0.120.57516), X64 RyuJIT
  Main50  : .NET Core 5.0.1 (CoreCLR 5.0.120.57516, CoreFX 5.0.120.57516), X64 RyuJIT
  NuGet50 : .NET Core 5.0.1 (CoreCLR 5.0.120.57516, CoreFX 5.0.120.57516), X64 RyuJIT

MaxRelativeError=0.01  Arguments=/p:Optimize=true  IterationCount=5  
IterationTime=250.0000 ms  WarmupCount=1

Method	Job	BuildConfiguration	Size	Mean	Error	StdDev	Ratio	RatioSD	Rank	Gen 0	Gen 1	Gen 2	Allocated
getItem	After	After	100	20.48 ns	0.935 ns	0.243 ns	1.00	0.00	1	-	-	-	-
getItem	Main50	Main50	100	29.40 ns	0.728 ns	0.189 ns	1.44	0.02	2	-	-	-	-
getItem	NuGet50	NuGet50	100	74.83 ns	6.092 ns	0.943 ns	3.65	0.08	3	-	-	-	-

getItemIntLike	After	After	100	60.45 ns	0.460 ns	0.119 ns	1.00	0.00	1	-	-	-	-
getItemIntLike	Main50	Main50	100	220.00 ns	4.159 ns	0.644 ns	3.64	0.02	2	0.0450	-	-	283 B
getItemIntLike	NuGet50	NuGet50	100	250.34 ns	9.453 ns	1.463 ns	4.14	0.03	3	0.0449	-	-	283 B

getItemString	After	After	100	76.03 ns	1.027 ns	0.267 ns	1.00	0.00	2	-	-	-	-
getItemString	Main50	Main50	100	68.22 ns	0.538 ns	0.083 ns	0.90	0.00	1	-	-	-	-
getItemString	NuGet50	NuGet50	100	115.77 ns	0.626 ns	0.162 ns	1.52	0.01	3	-	-	-	-

getItemRefLike	After	After	100	278.41 ns	3.569 ns	0.927 ns	1.00	0.00	3	-	-	-	-
getItemRefLike	Main50	Main50	100	200.05 ns	6.346 ns	1.648 ns	0.72	0.01	1	-	-	-	-
getItemRefLike	NuGet50	NuGet50	100	236.40 ns	3.194 ns	0.494 ns	0.85	0.00	2	-	-	-	-

getItemIntRecord	After	After	100	282.93 ns	12.166 ns	3.160 ns	1.00	0.00	3	-	-	-	-
getItemIntRecord	Main50	Main50	100	200.34 ns	6.643 ns	1.028 ns	0.71	0.01	1	-	-	-	-
getItemIntRecord	NuGet50	NuGet50	100	235.25 ns	1.148 ns	0.178 ns	0.83	0.01	2	-	-	-	-

It should be probably a separate issue. But I'm not sure I would dig deeper if such tradeoff or breaking changes are not acceptable.

And again, static readonly fields would help. It's a total mess now. Not only are such fields not possible, beforefieldinit also feels random, and some weird init fields are appearing. I still could not understand what's going on with the code gen. I would like to embed multi-line IL directly if I may, beyond (#..#) things 🙄


cartermp · 2021-01-12T02:21:11Z

@dsyme it would be good to get your eyes on this as well.

Note that the last set of benchmarks are not reflective of this change, they are a part of a discussion that manifests in #10855

KevinRansom

Looks good,

thank you for this

cartermp · 2021-01-17T20:18:13Z

I will merge this in. Thanks @buybackoff

buybackoff · 2021-01-17T20:31:33Z

@cartermp

> I will merge this in. Thanks @buybackoff

Thanks!

It would be interesting to know if the comparer changes, or even the direction, have any chance. It looks like there are huge easy gains for 95+% of cases, but they are blocked by the remaining <5%, mostly edge cases. I've noticed you are going to add an S.C.Immutable dependency for immutable arrays, but (hypothetically) replacing MapTree with the AVL implementation from there will require exactly the same transition from F#'s comparison constraint to S.C.G.Comparer<T>.Default for efficient inlined comparer calls.
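For reference, the two comparison routes mentioned here can be put side by side for a primitive key. This is only an illustration using `compare` and `System.Collections.Generic.Comparer<'T>.Default`; it does not show how either Map implementation calls its comparer.

```fsharp
open System.Collections.Generic

// F# generic structural comparison, as required by the `comparison` constraint.
let viaStructural = compare 1 2

// BCL default comparer for the key type; for int it uses the type's IComparable<int> implementation.
let viaDefault = Comparer<int>.Default.Compare(1, 2)

// For a primitive key such as int, both routes agree on the ordering.
let sameOrdering = sign viaStructural = sign viaDefault

printfn "%b" sameOrdering
```

Agreement for `int` says nothing about other key types; as noted earlier in the thread, the comparer tweak is probably not 100% compatible with current behavior.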

cartermp · 2021-01-21T19:57:28Z

@buybackoff we'll take a look and have a think about them :)

buybackoff changed the title ~~Map: Optimize away ininst check~~ Map: Optimize away isinst check Jan 6, 2021

buybackoff mentioned this pull request Jan 6, 2021

Improved Map performance #10768

Closed

Map: Optimize away ininst check

9f21d51

Store height in leaves. Compared to the old discussion, when Left/Right were proposed to be stored in a universal node, this adds 4 bytes to leaves or 2 bytes per item on average (vs 16/8).

buybackoff force-pushed the map_step2 branch from 45a3dfd to 9f21d51 Compare January 6, 2021 23:58

buybackoff marked this pull request as draft January 7, 2021 00:14

Map: use if instead of match

aa0572c

`Match` produces `sub 1` and `switch` instruction. Here, for any non-trivial count, nodes are more frequent than leaves on the path, so branch prediction should be beneficial.

buybackoff marked this pull request as ready for review January 7, 2021 13:00

vzarytovskii closed this Jan 7, 2021

vzarytovskii reopened this Jan 7, 2021

cartermp approved these changes Jan 7, 2021


buybackoff mentioned this pull request Jan 8, 2021

Map: comparer optimization #10855

Closed

Map: Optimize away ininst check: fix trace

bc28258

buybackoff added a commit to buybackoff/fsharp that referenced this pull request Jan 10, 2021

Set: Optimize away isinst check

2ccfed3

Same as dotnet#10845

buybackoff mentioned this pull request Jan 10, 2021

Set: Optimize away isinst check #10860

Merged

KevinRansom approved these changes Jan 15, 2021


cartermp merged commit 15654d2 into dotnet:main Jan 17, 2021

cartermp mentioned this pull request Jan 17, 2021

Apply map and set optimizations to tagged internal collections #10889

Closed
