-
Notifications
You must be signed in to change notification settings - Fork 789
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Map: Optimize away isinst
check
#10845
Conversation
ininst
checkisinst
check
Store height in leaves. Compared to the old discussion, when Left/Right were proposed to be stored in a universal node, this adds 4 bytes to leaves or 2 bytes per item on average (vs 16/8).
Initially (in the benchmarks table) in the But that fails in a bad way on Windows/full framework. The updated benchmarks are below.
There is still an improvement, but smaller. Making this a draft. Is there a way to suppress the error for BenchmarkDotNet=v0.12.1, OS=Windows 10.0.19042
Intel Core i7-8700 CPU 3.20GHz (Coffee Lake), 1 CPU, 12 logical and 6 physical cores
.NET Core SDK=5.0.200-preview.20601.7
[Host] : .NET Core 5.0.1 (CoreCLR 5.0.120.57516, CoreFX 5.0.120.57516), X64 RyuJIT DEBUG
After : .NET Core 5.0.1 (CoreCLR 5.0.120.57516, CoreFX 5.0.120.57516), X64 RyuJIT
Before : .NET Core 5.0.1 (CoreCLR 5.0.120.57516, CoreFX 5.0.120.57516), X64 RyuJIT
MaxRelativeError=0.01 Arguments=/p:Optimize=true IterationCount=20
IterationTime=250.0000 ms WarmupCount=1
|
Results for BenchmarkDotNet=v0.12.1, OS=Windows 10.0.19042
Intel Core i7-8700 CPU 3.20GHz (Coffee Lake), 1 CPU, 12 logical and 6 physical cores
.NET Core SDK=5.0.200-preview.20601.7
[Host] : .NET Core 5.0.1 (CoreCLR 5.0.120.57516, CoreFX 5.0.120.57516), X64 RyuJIT DEBUG
After : .NET Core 5.0.1 (CoreCLR 5.0.120.57516, CoreFX 5.0.120.57516), X64 RyuJIT
Before : .NET Core 5.0.1 (CoreCLR 5.0.120.57516, CoreFX 5.0.120.57516), X64 RyuJIT
MaxRelativeError=0.01 Arguments=/p:Optimize=true IterationCount=20
IterationTime=250.0000 ms WarmupCount=1
|
Oh, that was not due to the cast. Or the effect of this is similar to removing cast. But will not test that if the unsafe cast is unverifiable and there is no workaround.
|
`Match` produces `sub 1` and `switch` instruction. Here, for any non-trivial count, nodes are more frequent than leaves on the path, so branch prediction should be beneficial.
Updated benchmarks after changing The numbers for BenchmarkDotNet=v0.12.1, OS=Windows 10.0.19042
Intel Core i7-8700 CPU 3.20GHz (Coffee Lake), 1 CPU, 12 logical and 6 physical cores
.NET Core SDK=5.0.200-preview.20601.7
[Host] : .NET Core 5.0.1 (CoreCLR 5.0.120.57516, CoreFX 5.0.120.57516), X64 RyuJIT DEBUG
After : .NET Core 5.0.1 (CoreCLR 5.0.120.57516, CoreFX 5.0.120.57516), X64 RyuJIT
Before : .NET Core 5.0.1 (CoreCLR 5.0.120.57516, CoreFX 5.0.120.57516), X64 RyuJIT
MaxRelativeError=0.01 Arguments=/p:Optimize=true IterationCount=20
IterationTime=250.0000 ms WarmupCount=1
|
This looks pretty nice! get/add/contains are also the most relevant operations for the F# compiler, and I suspect that's the case for most consumers as well. The one cause for concern is the error in the add operation. Much higher now and basically makes the improvement a wash. Any thoughts as to why that might be? I expect that the same technique could be applied to Sets as well? |
This may be explained by short time of running the bench, while my machine was not idle. Or by GC. Need to rerun longer. But the number for add was stable between 3-4 runs. |
For the count 100, the improvement in add is 7+ sigma, isn't it? |
4x longer run (2x more iterations each 2x longer) BenchmarkDotNet=v0.12.1, OS=Windows 10.0.19042
Intel Core i7-8700 CPU 3.20GHz (Coffee Lake), 1 CPU, 12 logical and 6 physical cores
.NET Core SDK=5.0.200-preview.20601.7
[Host] : .NET Core 5.0.1 (CoreCLR 5.0.120.57516, CoreFX 5.0.120.57516), X64 RyuJIT DEBUG
After : .NET Core 5.0.1 (CoreCLR 5.0.120.57516, CoreFX 5.0.120.57516), X64 RyuJIT
Before : .NET Core 5.0.1 (CoreCLR 5.0.120.57516, CoreFX 5.0.120.57516), X64 RyuJIT
MaxRelativeError=0.01 Arguments=/p:Optimize=true IterationCount=40
WarmupCount=1
|
Thanks! I think that looks great. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is a good change, thanks, and thanks for all of the testing.
Thanks! I think there is still some "free lunch" with comparer devirtualization for at least primitive types. It surprises me that static readonly fields or As for Set and complier Map/Set friends, an issue with |
No pressure :) - we can also apply your work to the other implementations if this gets merged soon |
I've seen #9348, #513, and related. Such a hopeless state to optimize the comparison :( Less than an hour tweaking the comparison gives result like below. But it's probably not 100% compatible with the current behavior. It could be if we apply the logic from #9348 snippet, to exclude records & DUs, if that code is correct. If we take the current NuGet 5.0 release without any changes to Map as a baseline, are we ready to (
? I would say that everything could be wrapped by The types in the bench are:
BenchmarkDotNet=v0.12.1, OS=Windows 10.0.19042
Intel Core i7-8700 CPU 3.20GHz (Coffee Lake), 1 CPU, 12 logical and 6 physical cores
.NET Core SDK=5.0.200-preview.20601.7
[Host] : .NET Core 5.0.1 (CoreCLR 5.0.120.57516, CoreFX 5.0.120.57516), X64 RyuJIT DEBUG
After : .NET Core 5.0.1 (CoreCLR 5.0.120.57516, CoreFX 5.0.120.57516), X64 RyuJIT
Main50 : .NET Core 5.0.1 (CoreCLR 5.0.120.57516, CoreFX 5.0.120.57516), X64 RyuJIT
NuGet50 : .NET Core 5.0.1 (CoreCLR 5.0.120.57516, CoreFX 5.0.120.57516), X64 RyuJIT
MaxRelativeError=0.01 Arguments=/p:Optimize=true IterationCount=5
IterationTime=250.0000 ms WarmupCount=1
It should be probably a separate issue. But I'm not sure I would dig deeper if such tradeoff or breaking changes are not acceptable. And again, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good,
thank you for this
I will merge this in. Thanks @buybackoff |
Thanks! It would be interesting to know if comparer changes or even the direction have any chance? It looks like there are huge easy gains for 95+% cases, but they are blocked by the remaining <5% mostly edge cases. I've noticed you are going to add |
@buybackoff we'll take a look and have a think about them :) |
Following a discussion here #10768
Found a simple perf gain at the cost of 2 bytes per a Map item. A significant gain in
getItem/contains/add
due to replacement ofisinst
byint32
equality check. Benchmarked against current main.The change is to store the height in leaves and share the field with nodes. Use it as an implicit tag (a leaf when
height = 1
) instead of the type check. Compared to the old discussion, when Left/Right were proposed to be stored in a universal node, this adds 4 bytes to leaves or 2 bytes per item on average (vs 16/8).The tradeoff is basically the only thing to consider. Code changes are trivial.