Remote dealloc refactor. #138
Nice! We’ll bench this tomorrow :) thanks!
Change remote to count down to 0, so the fast path does not need a constant. Use a signed value so that the branch does not depend on the addition.
On x64 Linux this is now 15 instructions up to the branch that decides whether it is a remote dealloc, and then a further 13 instructions (including one branch, not taken in the common case) to complete the remote deallocation. If it actually posts, it is a lot more instructions, but that is very infrequent. Previously, it was a tangled mess of assembly, and probably twice as many instructions.
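For readers skimming the thread, here is a rough sketch of the count-down idea described above. It is not the code from this PR: `RemoteCacheSketch`, `BATCH_BYTES`, `queued`, and `posts` are hypothetical names, and the 4096-byte budget is an assumed value chosen only for illustration. The point is that the fast-path branch compares the current counter against zero, so it needs no constant and does not wait on the result of the subtraction.

```cpp
#include <cstddef>
#include <cstdint>

// Illustrative only: hypothetical names, not snmalloc's actual types.
struct RemoteCacheSketch
{
  // Assumed batch budget in bytes before the cache is posted.
  static constexpr int64_t BATCH_BYTES = 4096;

  // Signed so it may dip below zero; the next call then takes the slow path.
  int64_t capacity = BATCH_BYTES;
  size_t queued = 0; // objects waiting to be posted
  size_t posts = 0;  // how often the slow path ran

  void dealloc(int64_t object_size)
  {
    // Fast path: the branch tests the current value against zero,
    // so it is independent of the subtraction that follows.
    if (capacity > 0)
    {
      capacity -= object_size;
      ++queued;
      return;
    }

    // Slow path (infrequent): flush, then account for this object.
    post();
    capacity -= object_size;
    ++queued;
  }

  void post()
  {
    // Stand-in for sending the batch to the owning allocators.
    queued = 0;
    capacity = BATCH_BYTES;
    ++posts;
  }
};
```

With this shape, four deallocations of 1024 bytes stay on the fast path and leave `capacity` at 0; the fifth sees a non-positive counter and posts first. A single oversized object can push the counter negative, which is harmless because the counter is signed.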
I know this is merged already, but I wanted to add some results from our side:
old
new
Interesting, so the perf numbers show it is spending much less time in remote dealloc, 2.68% -> 0.92%, but this hasn't translated into a throughput win for you? How much noise is there in the "Throughput" number?
Sorry, I did a copy & paste oopsie: when pasting the numbers, the throughput values for old and new were swapped.
The new code is ~2% faster.
Phew, you had me questioning my ability to guess asm speed (which obviously is pretty questionable ;-) ). 2% is pretty good. More than I expected.
Yeah, 2% is really nice! I think snmalloc is a really, really good fit for tremor :D
The profile is good, but the Latency axis should read nanos. We use HDR Histogram https://hdrhistogram.github.io/HdrHistogram/plotFiles.html to plot. |
This performs a small refactor on the remote dealloc path to split the common case out.