Benchmarking relative to your entry #4
Comments
@noahfalk thanks for getting in touch, and the mention is fine. However, I cannot reproduce the rather poor performance you see for my implementation; if I run yours on the default 1B set, I get the same perf as my own. I do see better numbers for the 10k real station names though, very nice! So I'm not sure what is going on here: the readme numbers do not match what I see on my machine after a quick run, so more testing is perhaps advisable. Perhaps ask to be included at https://github.com/buybackoff/1brc in https://hotforknowledge.com/2024/01/13/1brc-in-dotnet-among-fastest-on-linux-my-optimization-journey/#results. As far as I can tell from a quick look, your approach seems similar to mine but with quad unrolling, and unrolling is also something I have looked at. PS: I haven't looked at actual results.
I see you already made a request for buybackoff, so I will wait and see the results for more :)
Yeah, I was a little surprised. When I ran our entries together on my Windows dev machine there was clearly a difference, but not as prominent as it wound up being on the CCX33 machine that I used for my pseudo-official results. It's always possible that I messed up something in the benchmarking, but @buybackoff's results also seem to confirm a difference. My best guess is that we've got machine-dependent factors at play and the results depend substantially on which hardware is making the measurement. If you are interested in posting any results from your own hardware, I'm happy to link to them to create a more comprehensive picture of how different hardware is affecting results. Thanks!
Incredible! I do think it's likely @noahfalk's solution works a lot better with fewer cores relative to mem bw; that's at least a theory. Given my PC has 16c/32t and dual-channel DDR4, its ratio of cores to mem bw is high. I'll try to rerun and post some numbers tomorrow.
Limiting max clock freq will of course also favor SIMD-heavy stuff; that's also why I think the numbers differ from my machine, where the clock is not reduced. Unrolling and going wide SIMD makes sense in that case. Perhaps @noahfalk you can share full machine details of your local run. In any case I'd definitely think this beats anything else on the Hetzner with high mem bw, no SMT, and only 8 cores at lower clocks. Really impressive work.
If it helps here is info from the CCX33 machine where I ran my numbers:
Let me know if there is any other info you want me to grab from that machine that would help out.
@noahfalk sorry, I thought you said you had a "Windows dev machine" where you saw a big difference. Anyway, on "default" 1B on an AMD 5950X 16c/32t, dual-channel DDR4, Windows 10, the difference is minuscule (5%). Latter is mine. On the 10k set there is a larger difference, but still only ~15%. Hence, it is very machine dependent 😊
Oh yes I do, I just didn't realize that is the machine you were talking about, my bad. I'll grab the numbers from that machine and a little machine info later tonight.
So my Windows dev machine is a Core i7-9700K @ 3.6GHz, 8 cores, 8 hardware threads. I've measured it at ~14GB/s single-threaded RAM bandwidth and ~19GB/s multi-threaded. This is not a terribly quiet machine and I think it may have thermal issues, so I do not trust it as a high-quality benchmarking environment. Running our respective entries, this is the performance I see at the moment:
I think all of us need to share our datasets 🤣 Both Personally, I picked my hash table constants by running At the end of the contest I'll also help run more people's solutions, so all of us can have more data points to compare.
Hey @nietras, I just wanted to let you know that I benchmarked my 1brc attempt against your implementation and mentioned you in my README. You've got the fastest established C# implementation I was aware of, so it seemed like an important baseline to have. If you have any concerns or questions about what I wrote there (or if you'd like me to remove any of it), just let me know. Thanks!
https://github.com/noahfalk/1brc/tree/main