Benchmarks #80
Comments
@aklomp are these in any way useful?
@htot Thanks for your work. It's interesting to see that not all "improvements" to the library have led to actual improvements in real-world benchmarks. Which proves that we need to be careful when introducing new tricks, because some users may be worse off. That said, apart from The I think these benchmarks are cool and might be useful as a jumping-off point for analyzing performance degradations in past commits, but apart from that I don't see a major use for them. The idea of graphing out performance over time is very powerful though, and I'll try to remember it for my toolbox.
I think the Atom is, like Baytrail (and Edison), an x86_64 CPU; they support SSSE3 but not AVX. The core is Silvermont (SLM), which has a penalty for long 64-bit instructions (complicated story); that might be the case here too (I have not tested in i686 mode). If so, Goldmont / Airmont may behave completely differently (but I don't have those here). My i7-10700, by the way, appears to have 16MB of L3 cache. So the above benchmarks are not really usable, since the test data can stay in cache (typically nobody would ever encode the same string twice). I patched the benchmark to add a 100MB string and found that in all cases except "plain" we are near the bandwidth limit of the DDR memory. And even "plain" with OpenMP reaches the bandwidth limit. All these optimizations are useful in particular on the slow Atoms, but there we had a degradation. I'll add some improvements here; maybe you can label this "not a bug"?
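The cache effect described in this comment can be illustrated with a minimal sketch. It does not use this library: Python's standard `base64.b64encode` stands in for the encoder, and the function name `encode_throughput_mb_s`, the buffer sizes, and the `min_seconds` parameter are all assumptions chosen for illustration. The idea is only that a buffer much smaller than the last-level cache is re-read from cache on every round, while a buffer well above the cache size (100MB against a 16MB L3) has to be streamed from memory each time.

```python
import base64
import os
import time


def encode_throughput_mb_s(buf_size, min_seconds=0.2):
    """Encode one random buffer of buf_size bytes repeatedly and
    return the input throughput in MB/s (1 MB = 1e6 bytes)."""
    data = os.urandom(buf_size)
    rounds = 0
    elapsed = 0.0
    start = time.perf_counter()
    # Repeat until enough wall-clock time has passed to get a stable figure.
    while elapsed < min_seconds:
        base64.b64encode(data)
        rounds += 1
        elapsed = time.perf_counter() - start
    return buf_size * rounds / elapsed / 1e6


if __name__ == "__main__":
    # Hypothetical sizes: 1 KiB and 1 MiB fit in cache on most desktop CPUs,
    # 100 MiB does not fit in a 16MB L3.
    for size in (1 << 10, 1 << 20, 100 * (1 << 20)):
        print(f"{size:>12} bytes: {encode_throughput_mb_s(size):10.1f} MB/s")
```

Whether the large-buffer figure actually approaches the memory bandwidth limit depends on how fast the encoder is; with a slow encoder the gap between small and large buffers may be barely visible.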
The Intel Atom N270 really is a 32-bit Diamondville core with I created a "benchmarking" label and added it to this issue. I'll leave it open for the time being, then.
I see. That's confusing; there are also 64-bit Diamondville CPUs (Atom 230).
I did some automated benchmarking on my i7-10700 and Edison (Merrifield dual core Silvermont Atom without cache memory, similar to Baytrail) that I want to share here. Strictly, this issue is for reference only. It might be useful to find those commits causing substantial performance increases or decreases. All data have been taken without OpenMP (1 thread only) and in x86_64 mode. On i7 you will see some deviation probably caused by frequency scaling / turbo boost. Don't let that disturb you.
The data can be found here if you want to play with it yourself: benchmarks.ods
Below I filter out the most interesting commits.
Encoding
Note that on Edison SSE3 encoding took a hit with 9a0d1b2.
Decoding
Especially for Edison it has been a bumpy ride, with great improvements (3f3f31c) and regressions (0a69845) on SSE3, but also for PLAIN (cfa8bf7 and f538baa).
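Picking out the interesting commits from a per-commit series like this can be automated. The following is a minimal sketch, not the method used to produce the data above: the function name `flag_changes`, the `(commit, throughput)` input layout, and the 10% default threshold are assumptions. It compares each commit with the one before it and reports those whose throughput moved by at least the threshold in either direction.

```python
def flag_changes(results, threshold=0.10):
    """Return (commit, relative_change) for every commit whose throughput
    differs from the previous commit by at least `threshold` (a fraction).

    `results` is a chronologically ordered list of (commit, throughput)
    pairs for a single codec on a single machine.
    """
    flagged = []
    # Walk over consecutive pairs: (previous commit, current commit).
    for (_, prev), (commit, cur) in zip(results, results[1:]):
        change = (cur - prev) / prev
        if abs(change) >= threshold:
            flagged.append((commit, change))
    return flagged
```

On a machine with frequency scaling or turbo boost, the threshold probably needs to sit above the run-to-run noise, otherwise ordinary jitter gets flagged as a regression.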