Benchmarks #95
Signed-off-by: Ferry Toth <ftoth@exalondelft.nl>
By benchmarking on Intel Edison (a Silvermont Atom CPU) in x86_64 mode, starting from v0.3.0, we find that SSSE3 performance had various ups and downs. Substantial changes since v0.3.0 were:

| HASH | SSSE3 (MB/s) | SSSE3 (MB/s) |
|---------|-----|-----|
| e12e3cd | 165 | 210 |
| 3f3f31c | 206 | 150 |
| 67ee3fd | 205 | 205 |
| 0a69845 | 145 | 205 |
| a5b6739 | 145 | 218 |
| 6310c1f | 157 | 218 |
| 9a0d1b2 | 158 | 210 |
| 5874921 | 165 | 210 |

Best performance was from 67ee3fd until decode performance regressed from 205 to 145 MB/s with commit 0a69845. The commit before that (b6417f3) had the best decode performance with relatively good encode. Core (i7) processors do not show such large performance changes.

This patch adds the ssse3 codec from b6417f3 as ssse3_atom.

Signed-off-by: Ferry Toth <ftoth@exalondelft.nl>
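To put the throughput figures from the commit message in relative terms, here is a minimal sketch of the arithmetic. The helper name `percent_change` is made up for illustration and is not part of the library or its benchmark tool.

```c
#include <assert.h>

/*
 * Relative change, in percent, when throughput goes from `before` to
 * `after` (both in MB/s). Hypothetical helper, for illustration only.
 */
double percent_change(double before, double after)
{
    assert(before > 0.0); /* a zero baseline has no meaningful ratio */
    return (after - before) / before * 100.0;
}
```

For the decode regression quoted above, `percent_change(205.0, 145.0)` comes out at roughly -29.3, so about 29% of decode throughput was lost at commit 0a69845.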
For
I don't know if it has better latency, but it does have fewer instructions and constants ... edit: in comparison to
Yeah, this draft PR just revives an older version of the codec which showed better performance than the current one (on SLM). I didn't try to create my own improvement. PR #46 is a bit older; did you benchmark it on Atom at the time?
@aklomp @mayeut
Again a draft. Please ignore the Benchmarks patch; I was too far along to drop it and rebase against HEAD.
The interesting one is codec: add ssse3_atom.
My experience with CRC32C on Silvermont Atom (SLM) processors is that in 64-bit mode certain combinations of instructions incur a penalty (see the Intel manuals), which in some cases makes running in 64-bit mode a net disadvantage. On later Atoms (Goldmont, Airmont) this penalty likely does not occur, but I don't have the hardware to test. Running base64 on SLM shows strange performance regressions, while Core i7 shows improvement.
So I revived the best ssse3 codec as ssse3_atom and tested it on Intel Edison (dual-core, 500 MHz) in 64-bit and 32-bit mode (because that is easy to do) and on an Intel NUC with a Bay Trail Atom in 64-bit mode (to show the relevance on a mainstream CPU).
Improvements from going back to the revived codec are shown in bold, degradations in italic.
We see that on i7 the latest version is indeed the fastest, and on SLM in 32-bit mode there is no difference. But on SLM in 64-bit mode SSSE3_ATOM is 25% faster.
Now, having a fast algorithm has a much more noticeable effect on a slow Atom than on a fast i7... So what do you guys think, should we add a specialized SSSE3 codec for SLM?
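If a specialized codec were added, one possible way to pick it at runtime is to look at the CPU family and model reported by CPUID leaf 1 and use ssse3_atom only in 64-bit builds on Silvermont. The following is a sketch under stated assumptions, not how the library's codec selection actually works: the enum, struct and function names are hypothetical, and the list of Silvermont model numbers (0x37, 0x4A, 0x4D) is an assumption that should be checked against Intel's documentation.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical codec identifiers for this sketch; not the library's API. */
enum ssse3_codec {
    CODEC_SSSE3,      /* current ssse3 codec */
    CODEC_SSSE3_ATOM  /* revived codec proposed in this PR */
};

struct cpu_sig {
    unsigned family;
    unsigned model;
};

/* Decode display family/model from the EAX value of CPUID leaf 1. */
struct cpu_sig decode_cpuid_signature(uint32_t eax)
{
    unsigned base_family = (eax >> 8) & 0xF;
    unsigned base_model  = (eax >> 4) & 0xF;
    unsigned ext_family  = (eax >> 20) & 0xFF;
    unsigned ext_model   = (eax >> 16) & 0xF;
    struct cpu_sig sig;

    sig.family = base_family;
    sig.model  = base_model;
    /* The extended family is only added when the base family is 0xF. */
    if (base_family == 0xF)
        sig.family += ext_family;
    /* The extended model only applies to base family 6 or 0xF. */
    if (base_family == 0x6 || base_family == 0xF)
        sig.model |= ext_model << 4;
    return sig;
}

/*
 * Assumption: family 6 models 0x37 (Bay Trail), 0x4A (Merrifield, as in
 * Intel Edison) and 0x4D (Avoton/Rangeley) are the Silvermont parts.
 */
bool is_silvermont(struct cpu_sig sig)
{
    if (sig.family != 6)
        return false;
    switch (sig.model) {
    case 0x37:
    case 0x4A:
    case 0x4D:
        return true;
    default:
        return false;
    }
}

/* Prefer ssse3_atom only where the thread reports a gain: SLM in 64-bit. */
enum ssse3_codec pick_ssse3_codec(uint32_t cpuid1_eax, bool is_64bit_build)
{
    if (is_64bit_build && is_silvermont(decode_cpuid_signature(cpuid1_eax)))
        return CODEC_SSSE3_ATOM;
    return CODEC_SSSE3;
}
```

With GCC or Clang the EAX value could be obtained with `__get_cpuid(1, &eax, &ebx, &ecx, &edx)` from `<cpuid.h>`. Keeping the decision in a pure function makes it easy to test without the actual hardware.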