-
-
Notifications
You must be signed in to change notification settings - Fork 2.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add Sha2 functions #687
Add Sha2 functions #687
Conversation
We take the fastest time measurement taken across multiple runs. Tested across multiple compiler flags and the best chosen. ``` Cpu: Intel(R) Core(TM) i5-6500 CPU @ 3.20GHz Gcc: 7.2.1 20171224 Clang: 5.0.1 Zig: 0.1.1.304f6f1d ``` See https://www.nayuki.io/page/fast-sha2-hashes-in-x86-assembly. ``` Gcc -O2 219 Mb/s Clang -O2 213 Mb/s Zig --release-fast 284 Mb/s Zig --release-safe 211 Mb/s Zig 6 Mb/s ``` ``` Gcc -O2 350 Mb/s Clang -O2 354 Mb/s Zig --release-fast 426 Mb/s Zig --release-safe 300 Mb/s Zig 11 Mb/s ```
Seems like an i386 issue, probably assuming a variable width incorrectly somewhere. Can reproduce the same issue with wine. Getting late here, so will fix this up tomorrow. |
This looks like #537. Feel free to disable the test for 32 bit windows. There are enough instances of this that we should probably remove 32 bit windows support from the readme and remove it from the test matrix, until we can pass all the tests. |
@tiehuis |
Ah, I found std/crypto/throughput_test.zig |
I looked into this a little bit, and I noticed that the throughput test in zig was getting LTO (in the sense that we emit only a single LLVM module/ .o file) while the C benchmark had to make function calls across .o files. So I copied the sha256.c function into the test .c file and made all the functions static. This actually did not change the timings. I found that your zig implementation of sha256 generates 14% faster machine code. I looked at a comparison of the assembly and it's hard to tell exactly what is different, but it appears that the zig implementation has slightly better instruction selection. |
Your zig sha256 implementation is faster than the hand rolled assembly from the nayuki page.
|
The initial tests here I actually exported the zig functions with c bindings so the call overhead was the same. |
I think I know what's going on. If you look at https://www.intel.com/content/dam/www/public/us/en/documents/white-papers/sha-256-implementations-paper.pdf there's a section called "Optimizations with rorx". I believe that LLVM is able to come up with these optimizations with the zig implementation, but it somehow does not discover them with the clang implementation. |
We take the fastest time measurement taken across multiple runs. Tested
across multiple compiler flags and the best chosen.
See https://www.nayuki.io/page/fast-sha2-hashes-in-x86-assembly.
release-safe
is a bit slower here. Haven't delved into the assembly to see why yet.