Add Sha2 functions #687

tiehuis · 2018-01-13T09:46:44Z

Each figure below is the fastest time measured across multiple runs. Several
compiler flag combinations were tried for each compiler, and the best result is reported.
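
The best-of-N measurement described above can be sketched in C roughly as follows. This is only an illustration, not the harness used for these numbers: `hash_fn`, `fastest`, `mb_per_s`, and `best_throughput` are hypothetical names, `clock()` is a stand-in timer, and the figures are assumed to mean 10^6 bytes per second.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

#define MAX_RUNS 64

/* Hypothetical signature for the function under test. */
typedef void (*hash_fn)(const unsigned char *data, size_t len);

/* Smallest value in a non-empty array of timings (seconds). */
double fastest(const double *seconds, size_t n) {
    assert(n > 0);
    double best = seconds[0];
    for (size_t i = 1; i < n; i++) {
        if (seconds[i] < best) {
            best = seconds[i];
        }
    }
    return best;
}

/* Convert a byte count and a duration to MB/s (assumed 1 MB = 10^6 bytes). */
double mb_per_s(size_t bytes, double seconds) {
    return (double)bytes / 1e6 / seconds;
}

/* Time `runs` calls of fn over the same buffer and report the throughput of
 * the fastest one. The buffer should be large enough that one call takes far
 * longer than the clock() resolution; otherwise the elapsed time can be 0. */
double best_throughput(hash_fn fn, const unsigned char *data, size_t len,
                       size_t runs) {
    double seconds[MAX_RUNS];
    assert(runs > 0 && runs <= MAX_RUNS);
    for (size_t i = 0; i < runs; i++) {
        clock_t start = clock();
        fn(data, len);
        seconds[i] = (double)(clock() - start) / CLOCKS_PER_SEC;
    }
    return mb_per_s(len, fastest(seconds, runs));
}
```

Taking the minimum rather than the mean discards runs that were slowed down by other activity on the machine.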

Cpu: Intel(R) Core(TM) i5-6500 CPU @ 3.20GHz
Gcc: 7.2.1 20171224
Clang: 5.0.1
Zig: 0.1.1.304f6f1d

See https://www.nayuki.io/page/fast-sha2-hashes-in-x86-assembly.

Gcc -O2
    219 Mb/s
Clang -O2
    213 Mb/s
Zig --release-fast
    284 Mb/s
Zig --release-safe
    211 Mb/s
Zig
    6 Mb/s

Gcc -O2
    350 Mb/s
Clang -O2
    354 Mb/s
Zig --release-fast
    426 Mb/s
Zig --release-safe
    300 Mb/s
Zig
    11 Mb/s

release-safe is a bit slower here. Haven't delved into the assembly to see why yet.


tiehuis · 2018-01-13T10:20:42Z

Seems like an i386 issue, probably assuming a variable width incorrectly somewhere. Can reproduce the same issue with wine. Getting late here, so will fix this up tomorrow.

andrewrk · 2018-01-13T15:51:21Z

This looks like #537.

Feel free to disable the test for 32 bit windows. There are enough instances of this that we should probably remove 32 bit windows support from the readme and remove it from the test matrix, until we can pass all the tests.

andrewrk · 2018-03-19T18:30:26Z

@tiehuis
I'm trying to reproduce your findings here. I see that the nayuki page has a test harness for the C implementations. What did you do to come up with the zig numbers?

andrewrk · 2018-03-19T18:32:59Z

Ah, I found std/crypto/throughput_test.zig

andrewrk · 2018-03-19T18:52:15Z

I looked into this a little bit, and I noticed that the throughput test in zig was getting LTO (in the sense that we emit only a single LLVM module / .o file), while the C benchmark had to make function calls across .o files. So I copied the sha256.c function into the test .c file and made all the functions static. This did not change the timings. I found that your zig implementation of sha256 generates 14% faster machine code.

I looked at a comparison of the assembly and it's hard to tell exactly what is different, but it appears that the zig implementation has slightly better instruction selection.

andrewrk · 2018-03-19T19:43:05Z

Your zig sha256 implementation is faster than the hand-rolled assembly from the nayuki page.

nayuki hand-rolled x86_64 assembly sha256: 192.6 MB/s
zig stdlib sha256 throughput test: 204 MB/s

tiehuis · 2018-03-19T19:43:45Z

For the initial tests here, I actually exported the zig functions with C bindings, so the call overhead was the same.

andrewrk · 2018-03-20T18:38:59Z

I think I know what's going on. If you look at https://www.intel.com/content/dam/www/public/us/en/documents/white-papers/sha-256-implementations-paper.pdf there's a section called "Optimizations with rorx". I believe that LLVM is able to come up with these optimizations for the zig implementation, but somehow does not discover them for the C implementation compiled with clang.
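
For context, the rotations in question come from the SHA-256 round functions, which FIPS 180-4 defines as XORs of three right-rotations of the same word. A minimal C sketch follows, with hypothetical helper names `rotr32`, `big_sigma0`, and `big_sigma1`. Whether a given compiler lowers the rotate idiom to `ror` or, with BMI2 enabled, to `rorx` is an assumption that depends on the compiler version and target flags.

```c
#include <assert.h>
#include <stdint.h>

/* Rotate a 32-bit word right by n bits. Every rotation amount used by
 * SHA-256 lies in 1..31, so other values are rejected here. */
uint32_t rotr32(uint32_t x, unsigned n) {
    assert(n > 0 && n < 32);
    return (x >> n) | (x << (32 - n));
}

/* SHA-256 "big sigma 0": ROTR2(x) ^ ROTR13(x) ^ ROTR22(x). */
uint32_t big_sigma0(uint32_t x) {
    return rotr32(x, 2) ^ rotr32(x, 13) ^ rotr32(x, 22);
}

/* SHA-256 "big sigma 1": ROTR6(x) ^ ROTR11(x) ^ ROTR25(x). */
uint32_t big_sigma1(uint32_t x) {
    return rotr32(x, 6) ^ rotr32(x, 11) ^ rotr32(x, 25);
}
```

`rorx` writes its result to a separate destination register and leaves the flags untouched, so the three rotations of one input can be computed without first copying that input.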

tiehuis added 2 commits January 13, 2018 22:37

Change indexing variable types for crypto functions

1f3ed5c

Disable win32 tests for Sha2 + correct lengths

9be9f1a

andrewrk merged commit e7e7625 into master Jan 14, 2018

tiehuis deleted the sha2 branch January 17, 2018 06:06

andrewrk mentioned this pull request Apr 29, 2019

Ascii #2171

Closed
