Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Sha2 functions #687

Merged
merged 3 commits into from
Jan 14, 2018
Merged

Add Sha2 functions #687

merged 3 commits into from
Jan 14, 2018

Conversation

tiehuis
Copy link
Member

@tiehuis tiehuis commented Jan 13, 2018

We take the fastest time measurement taken across multiple runs. Tested
across multiple compiler flags and the best chosen.

Cpu: Intel(R) Core(TM) i5-6500 CPU @ 3.20GHz
Gcc: 7.2.1 20171224
Clang: 5.0.1
Zig: 0.1.1.304f6f1d

See https://www.nayuki.io/page/fast-sha2-hashes-in-x86-assembly.

Gcc -O2
    219 Mb/s
Clang -O2
    213 Mb/s
Zig --release-fast
    284 Mb/s
Zig --release-safe
    211 Mb/s
Zig
    6 Mb/s
Gcc -O2
    350 Mb/s
Clang -O2
    354 Mb/s
Zig --release-fast
    426 Mb/s
Zig --release-safe
    300 Mb/s
Zig
    11 Mb/s

release-safe is a bit slower here. Haven't delved into the assembly to see why yet.

We take the fastest time measurement taken across multiple runs. Tested
across multiple compiler flags and the best chosen.

```
Cpu: Intel(R) Core(TM) i5-6500 CPU @ 3.20GHz
Gcc: 7.2.1 20171224
Clang: 5.0.1
Zig: 0.1.1.304f6f1d
```

See https://www.nayuki.io/page/fast-sha2-hashes-in-x86-assembly.

```
Gcc -O2
    219 Mb/s
Clang -O2
    213 Mb/s
Zig --release-fast
    284 Mb/s
Zig --release-safe
    211 Mb/s
Zig
    6 Mb/s
```

```
Gcc -O2
    350 Mb/s
Clang -O2
    354 Mb/s
Zig --release-fast
    426 Mb/s
Zig --release-safe
    300 Mb/s
Zig
    11 Mb/s
```
@tiehuis
Copy link
Member Author

tiehuis commented Jan 13, 2018

Seems like an i386 issue, probably assuming a variable width incorrectly somewhere. Can reproduce the same issue with wine. Getting late here, so will fix this up tomorrow.

@andrewrk
Copy link
Member

This looks like #537.

Feel free to disable the test for 32 bit windows. There are enough instances of this that we should probably remove 32 bit windows support from the readme and remove it from the test matrix, until we can pass all the tests.

@andrewrk andrewrk merged commit e7e7625 into master Jan 14, 2018
@tiehuis tiehuis deleted the sha2 branch January 17, 2018 06:06
@andrewrk
Copy link
Member

@tiehuis
I'm trying to reproduce your findings here. I see that the nayuki page has a test harness for the C implementations. What did you do to come up with the zig numbers?

@andrewrk
Copy link
Member

Ah, I found std/crypto/throughput_test.zig

@andrewrk
Copy link
Member

andrewrk commented Mar 19, 2018

I looked into this a little bit, and I noticed that the throughput test in zig was getting LTO (in the sense that we emit only a single LLVM module/ .o file) while the C benchmark had to make function calls across .o files. So I copied the sha256.c function into the test .c file and made all the functions static. This actually did not change the timings. I found that your zig implementation of sha256 generates 14% faster machine code.

I looked at a comparison of the assembly and it's hard to tell exactly what is different, but it appears that the zig implementation has slightly better instruction selection.

@andrewrk
Copy link
Member

Your zig sha256 implementation is faster than the hand rolled assembly from the nayuki page.

  • nayuki hand rolled x86_64 assembly sha256: 192.6 MB/s
  • zig stdlib sha256 throughput test: 204 MB/s

@tiehuis
Copy link
Member Author

tiehuis commented Mar 19, 2018

The initial tests here I actually exported the zig functions with c bindings so the call overhead was the same.

@andrewrk
Copy link
Member

I think I know what's going on. If you look at https://www.intel.com/content/dam/www/public/us/en/documents/white-papers/sha-256-implementations-paper.pdf there's a section called "Optimizations with rorx". I believe that LLVM is able to come up with these optimizations with the zig implementation, but it somehow does not discover them with the clang implementation.

@andrewrk andrewrk mentioned this pull request Apr 29, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants