This repository has been archived by the owner on Sep 20, 2023. It is now read-only.

4-8x sha256 performance deficit compared to sha256sum/openssl (SHA instructions) #361

Closed
Atemu opened this issue Mar 8, 2022 · 2 comments

Comments

@Atemu

Atemu commented Mar 8, 2022

I wrote a minimal sha256sum clone on top of cryptonite:

import System.Environment
import Crypto.Hash
import qualified Data.ByteString.Lazy as L

main = do
  args <- getArgs
  let file = head args

  content <- L.readFile file

  let digest = hashlazy content :: Digest SHA256

  putStrLn $ show digest ++ " " ++ file
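
To check whether lazy I/O contributes to the gap, a strict, chunked variant of the reproducer can be timed the same way. This is only a sketch: it assumes cryptonite's incremental API (`hashInit`, `hashUpdate`, `hashFinalize`), and the 1 MiB chunk size and the `hashHandle` helper are arbitrary choices for illustration, not something from the original report. It has not been benchmarked here.

```haskell
module Main (main) where

import Crypto.Hash (Context, Digest, SHA256, hashFinalize, hashInit, hashUpdate)
import qualified Data.ByteString as B
import System.Environment (getArgs)
import System.IO (Handle, IOMode (ReadMode), withBinaryFile)

-- Assumption: 1 MiB per read; tune as needed.
chunkSize :: Int
chunkSize = 1024 * 1024

-- Feed the handle's content to SHA-256 one strict chunk at a time.
hashHandle :: Handle -> IO (Digest SHA256)
hashHandle h = go hashInit
  where
    go :: Context SHA256 -> IO (Digest SHA256)
    go ctx = do
      chunk <- B.hGetSome h chunkSize
      if B.null chunk
        then return (hashFinalize ctx)      -- end of input
        else go $! hashUpdate ctx chunk     -- force the context each step

main :: IO ()
main = do
  args <- getArgs
  case args of
    [file] -> do
      digest <- withBinaryFile file ReadMode hashHandle
      putStrLn (show digest ++ " " ++ file)
    _ -> putStrLn "usage: sha256sum-strict FILE"
```

If this version runs at about the same speed as the lazy one, the time is going into the hash compression function itself rather than into reading the input.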

However, it is 4-8 times slower than the regular sha256sum or openssl sha256 on machines with SHA extensions. (The exception is the M1 Pro, where my coreutils' sha256sum doesn't make use of the SHA extensions either, but openssl does.)

Celeron J4105:

$ head -c 10G /dev/zero | pv | /tmp/sha256sum /dev/stdin
10.0GiB 0:01:42 [99.7MiB/s]
732377e7f4a2abdc13ddfa1eb4c9c497fd2a2b294674d056cf51581b47dd586d /dev/stdin

$ head -c 10G /dev/zero | pv | sha256sum /dev/stdin
10.0GiB 0:00:23 [ 443MiB/s]
732377e7f4a2abdc13ddfa1eb4c9c497fd2a2b294674d056cf51581b47dd586d  /dev/stdin

$ head -c 10G /dev/zero | pv | openssl sha256 /dev/stdin
10.0GiB 0:00:31 [ 324MiB/s]
SHA256(/dev/stdin)= 732377e7f4a2abdc13ddfa1eb4c9c497fd2a2b294674d056cf51581b47dd586d

M1 Pro:

$ head -c 10G /dev/zero | pv | /tmp/sha256sum /dev/stdin
10.0GiB 0:00:50 [ 202MiB/s]
732377e7f4a2abdc13ddfa1eb4c9c497fd2a2b294674d056cf51581b47dd586d /dev/stdin

$ head -c 10G /dev/zero | pv | sha256sum /dev/stdin
10.0GiB 0:00:37 [ 271MiB/s]
732377e7f4a2abdc13ddfa1eb4c9c497fd2a2b294674d056cf51581b47dd586d  /dev/stdin

$ head -c 10G /dev/zero | pv | openssl sha256 /dev/stdin
10.0GiB 0:00:05 [1.74GiB/s]
SHA256(/dev/stdin)= 732377e7f4a2abdc13ddfa1eb4c9c497fd2a2b294674d056cf51581b47dd586d

5800x:

$ head -c 10G /dev/zero | pv | /tmp/sha256sum /dev/stdin
10.0GiB 0:00:33 [ 301MiB/s]
732377e7f4a2abdc13ddfa1eb4c9c497fd2a2b294674d056cf51581b47dd586d /dev/stdin

$ head -c 10G /dev/zero | pv | sha256sum /dev/stdin
10.0GiB 0:00:06 [1.65GiB/s]
732377e7f4a2abdc13ddfa1eb4c9c497fd2a2b294674d056cf51581b47dd586d  /dev/stdin

$ head -c 10G /dev/zero | pv | openssl sha256 /dev/stdin
10.0GiB 0:00:09 [1.05GiB/s]
SHA256(/dev/stdin)= 732377e7f4a2abdc13ddfa1eb4c9c497fd2a2b294674d056cf51581b47dd586d
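
For reference, the slowdown factors can be derived from the pv throughput figures above. The sketch below just normalises the reported rates to MiB/s and divides; the `Rate` type and the helper names are made up for this illustration, and the numbers are copied from the runs above.

```haskell
module Main (main) where

import Text.Printf (printf)

-- Throughput as printed by pv, in either unit.
data Rate = MiBps Double | GiBps Double

-- Normalise to MiB/s (1 GiB = 1024 MiB).
toMiB :: Rate -> Double
toMiB (MiBps x) = x
toMiB (GiBps x) = x * 1024

-- (machine, cryptonite reproducer, coreutils sha256sum, openssl sha256)
results :: [(String, Rate, Rate, Rate)]
results =
  [ ("Celeron J4105", MiBps 99.7, MiBps 443, MiBps 324)
  , ("M1 Pro",        MiBps 202,  MiBps 271, GiBps 1.74)
  , ("5800x",         MiBps 301,  GiBps 1.65, GiBps 1.05)
  ]

-- How many times faster `other` is than `baseline`.
speedup :: Rate -> Rate -> Double
speedup baseline other = toMiB other / toMiB baseline

report :: (String, Rate, Rate, Rate) -> IO ()
report (machine, hs, coreutils, openssl) =
  printf "%-14s sha256sum %.1fx, openssl %.1fx\n"
    machine (speedup hs coreutils) (speedup hs openssl)

main :: IO ()
main = mapM_ report results
```

Against the faster of the two native tools on each machine, this works out to roughly 4.4x (Celeron J4105), 8.8x (M1 Pro) and 5.6x (5800x).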

(This is a minimal reproducer for a bug in git-annex: https://git-annex.branchable.com/bugs/git-annex_is_slow_at_reading_file_content/)

@vincenthz
Member

There's no plan to tap into SHA instructions or SIMD at the moment, so that's an expected slowdown. Someone will have to add the instruction support and the layers of compat and fallback for this to happen.

This is a lot of work all in all; I did this in Rust here: https://github.com/typed-io/cryptoxide/tree/master/src/hashing/sha2/impl256

@Atemu
Author

Atemu commented May 2, 2022

Thanks for the answer.

Couldn't the Rust implementation be used instead of the C one here?

Could cryptonite hook into existing implementations of all that complexity like openssl?
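
To make the second question concrete, this is roughly what calling OpenSSL's one-shot `SHA256()` from Haskell through the FFI could look like. It is a sketch only, not something cryptonite does or plans to do: it assumes OpenSSL's headers and libcrypto are installed, it would be built with something like `ghc -O2 Main.hs -lcrypto`, and the names `c_SHA256`, `sha256OpenSSL` and `toHex` are made up for this example. It also hashes a whole strict ByteString at once, so it is not suitable for a 10 GiB stream as-is.

```haskell
{-# LANGUAGE ForeignFunctionInterface #-}
module Main (main) where

import Control.Monad (void)
import qualified Data.ByteString as B
import qualified Data.ByteString.Internal as BI
import qualified Data.ByteString.Unsafe as BU
import Data.Word (Word8)
import Foreign.C.Types (CSize (..))
import Foreign.Ptr (Ptr, castPtr)
import Numeric (showHex)
import System.Environment (getArgs)

-- unsigned char *SHA256(const unsigned char *d, size_t n, unsigned char *md);
foreign import ccall "openssl/sha.h SHA256"
  c_SHA256 :: Ptr Word8 -> CSize -> Ptr Word8 -> IO (Ptr Word8)

-- One-shot SHA-256 via libcrypto; returns the 32-byte raw digest.
-- Treated as pure because the result depends only on the input bytes.
sha256OpenSSL :: B.ByteString -> B.ByteString
sha256OpenSSL input =
  BI.unsafeCreate 32 $ \out ->
    BU.unsafeUseAsCStringLen input $ \(ptr, len) ->
      void (c_SHA256 (castPtr ptr) (fromIntegral len) out)

-- Lowercase hex encoding, two digits per byte.
toHex :: B.ByteString -> String
toHex = concatMap byteHex . B.unpack
  where
    byteHex w =
      let s = showHex w ""
      in if length s < 2 then '0' : s else s

main :: IO ()
main = do
  args <- getArgs
  case args of
    [file] -> do
      content <- B.readFile file   -- reads the whole file into memory
      putStrLn (toHex (sha256OpenSSL content) ++ " " ++ file)
    _ -> putStrLn "usage: sha256-openssl FILE"
```

The trade-off is that whichever SHA code paths the linked OpenSSL build selects at runtime come for free, at the price of a dependency on a system C library; a streaming version would need OpenSSL's incremental interface instead of the one-shot call.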
