Parallelizing the BLAKE3 cryptographic hash function via its Merkle tree structure.
See the Presentation for a complete explanation.
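For readers new to the tree structure: BLAKE3 splits its input into 1024-byte chunks, hashes each chunk independently into a chaining value, and merges chaining values pairwise up a binary tree whose left subtree always holds the largest power-of-two number of chunks smaller than the total. Because the leaves are independent, they can all be hashed at once, and each tree level can then be merged in parallel. The C++ sketch below illustrates only that shape. `mix64`, `chunk_cv`, `parent_cv` and the other names are hypothetical, and the toy mixer stands in for BLAKE3's real compression function (no key words, flags, or root finalization), so the outputs are not BLAKE3 hashes. It checks that a breadth-first, level-by-level reduction (the form that maps onto one GPU thread or one OpenMP iteration per node) builds the same tree as the recursive definition.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Toy 64-bit mixer (splitmix64 finalizer) standing in for BLAKE3's compression
// function. NOT cryptographic and NOT BLAKE3; it only gives each node a distinct value.
static uint64_t mix64(uint64_t x) {
    x ^= x >> 30; x *= 0xbf58476d1ce4e5b9ULL;
    x ^= x >> 27; x *= 0x94d049bb133111ebULL;
    x ^= x >> 31;
    return x;
}

constexpr std::size_t kChunkLen = 1024;  // BLAKE3 splits input into 1024-byte chunks

// Number of leaves; BLAKE3 treats empty input as a single empty chunk.
static std::size_t chunk_count(const std::string& in) {
    return std::max<std::size_t>(1, (in.size() + kChunkLen - 1) / kChunkLen);
}

// Leaf: toy hash of one chunk, with the chunk index mixed in
// (BLAKE3 mixes in a chunk counter for the same reason).
static uint64_t chunk_cv(const std::string& in, std::size_t index) {
    uint64_t h = mix64(index + 1);
    const std::size_t begin = index * kChunkLen;
    const std::size_t end = std::min(begin + kChunkLen, in.size());
    for (std::size_t i = begin; i < end; ++i)
        h = mix64(h ^ static_cast<unsigned char>(in[i]));
    return h;
}

// Parent: combine two child chaining values; order matters.
static uint64_t parent_cv(uint64_t left, uint64_t right) {
    return mix64(left ^ mix64(right + 0x9e3779b97f4a7c15ULL));
}

// Reference shape: the left subtree gets the largest power of two number of chunks < count.
static uint64_t hash_recursive(const std::string& in, std::size_t first, std::size_t count) {
    assert(count >= 1);
    if (count == 1) return chunk_cv(in, first);
    std::size_t left = 1;
    while (left * 2 < count) left *= 2;
    return parent_cv(hash_recursive(in, first, left),
                     hash_recursive(in, first + left, count - left));
}

// Parallel shape: hash every chunk at once, then merge pairs level by level.
// An odd node at the end of a level is carried up unchanged, which yields the same tree.
static uint64_t hash_by_levels(const std::string& in) {
    std::vector<uint64_t> level(chunk_count(in));
    #pragma omp parallel for
    for (long i = 0; i < static_cast<long>(level.size()); ++i)
        level[i] = chunk_cv(in, static_cast<std::size_t>(i));
    while (level.size() > 1) {
        const std::size_t pairs = level.size() / 2;
        std::vector<uint64_t> next(pairs + level.size() % 2);
        #pragma omp parallel for
        for (long i = 0; i < static_cast<long>(pairs); ++i)
            next[i] = parent_cv(level[2 * i], level[2 * i + 1]);
        if (level.size() % 2 == 1) next.back() = level.back();
        level.swap(next);
    }
    return level[0];
}

// Check helper: both shapes must agree for any input length.
static bool levels_match_recursive(std::size_t len) {
    std::string in(len, '\0');
    for (std::size_t i = 0; i < len; ++i) in[i] = static_cast<char>(i * 31 + 7);
    return hash_by_levels(in) == hash_recursive(in, 0, chunk_count(in));
}
```

Compiled with `-fopenmp`, the two loops run in parallel; without it, the pragmas are ignored and the code runs sequentially with the same result.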
BLAKE3 is a fast cryptographic hash function with plenty of scope for parallelism.
We try to extract as much of that parallelism as possible by using GPUs.
We also try to speed it up on the CPU with OpenMP and AVX2.
All of this is made possible by our new algorithm, Blaze3.
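On the CPU, a common way to apply AVX2 to BLAKE3 is to hash 8 chunks at once: a 256-bit register holds the same 32-bit state word from 8 different chunks, so each step of the compression function becomes one vector instruction. The sketch below shows BLAKE3's G mixing function in that transposed layout. It uses plain 8-element arrays (a hypothetical `LaneState` struct) so it runs anywhere; in a real AVX2 build each row would be an `__m256i` and each loop a call such as `_mm256_add_epi32` or `_mm256_xor_si256`, with rotations done as a `_mm256_srli_epi32`/`_mm256_slli_epi32` pair combined by `_mm256_or_si256`. It illustrates the general technique and is not taken from this project's AVX2 code.

```cpp
#include <cassert>
#include <cstdint>

// Right-rotate a 32-bit word; c is one of 16, 12, 8, 7 here, so never 0 or 32.
static inline uint32_t rotr32(uint32_t w, int c) { return (w >> c) | (w << (32 - c)); }

// BLAKE3's G function on one 16-word state.
static void g_scalar(uint32_t s[16], int a, int b, int c, int d, uint32_t mx, uint32_t my) {
    s[a] = s[a] + s[b] + mx;
    s[d] = rotr32(s[d] ^ s[a], 16);
    s[c] = s[c] + s[d];
    s[b] = rotr32(s[b] ^ s[c], 12);
    s[a] = s[a] + s[b] + my;
    s[d] = rotr32(s[d] ^ s[a], 8);
    s[c] = s[c] + s[d];
    s[b] = rotr32(s[b] ^ s[c], 7);
}

constexpr int kLanes = 8;  // a 256-bit AVX2 register holds 8 x 32-bit words

// Transposed layout: v[w][lane] is state word w of the chunk in `lane`.
// In an AVX2 build each row v[w] would be one __m256i.
struct LaneState { uint32_t v[16][kLanes]; };

// G applied to 8 independent states at once; each loop corresponds to one
// vector operation (add, xor, or a shift-and-or rotate).
static void g_lanes(LaneState& s, int a, int b, int c, int d,
                    const uint32_t mx[kLanes], const uint32_t my[kLanes]) {
    for (int l = 0; l < kLanes; ++l) s.v[a][l] = s.v[a][l] + s.v[b][l] + mx[l];
    for (int l = 0; l < kLanes; ++l) s.v[d][l] = rotr32(s.v[d][l] ^ s.v[a][l], 16);
    for (int l = 0; l < kLanes; ++l) s.v[c][l] = s.v[c][l] + s.v[d][l];
    for (int l = 0; l < kLanes; ++l) s.v[b][l] = rotr32(s.v[b][l] ^ s.v[c][l], 12);
    for (int l = 0; l < kLanes; ++l) s.v[a][l] = s.v[a][l] + s.v[b][l] + my[l];
    for (int l = 0; l < kLanes; ++l) s.v[d][l] = rotr32(s.v[d][l] ^ s.v[a][l], 8);
    for (int l = 0; l < kLanes; ++l) s.v[c][l] = s.v[c][l] + s.v[d][l];
    for (int l = 0; l < kLanes; ++l) s.v[b][l] = rotr32(s.v[b][l] ^ s.v[c][l], 7);
}

// Check helper: the lane version must match 8 separate scalar calls.
static bool lanes_match_scalar(uint32_t seed) {
    uint32_t scalar[kLanes][16];
    uint32_t mx[kLanes], my[kLanes];
    LaneState lanes;
    uint32_t x = seed;
    auto next = [&x] { x = x * 1664525u + 1013904223u; return x; };  // simple LCG
    for (int l = 0; l < kLanes; ++l) {
        for (int w = 0; w < 16; ++w) lanes.v[w][l] = scalar[l][w] = next();
        mx[l] = next();
        my[l] = next();
    }
    // Column step on words (0, 4, 8, 12), as in BLAKE3's first G call of a round.
    for (int l = 0; l < kLanes; ++l) g_scalar(scalar[l], 0, 4, 8, 12, mx[l], my[l]);
    g_lanes(lanes, 0, 4, 8, 12, mx, my);
    for (int l = 0; l < kLanes; ++l)
        for (int w = 0; w < 16; ++w)
            if (lanes.v[w][l] != scalar[l][w]) return false;
    return true;
}
```

Compilers can often auto-vectorize these fixed-width loops with `-O2 -mavx2`, but hand-written intrinsics give predictable code generation.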
- Rewrite the basic reference implementation in C++
- Rewrite it again, in CUDA C++
- Make sure all the tests pass (continuous process)
- Optimize it and fix any memory bandwidth issues (continuous process)
- The `basic` directory has the reference implementations.
- A full copy of the original reference implementation is in `testing`.
- The BLAKE3 paper is also here for reference.
- OpenMP work is in `openmp`. This version has been optimized as far as it will go.
- CUDA work is in `cuda`. This version uses dynamic parallelism.
- Dark CUDA work happens in `dark`.
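Dynamic parallelism fits BLAKE3's recursive tree definition: work launched for a subtree can itself launch work for its two halves. As a CPU analogue only (not the project's CUDA code), the C++ sketch below spawns the left subtree as an OpenMP task while the current thread hashes the right one, and stops spawning below a `cutoff` chunk count so tiny subtrees do not pay task overhead. The names `subtree`, `hash_with_tasks`, `cutoff` and the toy `mix64` mixer are hypothetical stand-ins, and the result is not a BLAKE3 hash.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Toy 64-bit mixer standing in for BLAKE3's compression function (NOT BLAKE3).
static uint64_t mix64(uint64_t x) {
    x ^= x >> 30; x *= 0xbf58476d1ce4e5b9ULL;
    x ^= x >> 27; x *= 0x94d049bb133111ebULL;
    x ^= x >> 31;
    return x;
}

constexpr std::size_t kChunkLen = 1024;  // BLAKE3 chunk size in bytes

// Leaf: toy hash of chunk `index`, with the index mixed in.
static uint64_t chunk_cv(const char* data, std::size_t len, std::size_t index) {
    uint64_t h = mix64(index + 1);
    const std::size_t begin = index * kChunkLen;
    const std::size_t end = std::min(begin + kChunkLen, len);
    for (std::size_t i = begin; i < end; ++i)
        h = mix64(h ^ static_cast<unsigned char>(data[i]));
    return h;
}

// Parent: combine two child chaining values; order matters.
static uint64_t parent_cv(uint64_t left, uint64_t right) {
    return mix64(left ^ mix64(right + 0x9e3779b97f4a7c15ULL));
}

// Hash chunks [first, first + count). Above `cutoff` chunks the left subtree runs
// as a child task, much like a parent kernel launching a child grid per subtree.
static uint64_t subtree(const char* data, std::size_t len, std::size_t first,
                        std::size_t count, std::size_t cutoff) {
    assert(count >= 1);
    if (count == 1) return chunk_cv(data, len, first);
    std::size_t left = 1;
    while (left * 2 < count) left *= 2;  // BLAKE3's left-subtree size
    uint64_t l = 0;
    uint64_t r = 0;
    if (count > cutoff) {
        #pragma omp task shared(l)
        l = subtree(data, len, first, left, cutoff);
        r = subtree(data, len, first + left, count - left, cutoff);
        #pragma omp taskwait
    } else {
        l = subtree(data, len, first, left, cutoff);
        r = subtree(data, len, first + left, count - left, cutoff);
    }
    return parent_cv(l, r);
}

// One thread starts the recursion; the rest of the team picks up spawned tasks.
static uint64_t hash_with_tasks(const char* data, std::size_t len, std::size_t cutoff) {
    const std::size_t n = std::max<std::size_t>(1, (len + kChunkLen - 1) / kChunkLen);
    uint64_t root = 0;
    #pragma omp parallel
    #pragma omp single
    root = subtree(data, len, 0, n, cutoff);
    return root;
}

// Check helper: spawning tasks everywhere must match never spawning them.
static bool tasks_match_sequential(std::size_t len) {
    std::string in(len, '\0');
    for (std::size_t i = 0; i < len; ++i) in[i] = static_cast<char>(i * 31 + 7);
    const std::size_t no_tasks = static_cast<std::size_t>(-1);  // cutoff never exceeded
    return hash_with_tasks(in.data(), len, 1) == hash_with_tasks(in.data(), len, no_tasks);
}
```

The cutoff here plays a role similar to a minimum subtree size before handing work to a child kernel; it is an illustrative tuning knob, not a value taken from this project.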