-
Notifications
You must be signed in to change notification settings - Fork 102
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Aviad/blake2s #576
base: V2
Are you sure you want to change the base?
Aviad/blake2s #576
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good job with the PR. I left a bunch of style related comments. In theory they can be ignored as we are merging this to V2
*/ | ||
|
||
#pragma once | ||
typedef unsigned char BYTE; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please move these inside blake2s namespace
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
icicle/src/merkle-tree/merkle.cu
Outdated
THROW_ICICLE_ERR( | ||
IcicleError_t::InvalidArgument, | ||
"Hash max preimage length does not match merkle tree arity multiplied by digest elements"); | ||
// if (compression.preimage_max_length < tree_config.arity * tree_config.digest_elements) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can just delete those at this point
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
icicle/src/hash/blake2s/blake2s.cu
Outdated
__device__ __forceinline__ void cuda_blake2s_init_state(cuda_blake2s_ctx_t* ctx) | ||
{ | ||
memcpy(ctx->state, ctx->chain, BLAKE2S_CHAIN_LENGTH); | ||
// ctx->state[8] = ctx->t0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are these commented?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it's from an old version. deleted
icicle/src/hash/blake2s/blake2s.cu
Outdated
return a; | ||
} | ||
|
||
__device__ uint32_t cuda_blake2s_ROTR32(uint32_t a, uint8_t b) { return (a >> b) | (a << (32 - b)); } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe worth to add __inline__
icicle/src/hash/blake2s/blake2s.cu
Outdated
cudaMalloc(&cuda_outdata, BLAKE2S_BLOCK_SIZE * n_batch); | ||
assert(keylen <= 32); | ||
|
||
// CUDA_BLAKE2S_CTX ctx; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These should be removed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
WORD block = (n_batch + thread - 1) / thread; | ||
kernel_blake2s_hash<<<block, thread>>>(cuda_indata, inlen, cuda_outdata, n_batch, BLAKE2S_BLOCK_SIZE); | ||
cudaMemcpy(out, cuda_outdata, BLAKE2S_BLOCK_SIZE * n_batch, cudaMemcpyDeviceToHost); | ||
cudaDeviceSynchronize(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Current implementation does not support async. All of our other primitives do. So maybe worth adding HashConfig as an input and changing all the functions to their async alternatives
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
mcm_cuda_blake2s_hash_batch is not a part of the API, it's only for unit tests.
run_hash_many_kernel() is used in the same way as our other implementations
icicle/src/hash/blake2s/blake2s.cu
Outdated
kernel_blake2s_hash<<<block, thread>>>(cuda_indata, inlen, cuda_outdata, n_batch, BLAKE2S_BLOCK_SIZE); | ||
cudaMemcpy(out, cuda_outdata, BLAKE2S_BLOCK_SIZE * n_batch, cudaMemcpyDeviceToHost); | ||
cudaDeviceSynchronize(); | ||
cudaError_t error = cudaGetLastError(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please use our error-management functions (you can find an example in any of our primitives)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
extern "C" cudaError_t blake2s_cuda( | ||
BYTE* input, BYTE* output, WORD number_of_blocks, WORD input_block_size, WORD output_block_size, HashConfig& config) | ||
{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I noticed that the parameter order in blake2s_cuda
is a bit different from keccak256_cuda
(and other keccak functions).
keccak256_cuda(uint8_t* input, int input_block_size, int number_of_blocks, uint8_t* output, HashConfig& config)
To keep things consistent, so my suggestion smt like this:
blake2s_cuda(BYTE* input, WORD input_block_size, WORD number_of_blocks, WORD output_block_size, BYTE* output, HashConfig& config)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you are right. I'm not sure if it is worth changing it since this is usually not exposed to the user and it will require updates to the rust signatures as well
## Changes This PR updates the existing Blake2s implementation by integrating a C++ API and a Rust wrapper API.
This reverts commit 166c615.
Describe the changes
adds Blake2s cuda capability.