-
Notifications
You must be signed in to change notification settings - Fork 355
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Cmultithreading #343
base: master
Are you sure you want to change the base?
Cmultithreading #343
Conversation
We can get rid of uneeded memsets when clearing state, since the count is always reset. We can clear the hasher state on init if we need to start from zeroes. Signed-off-by: Pantelis Antoniou <pantelis.antoniou@konsulko.com>
This is a minimal thread-pool implementation for BLAKE3. As part of the commit a very simple file mmap-ing example is introduced that providing file checksuming. At least on my machines (x86-64 linux boxes) the C implementation is within a 1% point in performance of the rust b3sum tool. > $ ls -l ggml-vicuna-7b-1.1-q4_0.bin -rw-rw-r-- 1 panto panto 3791725184 Ιουν 17 13:34 ggml-vicuna-7b-1.1-q4_0.bin > $ time b3sum ggml-vicuna-7b-1.1-q4_0.bin > fe87fecc4427c4661aa46f52a7e286ecd6dd7f5f788564abb2c46d2ed5341584 ggml-vicuna-7b-1.1-q4_0.bin > > real 0m0,190s > user 0m1,583s > sys 0m0,196s > $ time ./example-mmap ggml-vicuna-7b-1.1-q4_0.bin > fe87fecc4427c4661aa46f52a7e286ecd6dd7f5f788564abb2c46d2ed5341584 ggml-vicuna-7b-1.1-q4_0.bin > > real 0m0,204s > user 0m1,525s > sys 0m0,247s At my local copy of BLAKE3 with more extensive changes I can get the C implementation to be slightly faster than the rust one. But those require more extensive changes which will have to come much later. Signed-off-by: Pantelis Antoniou <pantelis.antoniou@konsulko.com>
When buffer sizes are small in streaming mode the partial output must have the rest of the buffer filled with zeroes. Signed-off-by: Pantelis Antoniou <pantelis.antoniou@konsulko.com>
Thank you for doing all this work! It might be a week or two before I have enough time to review it properly, but I'll be very interested to understand how it all works. Is it architecturally similar to Rayon, or something totally different? I should be honest that I don't know if we're going to want to maintain this much threading machinery inside the project. One thing I'll want to think about is whether we could expose APIs that would make it possible to implement something like this on top of our code. |
Don't mention it. I had a need for a fast hash and BLAKE3 fits the bill. I am not familiar with Rayon but from what I can understand from the exposed API the threading method is exactly the same. Well, I wouldn't worry about it too much, since I'm maintaining a full blown C project with all the bells and whistles of detecting compiler versions and blake3 backends already. Let me know if there's something I can do to make things easy for you guys to make next BLAKE3 version easier to drop in my codebase. For example I could parsel out the BLAKE3 support as a standalone library that people could use like normal (i.e. apt-get install libcblake3) but I don't know if that's something you would be fine with. There's the small nit that I'm not much of a windows person so maybe target Linux/MacOS/unices first. |
FYI, I have finally released the blake3 integrated fyaml, you could take a look at the integrated work starting at pantoniou/libfyaml@140e762 |
I think it will be more useful to provide a C API to the rust code so it can be used as a library instead of maintaining duplicate implementations. |
This is notably missing Windows support. Furthermore I find it very intrusive if libraries spin up their own thread pools as there are machines with hundreds of cores out there and PIds are very much a limited resource, i.e. this can provoke system resource exhaustion events for some people. |
@oconnor663 My only question here is - are you going to multithread the С code to the end of the World? Something is better than nothing! You could add OMP dependency and mark it as experimental and and than much more people would test how it actually works or it's a bad idea. But there nothing even similar I see in the project. |
This PR provides multithreading for the C-implementation.
It is quite minimal, but it gets the job done of getting C to withing 1% of the Rust implementation.