Investigate performance of parallelization #5
Comments
I made a quick and dirty implementation that can be found here, and the benchmarks + a more detailed write-up can be found here. On an 8-core CPU, this implementation gets roughly 2-4 times faster, depending on the data square size. I think this is a decent starting point, but judging by the trace, it looks like there is plenty of room for improvement. I should have a chance to give the nmt library a similar treatment sometime in the next few days.
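For illustration, here is a minimal sketch of the kind of per-row fan-out described above. `encodeRow` is only a placeholder for whatever Reed-Solomon encoding call the codec actually exposes, and the worker-pool shape is an assumption, not the implementation linked in the comment:

```go
package parallel

import (
	"runtime"
	"sync"
)

// encodeRow is a placeholder for erasure-coding a single row; it stands in
// for the codec's real Reed-Solomon call and is not the actual rsmt2d API.
func encodeRow(row [][]byte) [][]byte {
	// ... real erasure coding would happen here ...
	return row
}

// encodeRowsParallel erasure-codes every row of the square, fanning the
// work out over one worker goroutine per CPU core.
func encodeRowsParallel(rows [][][]byte) [][][]byte {
	out := make([][][]byte, len(rows))
	jobs := make(chan int)

	var wg sync.WaitGroup
	for w := 0; w < runtime.NumCPU(); w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for i := range jobs {
				// each worker writes to a distinct index, so no locking is needed
				out[i] = encodeRow(rows[i])
			}
		}()
	}
	for i := range rows {
		jobs <- i
	}
	close(jobs)
	wg.Wait()
	return out
}
```

Because the rows are independent, the workers write to disjoint slots of the output and never need to synchronize on shared state.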
I ran some more benchmarks for the nmt generation portion of hashing the data availability header, and fortunately (unfortunately?) everything went as one might expect. This task is even more parallelizable than generating the erasure data, but this implementation doesn't get the full benefit from such a parallelizable load. Summary: these unoptimized implementations show that there are easy options for performance gains, should that be required. Thanks to @adlerjohn for the fun idea. As for further investigation
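A sketch of the analogous fan-out for NMT root generation; `computeRowRoot` is a hypothetical stand-in for building a namespaced Merkle tree over one row with the nmt library, not the library's actual API:

```go
package parallel

import "sync"

// computeRowRoot is a placeholder for pushing each share of one row into a
// namespaced Merkle tree and returning its root; the real code would use
// the nmt library here.
func computeRowRoot(row [][]byte) []byte {
	// ... build the NMT and return its root ...
	return nil
}

// rowRootsParallel computes the NMT root of every row concurrently.
// Each root depends only on its own row, so the goroutines never contend.
func rowRootsParallel(rows [][][]byte) [][]byte {
	roots := make([][]byte, len(rows))
	var wg sync.WaitGroup
	for i := range rows {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			roots[i] = computeRowRoot(rows[i])
		}(i)
	}
	wg.Wait()
	return roots
}
```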
Thanks for this @evan-forbes, this looks like great work.
@evan-forbes, I didn't see this before. Nice graphs! Curious, how did you generate those?
A few ideas towards parallelizing (it is getting annoying to wait for tests to compute the square already):
Additional thoughts regarding parallelization: whether rsmt2d erasure coding / NMT root computation should be done in parallel largely depends on how parallelized the user of the library is. For example, if only a single block is verified at a time, then there will indeed be gains from parallelizing rsmt2d. But if multiple blocks are verified in parallel, then the CPU will already be saturated. That being said, once you've completed IBD and are at the tip, you'll be verifying one block at a time, and in that case parallelizing rsmt2d is a requirement for saturating the CPU cores.
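To illustrate the block-level parallelism mentioned above, here is a hedged sketch using `golang.org/x/sync/errgroup`; `Block` and `verifyBlock` are hypothetical stand-ins for whatever the caller's verification path actually looks like:

```go
package parallel

import (
	"runtime"

	"golang.org/x/sync/errgroup"
)

// Block and verifyBlock are placeholders; verifyBlock stands in for the
// full (sequential) erasure-decode + NMT-root check of a single block.
type Block struct{ Height int64 }

func verifyBlock(b Block) error { return nil }

// verifyBatch checks many blocks concurrently, as happens during initial
// block download. With roughly one block per core the CPU is already
// saturated, so parallelism inside rsmt2d adds little here; it matters
// when only a single block at the chain tip is being verified.
func verifyBatch(blocks []Block) error {
	var g errgroup.Group
	g.SetLimit(runtime.NumCPU()) // bound concurrency to the core count
	for _, b := range blocks {
		b := b
		g.Go(func() error { return verifyBlock(b) })
	}
	return g.Wait()
}
```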
Just Google Sheets, iirc.
Users can potentially rely on Go's scheduler to distribute the load during sync, e.g., syncing 10 blocks with reconstruction for each running in its own goroutine. This is the simplest approach, but it has its downsides:
The best long-term approach I would propose is to have the rsmt2d library also do some form of concurrency and parallelization at the axis level: a small global engine that processes axes and is not even aware of the square notion. It could even implement the Codec interface, but I have to think about it more.
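A rough sketch of what such an axis-level engine might look like; all names here (`axisJob`, `engine`, `Submit`) are illustrative and not an existing rsmt2d API:

```go
package parallel

import "runtime"

// axisJob is one unit of work: a single row or column plus a function to
// run on it (erasure-encode, repair, compute an NMT root, ...). The engine
// never sees the square itself.
type axisJob struct {
	shares [][]byte
	do     func(shares [][]byte)
}

// engine is a tiny global worker pool over axis jobs.
type engine struct{ jobs chan axisJob }

func newEngine() *engine {
	e := &engine{jobs: make(chan axisJob)}
	for i := 0; i < runtime.NumCPU(); i++ {
		go func() {
			for j := range e.jobs {
				j.do(j.shares)
			}
		}()
	}
	return e
}

// Submit hands one axis to the pool; callers coordinate completion
// themselves (e.g. with a sync.WaitGroup around their submissions).
func (e *engine) Submit(j axisJob) { e.jobs <- j }
```

Because the engine only ever sees individual axes, multiple squares (or multiple blocks) can share the same pool without it needing any notion of where a given row or column came from.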
I was going through the code for another reason and realized that it wouldn't be possible to do this at the Codec level, unfortunately. Codec is a strictly synchronous interface, and the only way I see right now to parallelize repair by axis is to change
We can probably change this issue to be about actually implementing parallelization, with the above feedback from @adlerjohn and @Wondertan, instead of only investigating it.
Computing the erasure coding of each row and column, along with computing the namespace Merkle trees, can be done in parallel. Investigate the potential performance gains of doing so.