-
Notifications
You must be signed in to change notification settings - Fork 161
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
perf: add parallelism to hashing #411
Conversation
tokio::task::spawn_blocking(|| { | ||
chunk.and_then(|chunk| TreeNode::Leaf(chunk.freeze()).encode()) | ||
}).err_into::<anyhow::Error>() | ||
}).buffered(hash_par).map(|x| x.and_then(|x| x)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure if there is a nicer way to do this than .map(|x| x.and_then(|x| x))
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can't you use flatten?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not qualified to review async code, but in smoke testing I did an
|
Thanks for the test. Trying things out is often more useful than staring at the code. I will check this, and also try to figure out how to test this. |
Can you retry once #408 is merged? |
c64f962
to
adb9844
Compare
hrm, still seeing the issue on get when running this branch. hash results are the same compared to main, but this might be an issue with the content not actually being in the blocks? |
Can you tell me how to reproduce this? I tried with a 300m random file, and am not seeing this. |
adb9844
to
24a2c64
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've tested this a bunch locally, and am good with it. I'd like @dignifiedquire to sign off
The most expensive thing when adding a file is hashing the content.
This adds parallelism to this hashing by splitting the tree building into two stages. One where the leafs are encoded, and one where the actual tree is being built.
This makes sure that iroh-cli now uses all available cores, and therefore increases perf quite a bit. But we are now limited by the DB.