-
Notifications
You must be signed in to change notification settings - Fork 108
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Parallelize dwarfdump per-unit #285
Conversation
Forgot to mention, that 4x speedup is basically linear speedup on my laptop (4 cores, 8 hyperthreads). |
There's an initial speedup due to writing to a |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, nice speedup! I think the memory usage is fine. It would be nice if some other library could make parallel_output
simpler, but I'm not aware of anything that does (I saw your comments in rayon about converting to a serial iterator).
examples/dwarfdump.rs
Outdated
@@ -65,6 +71,57 @@ impl From<io::Error> for Error { | |||
|
|||
pub type Result<T> = result::Result<T, Error>; | |||
|
|||
fn parallel_output<II: IntoIterator, F>(max_workers: usize, iter: II, f: F) -> Result<()> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any benefit to using Also please move the bound into the where clause.IntoIterator
instead of Iterator
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could use Iterator
here and move into_iter
to the caller, but specifying IntoIterator
here makes this function more reusable with I think no net increase in complexity.
examples/dwarfdump.rs
Outdated
fn parallel_output<II: IntoIterator, F>(max_workers: usize, iter: II, f: F) -> Result<()> | ||
where F: Sync + Fn(II::Item, &mut Vec<u8>) -> Result<()>, | ||
II::IntoIter: Send { | ||
let state = Mutex::new((iter.into_iter().fuse(), 0, Ok(()))); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Having the fuse
doesn't hurt, but is it actually needed for the current caller?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this code is complex enough that using a local struct type for state
would improve readability.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We depend on fused behavior. If we remove fuse
here then I would want to make parallel_output take a Vec<T>
instead of a generic Iterator
/IntoIterator
. Is that preferable?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, it's fine how it is.
src/reader.rs
Outdated
@@ -179,7 +179,7 @@ impl ReaderOffset for usize { | |||
/// | |||
/// All read operations advance the section offset of the reader | |||
/// unless specified otherwise. | |||
pub trait Reader: Debug + Clone { | |||
pub trait Reader: Debug + Clone + Send + Sync { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure about this (and same for Endianity
). Nothing in gimli requires this. I think this should be up to the consumer to decide, but this isn't something I've had much experience with. We're already defining a Reader
trait in dwarfdump that could specify this requirement for its use.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unfortunately this is tricky because there doesn't seem to be a way for dwarfdump::Reader
to constrain its Endian
to be Send + Sync
. This is related to rust-lang/rust#38738.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I figured something out.
if flags.eh_frame { | ||
dump_eh_frame(w, eh_frame)?; | ||
dump_eh_frame(&mut BufWriter::new(out.lock()), eh_frame)?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These are a bit ugly, but I don't have a better idea. This relies on the drop to flush, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes.
Well spotted. That's interesting. |
Before: time ( target/release/examples/dwarfdump -i ~/mozilla-central/obj-ff-opt/dist/bin/libxul.so >& /dev/null ) real 1m39.153s user 1m37.714s sys 0m1.320s After: time ( target/release/examples/dwarfdump -i ~/mozilla-central/obj-ff-opt/dist/bin/libxul.so >& /dev/null ) real 0m25.641s user 2m3.328s sys 0m1.087s This increases memory usage. We buffer the output; the max memory usage increases by around the size of the N largest outputs per compilation unit, where N is the min of 16 and num_cpus::get(). The larger compilation units in Firefox libxul.so produce tens to hundreds of megabytes of output each. Then again, the speedup processing such large files is important.
Maybe you won't like the memory usage increase, but this is a pretty nice speedup when you're not writing the results to a file. Fearless concurrency FTW.
I have a followup patch that adds a dwarfdump option to drop the output for compilation units where the output doesn't match a given regex, which makes this more useful.