-
Notifications
You must be signed in to change notification settings - Fork 80
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Cache file names in fill_todo #144
base: master
Are you sure you want to change the base?
Conversation
…ihanglo Clean package perf improvements ### What does this PR try to resolve? I've noticed that `cargo clean -p` execution time scales poorly with size of target directory; in my case (~250GB target directory on M1 Mac) running `cargo clean -p` takes circa 35 seconds. Notably, it's the file listing that takes that time, not deleting the package itself. That is, when running `cargo clean -p SOME_PACKAGE` twice in a row, both executions take roughly the same time. I've tracked it down to the fact that we seem quite happy to use `glob::glob` function, which iterates over contents of target dir. It also was a bit sub-optimal when it came to doing that, for which I've already filled a PR in rust-lang/glob#144 - that PR alone takes down cleaning time down to ~14 seconds. While it is a good improvement for a relatively straightforward change, this PR tries to take it even further. With glob PR applied + changes from this PR, my test case goes down to ~6 seconds. I'm pretty sure that we could squeeze this further, but I'd rather do so in a follow-up PR. Notably, this PR doesn't help with *just* super-large target directories. `cargo clean -p serde` on cargo repo (with ~7Gb target directory size) went down from ~380ms to ~100ms for me. Not too shabby. ### How should we test and review this PR? I've mostly tested it manually, running `cargo clean` against multiple different repos. ### Additional information TODO: - [x] [c770700](c770700) is not quite correct; we need to consider that it changes how progress reporting works; as is, we're gonna report all progress relatively quickly and stall at the end (when we're actually iterating over directories, globbing, removing files, that kind of jazz). I'll address that.
Huh, the CI failure seems a bit spurious to me.. |
Uh-oh, this repo still runs CI on Rust 1.23.0, but one of the dev-dependencies has upgraded to a version that uses a too new Rust feature. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder why we're even sorting. I can't find any mention in the API that guarantees any ordering of the returned paths.
So for the performance issue it'd be even more efficient to not sort at all.
I guess it's inherited from the libc's glob, but at least that has GLOB_NOSORT
to opt-out... oh well.
src/lib.rs
Outdated
@@ -330,10 +331,11 @@ impl fmt::Display for GlobError { | |||
struct PathWrapper { | |||
path: PathBuf, | |||
is_directory: bool, | |||
file_name: Option<OsString>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems odd to me that an extra allocation for each path speeds things up. This is just to avoid the work of finding the filename in a path, right?
Have you tried storing the indexes into path
and reconstituting an &OsStr by slicing into path?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is just to avoid the work of finding the filename in a path, right?
Yes, with a caveat that on the current main version we end up performing a filename search multiple times per each path, as it's done in a sort comparator. That's where the slowdown source is - an extra allocation is prolly slower than extracting a filename once, but if we extract it 100's of times it might end up being worth it.
No, I have not tried to store the subrange of a filename. I'll give it a go, thank you for the suggestion!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah the necessary methods needed to do the slicing don't seem to be MSRV-compatible, a pity.
A possible workaround for that is to do the slicing on &str
and fall back to pulling the file_name
out of the pathbuf on every access when it's not utf8-compatible (which should be rare).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That'd require either running UTF-8 validation on PathWrapper::path
whenever we try to use a filename slice (or introducing unsafe
into the lib to avoid the repeated validation); I wonder if the extra complexity is worth it in the first place.
FWIW, b7dbb48 increases peak memory usage of a rust-lang/rust
benchmark by about 23%. I've managed to reduce this to 6% by replacing OsString
with Box<OsStr>
in 778365d; I agree that allocating here is not ideal, but I'm unsure about our options here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can get away without unsafe. We can go Path -> Osstr -> Option<&str> -> Option<&[u8]> and then compare the [u8]s most of the time and only in non-utf8 cases use the old code path.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We'd have to do that each time we need to inspect the filename, right? We can't store a reference (Option<&[u8]>
one) to PathWrapper::path
within PathWrapper itself.
I believe that OsStr
-> Option<&str>
might entail a UTF-8 validation. It'd be fine to do it once, but if above is correct, we'd have to do it on each filename access.
I might also be missing something and thus be totally off the base.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah right. I guess we can store a pointer instead of a reference. That'd require a bit of unsafe after all, but should be ok since Pathbuf contents are heap-allocated and don't move.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tbh I'm not sold on this change; I did get it to work with transmutes, but the difference in runtime for rust-lang benchmark is about 3% (40.0ms vs 41.5ms). I'm honestly not sure if that's worth additional unsafes and all that.
Could we follow suit and add a |
Yeah that'd make sense I think but should be done in a separate PR since the fields of What you can try for this PR is to check if unstable sort would provide another speedup. Since filenames in a directory should be unique (barring pathological filesystems) we don't need a stable sort. |
You should also rebase your PR to get the CI fixes. |
The next breaking change should probably also mark this |
Background: While working with cargo, I've noticed that it takes ~30s to cargo clean -p with large enough target directory (~200GB). With a profiler, it turned out that most of the time was spent retrieving paths for removal in https://github.com/rust-lang/cargo/blob/eee4ea2f5a5fa1ae184a44675315548ec932a15c/src/cargo/ops/cargo_clean.rs#L319 (and not actually removing the files). Change description: In call to .sort_by, we repetitively parse the paths to obtain file names for comparison. This commit caches file names in PathWrapper object, akin to rust-lang#135 that did so for dir info. For my use case, a cargo build using that branch takes ~14s to clean files instead of previous 30s (I've measured against main branch of this directory, to account for changes made since 0.3.1). Still not ideal, but hey, we're shaving 50% of time off for a bit heavier memory use.
76cafd1
to
b7dbb48
Compare
|
Background:
While working with cargo, I've noticed that it takes ~30s to
cargo clean -p
with large enough target directory (~200GB). Under a profiler, it turned out that most of the time was spent retrieving paths for removal viaglob::glob
in rm_rf_glob (and not actually removing the files).Change description:
In call to
.sort_by
, we repetitively parse the paths to obtain file names for comparison. This commit caches file names inPathWrapper
object, akin to #135 that did so for dir info.For my use case, a cargo build using that branch takes ~14s to clean files instead of previous 30s (I've measured against main branch of this repository, to account for changes made since
glob 0.3.1
). Still not ideal, but hey, we're shaving 50% of time off for a bit heavier memory use.