Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cache file names in fill_todo #144

Open
wants to merge 4 commits into
base: master
Choose a base branch
from

Conversation

osiewicz
Copy link

@osiewicz osiewicz commented Apr 27, 2024

Background:
While working with cargo, I've noticed that it takes ~30s to cargo clean -p with large enough target directory (~200GB). Under a profiler, it turned out that most of the time was spent retrieving paths for removal via glob::glob in rm_rf_glob (and not actually removing the files).
image

Change description:
In call to .sort_by, we repetitively parse the paths to obtain file names for comparison. This commit caches file names in PathWrapper object, akin to #135 that did so for dir info.

For my use case, a cargo build using that branch takes ~14s to clean files instead of previous 30s (I've measured against main branch of this repository, to account for changes made since glob 0.3.1). Still not ideal, but hey, we're shaving 50% of time off for a bit heavier memory use.

bors added a commit to rust-lang/cargo that referenced this pull request May 2, 2024
…ihanglo

Clean package perf improvements

### What does this PR try to resolve?

I've noticed that `cargo clean -p` execution time scales poorly with size of target directory; in my case (~250GB target directory on M1 Mac) running `cargo clean -p` takes circa 35 seconds. Notably, it's the file listing that takes that time, not deleting the package itself. That is, when running `cargo clean -p SOME_PACKAGE` twice in a row, both executions take roughly the same time.

I've tracked it down to the fact that we seem quite happy to use `glob::glob` function, which iterates over contents of target dir. It also was a bit sub-optimal when it came to doing that, for which I've already filled a PR in rust-lang/glob#144 - that PR alone takes down cleaning time down to ~14 seconds. While it is a good improvement for a relatively straightforward change, this PR tries to take it even further. With glob PR applied + changes from this PR, my test case goes down to ~6 seconds. I'm pretty sure that we could squeeze this further, but I'd rather do so in a follow-up PR.

Notably, this PR doesn't help with *just* super-large target directories. `cargo clean -p serde` on cargo repo (with ~7Gb target directory size) went down from ~380ms to ~100ms for me. Not too shabby.

### How should we test and review this PR?

I've mostly tested it manually, running `cargo clean` against multiple different repos.

### Additional information
TODO:
- [x] [c770700](c770700) is not quite correct; we need to consider that it changes how progress reporting works; as is, we're gonna report all progress relatively quickly and stall at the end (when we're actually iterating over directories, globbing, removing files, that kind of jazz). I'll address that.
@Kobzol
Copy link
Contributor

Kobzol commented Dec 23, 2024

I benchmarked this locally and this improves the speed (on top of #135) on the rust-lang/rust repository by 20%, which is quite nice.

@the8472 Could you please take a look? The patch looks good to me.

src/lib.rs Outdated Show resolved Hide resolved
@osiewicz
Copy link
Author

Huh, the CI failure seems a bit spurious to me..

@Kobzol
Copy link
Contributor

Kobzol commented Dec 23, 2024

Uh-oh, this repo still runs CI on Rust 1.23.0, but one of the dev-dependencies has upgraded to a version that uses a too new Rust feature.

@the8472 the8472 self-assigned this Dec 23, 2024
src/lib.rs Outdated Show resolved Hide resolved
Copy link
Member

@the8472 the8472 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder why we're even sorting. I can't find any mention in the API that guarantees any ordering of the returned paths.
So for the performance issue it'd be even more efficient to not sort at all.

I guess it's inherited from the libc's glob, but at least that has GLOB_NOSORT to opt-out... oh well.

src/lib.rs Outdated
@@ -330,10 +331,11 @@ impl fmt::Display for GlobError {
struct PathWrapper {
path: PathBuf,
is_directory: bool,
file_name: Option<OsString>,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems odd to me that an extra allocation for each path speeds things up. This is just to avoid the work of finding the filename in a path, right?

Have you tried storing the indexes into path and reconstituting an &OsStr by slicing into path?

Copy link
Author

@osiewicz osiewicz Dec 30, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is just to avoid the work of finding the filename in a path, right?

Yes, with a caveat that on the current main version we end up performing a filename search multiple times per each path, as it's done in a sort comparator. That's where the slowdown source is - an extra allocation is prolly slower than extracting a filename once, but if we extract it 100's of times it might end up being worth it.

No, I have not tried to store the subrange of a filename. I'll give it a go, thank you for the suggestion!

Copy link
Member

@the8472 the8472 Dec 30, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah the necessary methods needed to do the slicing don't seem to be MSRV-compatible, a pity.

A possible workaround for that is to do the slicing on &str and fall back to pulling the file_name out of the pathbuf on every access when it's not utf8-compatible (which should be rare).

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That'd require either running UTF-8 validation on PathWrapper::path whenever we try to use a filename slice (or introducing unsafe into the lib to avoid the repeated validation); I wonder if the extra complexity is worth it in the first place.
FWIW, b7dbb48 increases peak memory usage of a rust-lang/rust benchmark by about 23%. I've managed to reduce this to 6% by replacing OsString with Box<OsStr> in 778365d; I agree that allocating here is not ideal, but I'm unsure about our options here.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can get away without unsafe. We can go Path -> Osstr -> Option<&str> -> Option<&[u8]> and then compare the [u8]s most of the time and only in non-utf8 cases use the old code path.

Copy link
Author

@osiewicz osiewicz Dec 30, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We'd have to do that each time we need to inspect the filename, right? We can't store a reference (Option<&[u8]> one) to PathWrapper::path within PathWrapper itself.
I believe that OsStr -> Option<&str> might entail a UTF-8 validation. It'd be fine to do it once, but if above is correct, we'd have to do it on each filename access.

I might also be missing something and thus be totally off the base.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah right. I guess we can store a pointer instead of a reference. That'd require a bit of unsafe after all, but should be ok since Pathbuf contents are heap-allocated and don't move.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tbh I'm not sold on this change; I did get it to work with transmutes, but the difference in runtime for rust-lang benchmark is about 3% (40.0ms vs 41.5ms). I'm honestly not sure if that's worth additional unsafes and all that.

@osiewicz
Copy link
Author

Could we follow suit and add a sort_paths field to MatchOptions that'd default to true?

@the8472
Copy link
Member

the8472 commented Dec 30, 2024

Yeah that'd make sense I think but should be done in a separate PR since the fields of MatchOptions are all pub so it'd be a breaking change and needs a new version.

What you can try for this PR is to check if unstable sort would provide another speedup. Since filenames in a directory should be unique (barring pathological filesystems) we don't need a stable sort.

@the8472
Copy link
Member

the8472 commented Dec 30, 2024

You should also rebase your PR to get the CI fixes.

@tgross35
Copy link
Contributor

Yeah that'd make sense I think but should be done in a separate PR since the fields of MatchOptions are all pub so it'd be a breaking change and needs a new version.

The next breaking change should probably also mark this #[non_exhaustive]. Which requires bumping the MSRV to at least 1.40, but I think that's unobjectionable anyway.

Background:
While working with cargo, I've noticed that it takes ~30s to cargo clean -p with large enough target directory (~200GB).
With a profiler, it turned out that most of the time was spent retrieving paths for removal in https://github.com/rust-lang/cargo/blob/eee4ea2f5a5fa1ae184a44675315548ec932a15c/src/cargo/ops/cargo_clean.rs#L319 (and not actually removing the files).

Change description:
In call to .sort_by, we repetitively parse the paths to obtain file names for comparison. This commit caches file names in PathWrapper object, akin to rust-lang#135 that did so for dir info.

For my use case, a cargo build using that branch takes ~14s to clean files instead of previous 30s (I've measured against main branch of this directory, to account for changes made since 0.3.1). Still not ideal, but hey, we're shaving 50% of time off for a bit heavier memory use.
@osiewicz
Copy link
Author

sort_unstable_by doesn't seem to be substantially faster for rust-lang/rust repro used for #135.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants