Use git gc --auto in maybe_gc_repo #8196
r? @Eh2406 (rust_highfive has picked a reviewer for you, use r? to override) |
Thanks! It looks like it broke some tests, though. Also, do you have any information about how safe this is to do? Does git work well concurrently with libgit2? The |
By default, git gc --auto will not remove anything from the repo that is less than 2 weeks old (although someone could have set gc.pruneExpire to now). It will also only remove unreferenced loose objects. Those don't happen simply by fetching. They happen when you |
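For readers following along, here is a minimal sketch of what shelling out to git gc --auto from Rust could look like. It is not the code from this PR: the helper names gc_auto_command and try_gc_auto are hypothetical, and the sketch assumes git is looked up on PATH and that the repository directory is passed in by the caller.

```rust
use std::path::Path;
use std::process::Command;

/// Build (but do not run) the `git gc --auto` invocation for a repository directory.
/// `gc_auto_command` is a hypothetical helper name, not a function from Cargo.
fn gc_auto_command(repo_dir: &Path) -> Command {
    let mut cmd = Command::new("git");
    // Run inside the repository so git picks it up as the current repo.
    cmd.arg("gc").arg("--auto").current_dir(repo_dir);
    cmd
}

/// Run the command and report whether it exited successfully.
/// Returns `false` if `git` cannot be spawned (for example, it is not on PATH)
/// or if it exits with a non-zero status.
fn try_gc_auto(repo_dir: &Path) -> bool {
    gc_auto_command(repo_dir)
        .status()
        .map(|status| status.success())
        .unwrap_or(false)
}
```

Splitting construction from execution keeps the invocation inspectable without requiring git to be installed; a caller would only use try_gc_auto and treat false as "gc did not run".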
I don't know what the remaining rustc_info_cache errors are, but they happen locally even without the patch... |
This is something I ran into myself recently too, so it's neat to see that git has a feature for this (and one that runs in the background)! I'd echo the same concerns as @ehuss, though, in that it's not clear to me whether we'd function correctly with a gc happening concurrently in the background. I don't really know how git gc works myself or how roots are determined. Cargo's management of the index is somewhat nonstandard as well since we don't even have a checkout, and I forget if we have branches/refs/etc tracking commits. Would it be possible to run |
It's quite simple to do so: just use |
An alternative would be to implement gc manually with the git_packbuilder_* API from libgit2. |
I'd be fine with either route, I think it'd probably just be best to run in the foreground for now until we're sure that this is an operation which can happen concurrently. |
We discussed this in the Cargo meeting and @joshtriplett spoke about the concurrency aspect and said that libgit2 is highly likely to work since @glandium would you be up for making that change and we can r+ with it running in the background as well? |
Just to be clear, are you saying the patch as-is + a move of the maybe_gc_repo call to after the fetch? |
Yeah, @joshtriplett spoke to how it's highly likely to continue to work given the defensiveness of |
There's an interesting error in the azure tests, where the panic message seems split between stdout and stderr. |
Ah, the current code actually relies on reinitializing the repo entirely when git (and thus git gc) is not available, and then fetching, so that the number of packs is reduced. That doesn't work if git gc happens after fetching. Should we go with something like:
If so, is it okay to add a dependency on the which crate to do the check? Or does cargo use something else? |
Ah right, that's because the function is currently reliant on only running sometimes but it's now being changed to run unconditionally. We don't want to reinitialize the entire repository on each fetch so if git isn't available I think we'll still want some sort of a guard like we have today, but otherwise what you've written down sounds pretty reasonable to me. |
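The guard discussed in the last few comments can be sketched as a small decision function. This is an illustration of the idea only, not the snippet that was proposed in the thread: GcPlan, plan_gc, and the pack_limit parameter are hypothetical names, and how the caller detects git or counts pack files is left out.

```rust
/// What to do around a fetch. All names here are hypothetical.
#[derive(Debug, PartialEq, Eq)]
enum GcPlan {
    /// `git` is available: fetch first, then run `git gc --auto`.
    FetchThenGitGc,
    /// No `git`, and the pack count exceeds the limit: reinitialize the repo, then fetch.
    ReinitThenFetch,
    /// No `git`, and the pack count is acceptable: just fetch.
    FetchOnly,
}

/// Decide how to combine fetching and garbage collection.
/// `pack_limit` is an assumed threshold supplied by the caller.
fn plan_gc(git_available: bool, pack_count: usize, pack_limit: usize) -> GcPlan {
    if git_available {
        // git decides on its own whether any work is needed, so this can run every time.
        GcPlan::FetchThenGitGc
    } else if pack_count > pack_limit {
        // Fallback guard: only pay for a full reinitialization when packs have piled up.
        GcPlan::ReinitThenFetch
    } else {
        GcPlan::FetchOnly
    }
}
```

The point of keeping the pack-count check in the fallback branch is that reinitializing is expensive, so it should stay conditional even though the git gc --auto path runs unconditionally.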
☔ The latest upstream changes (presumably #8363) made this pull request unmergeable. Please resolve the merge conflicts. |
Closing due to inactivity. We're still interested in improving the automatic garbage collection. If you want to continue pursuing this, feel free to reopen or post a new PR. |
Fixes #8195.