
Cargo publish fails with 'Too many open files' #4403


Closed
kureuil opened this issue Aug 13, 2017 · 5 comments · Fixed by #4478
Labels
A-git Area: anything dealing with git C-bug Category: bug Command-publish

Comments

@kureuil
Contributor

kureuil commented Aug 13, 2017

While trying to publish a crate to a local instance of crates.io, cargo fails with a 'Too many open files' error while creating its lock file:

$ cargo publish --host http://localhost:8888/git/index
    Updating registry `http://localhost:8888/git/index`
warning: spurious network error (2 tries remaining): [2/-1] failed to create locked file '/home/lperson/.cargo/registry/index/localhost-4f8d9a62b31a03f2/.git/objects/pack/pack_git2_Xiidx.lock': Too many open files
warning: spurious network error (1 tries remaining): [2/-1] failed to create locked file '/home/lperson/.cargo/registry/index/localhost-4f8d9a62b31a03f2/.git/objects/pack/pack_git2_Aoidx.lock': Too many open files
error: failed to update registry http://localhost:8888/git/index

Caused by:
  failed to fetch `http://localhost:8888/git/index`

Caused by:
  [2/-1] failed to create locked file '/home/lperson/.cargo/registry/index/localhost-4f8d9a62b31a03f2/.git/objects/pack/pack_git2_PPidx.lock': Too many open files

To put things into context, I have approximately a thousand versions of a single crate published in my local registry (maybe not a legitimate use case per se, but I still think the problem is worth an issue).
By running the program with strace, we can see that cargo tries to open every git pack file in the registry:

open("/home/lperson/.cargo/registry/index/localhost-4f8d9a62b31a03f2/.git/objects/pack/pack-280d79e0d7e028bbd92b132a35b6aa2d5d862c91.pack", O_RDONLY) = 1022

One working workaround is raising the file descriptor limit above 1024 with `ulimit -n`.

I'm not familiar enough with cargo to know why it would need to have all these git pack files open at the same time. Is this expected behaviour from cargo? Would this also happen with one published version each of 1000 crates?

Complete log of cargo running under `strace -e open,close -f`:
https://gist.github.com/kureuil/989b1f4c6d283384c8b70fdd1ddf2e87

@alexcrichton
Member

Oh dear, this is definitely not expected! I'm not really sure what's going on here. Would it be possible to get stack traces of Cargo when it starts opening files it's not closing? I'm curious what API we're calling in libgit2 that causes this...

@onur
Member

onur commented Aug 14, 2017

This is similar to https://github.com/onur/docs.rs/issues/76. I've temporarily solved this problem by running `git gc` on the repository.

carols10cents added the A-git (Area: anything dealing with git), C-bug (Category: bug), and Command-publish labels Aug 27, 2017
@alexcrichton
Member

This is probably caused by libgit2/libgit2#2758

@alexcrichton
Member

If we want to fix this, which I think we should, I think it'd look something like this:

On the "slow path" of a git fetch, just before fetching, we should check how many pack files are in the repository. Above some threshold we "do a gc" by blowing away the entire index. That means an update could occasionally take a while, but it never would on the "fast path", and otherwise it'd help keep the repo in a relatively pristine state.
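The threshold check proposed here could be sketched in Rust (Cargo's own language) roughly as follows. This is a minimal sketch, not Cargo's implementation: the function names `count_pack_files` and `needs_gc` and the threshold of 100 are hypothetical, since the comment above does not name a number. It counts only `*.pack` files under `<git dir>/objects/pack`, the directory the strace output shows being opened, and treats a missing pack directory as zero packs.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Hypothetical threshold; the thread does not specify a value.
const PACK_FILE_THRESHOLD: usize = 100;

/// Counts `*.pack` files in `<git_dir>/objects/pack`.
/// Companion `.idx` files are ignored; a missing directory counts as zero.
fn count_pack_files(git_dir: &Path) -> io::Result<usize> {
    let pack_dir = git_dir.join("objects").join("pack");
    let entries = match fs::read_dir(&pack_dir) {
        Ok(entries) => entries,
        Err(e) if e.kind() == io::ErrorKind::NotFound => return Ok(0),
        Err(e) => return Err(e),
    };
    let mut count = 0;
    for entry in entries {
        let path = entry?.path();
        if path.extension().map_or(false, |ext| ext == "pack") {
            count += 1;
        }
    }
    Ok(count)
}

/// Decides, on the slow path only, whether the repo should be compacted first.
fn needs_gc(git_dir: &Path) -> io::Result<bool> {
    Ok(count_pack_files(git_dir)? >= PACK_FILE_THRESHOLD)
}

fn main() -> io::Result<()> {
    // Hypothetical location: the `.git` directory of the current checkout.
    let git_dir = Path::new(".git");
    println!("pack files: {}", count_pack_files(git_dir)?);
    println!("needs gc: {}", needs_gc(git_dir)?);
    Ok(())
}
```

Counting directory entries is cheap compared with a fetch, so doing it only on the slow path keeps the fast path untouched, as the comment suggests.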

@alexcrichton
Member

Further discussion on #libgit2:

  • We should count pack files in .git/objects/pack/*
  • Before blowing away the repo we should try a git gc (the command line tool)
  • Failing that, blowing away seems fine
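The gc-then-blow-away sequence from this list might look like the sketch below, assuming the `git` binary may or may not be on `PATH`. The names `try_git_gc`, `compact_or_remove`, and `Compaction` are hypothetical and not taken from Cargo. The gc step is passed in as a closure so the fallback path can be exercised without git installed; after `Removed`, a caller would still need to re-clone the index, which is not shown.

```rust
use std::fs;
use std::io;
use std::path::Path;
use std::process::Command;

/// Outcome of a best-effort compaction attempt.
#[derive(Debug, PartialEq)]
enum Compaction {
    /// The `git gc` command-line tool ran and exited successfully.
    GitGc,
    /// `git gc` failed or was unavailable, so the repo directory was deleted.
    Removed,
}

/// Shells out to `git gc` in `repo_dir`. A missing `git` binary makes
/// `output()` return `Err`, which is treated the same as a failed gc.
fn try_git_gc(repo_dir: &Path) -> bool {
    Command::new("git")
        .arg("gc")
        .current_dir(repo_dir)
        .output()
        .map(|out| out.status.success())
        .unwrap_or(false)
}

/// Prefer `gc`; if it fails, remove the whole repository directory.
fn compact_or_remove(repo_dir: &Path, gc: impl Fn(&Path) -> bool) -> io::Result<Compaction> {
    if gc(repo_dir) {
        return Ok(Compaction::GitGc);
    }
    fs::remove_dir_all(repo_dir)?;
    Ok(Compaction::Removed)
}

fn main() -> io::Result<()> {
    // Careful: if `git gc` fails, the directory given here is deleted.
    let arg = match std::env::args().nth(1) {
        Some(arg) => arg,
        None => {
            eprintln!("usage: compact <repo-dir>");
            return Ok(());
        }
    };
    let outcome = compact_or_remove(Path::new(&arg), try_git_gc)?;
    println!("{:?}", outcome);
    Ok(())
}
```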

alexcrichton added a commit to alexcrichton/cargo that referenced this issue Sep 11, 2017
This commit is targeted at improving the long-term management of git checkouts
and git repositories. Currently every time data is fetched from crates.io
libgit2 will create a new pack file in the repository. These pack files
accumulate over time and end up causing pathological behavior if there are lots of
them, causing libgit2 to open many file descriptors all at once, possibly
blowing the system's file descriptor limits.

To alleviate this problem you typically run `git gc`, but libgit2 doesn't have
this implemented. Instead what Cargo now does is detect this situation and run
literally the command line tool `git gc` in a best-effort attempt to compact the
repo. Failing that, for example when git isn't installed, Cargo will remove the
entire repo and do a full checkout again.

At the same time this commit also generalizes this logic, plus the existing fast
path github logic, to all git repositories and not just the index. That way all
git repositories can benefit from the "github fast path" as well as the
compaction steps.

Closes rust-lang#4403
bors added a commit that referenced this issue Sep 11, 2017
Periodically gc repos in Cargo

alexcrichton added a commit to alexcrichton/cargo that referenced this issue Sep 14, 2017
bors added a commit that referenced this issue Sep 14, 2017
Periodically gc repos in Cargo

4 participants