Cargo is entering an infinite loop #7840
Could we get this issue pinned until a fix is out in nightly?
I appear to be getting different behavior, although it's perhaps somewhat related; I'm not sure. Using @dtolnay's repository I can reproduce with
I'm unable to reproduce with a locally-built binary with debuginfo (or without debuginfo). It's also not clear to me what's going on with the infinite memory allocation. Using
The backtrace always looks like this from what I can tell:
making me think it's stuck here ... and then it turns out I can reproduce this hang if I build the exact commit of nightly locally. If I build
Is it possible there are two "infinite loops", maybe? One on nightly which has been fixed (somehow), which @dtolnay is running into, and another which @ehuss is running into, which isn't fixed and is harder to reproduce?
Ok, I can say for certain now what I am seeing locally. Nightly, as of this red-hot second, is b68b097. Master, however, is c326fcb. Notably, this means that nightly does not have #7829. It turns out that PR fixes not only an assertion but also an "infinite allocation". Inside of this function the bug on nightly is that sometimes
So I can say with confidence what the hang I personally saw was. Now I'd like to clarify.
If it is as I suspect, then there are indeed two hangs: one fixed by #7829 but not deployed to nightlies yet, and another that @ehuss is describing here, which looks far more nefarious.
Yes, that's right! Sorry for hijacking the thread. It looks like rust-lang/rust#68579 has already landed the submodule update to pull in #7829, so tomorrow's nightly will have fixed the one I am hitting. Thanks!
No worries @dtolnay, glad it's already fixed for you in any case :) Oh, it looks like @ehuss already found the infinite loop with memory on Discord yesterday, so I do indeed suspect that this is separate. @ehuss I've been running
I can try to dig in later today, but if we aren't able to track it down quickly we'll probably want to back the change out (at least from beta Cargo, which I think these patches slipped into?). Once I have a charger around I can try to reproduce on macOS and Linux as well (I don't mind spinning the fans up).
Ok, after leaving it running for quite some time I've found a similar hang in Cargo (on Linux). There are two threads; one of them is the jobserver thread, though, which is known to be blocked. The other looks like this:
Line numbers are relative to c326fcb. As the original author of the
I tried to do a bit of debugging in the module itself, but honestly this code is so far out of cache and so complicated that it's probably a lost cause. I would fix this in one of two ways:
I believe https://docs.rs/crossbeam-channel/0.4.0/crossbeam_channel/ is the "mpsc" replacement in crossbeam, and it has the same bounded/unbounded API. I think that's probably the better solution rather than trying to get destructor ordering "correct" -- particularly as I believe none of us actually knows why std's impl hangs, so it seems likely that, other than restoring prior behavior, we probably wouldn't be able to make much headway in making sure it's actually correct.
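For illustration, here is a minimal sketch of what swapping `std::sync::mpsc` for `crossbeam_channel`'s unbounded API could look like; the `Message` enum and the function names are hypothetical stand-ins, not Cargo's actual job-queue types.

```rust
// Hedged sketch: an unbounded crossbeam channel used the same way an
// std::sync::mpsc channel would be. `Message`, `spawn_worker`, and `drain`
// are illustrative names, not Cargo internals.
use crossbeam_channel::{unbounded, Receiver, Sender};

enum Message {
    Stdout(String),
    Finished(u32),
}

fn spawn_worker(tx: Sender<Message>, id: u32) {
    std::thread::spawn(move || {
        // Workers report progress and completion back to the coordinating thread.
        tx.send(Message::Stdout(format!("job {} running", id))).unwrap();
        tx.send(Message::Finished(id)).unwrap();
    });
}

fn drain(rx: Receiver<Message>) {
    // recv() returns Err once every Sender has been dropped, so this loop
    // terminates when all workers are done.
    while let Ok(msg) = rx.recv() {
        match msg {
            Message::Stdout(line) => println!("{}", line),
            Message::Finished(id) => println!("job {} done", id),
        }
    }
}

fn main() {
    let (tx, rx) = unbounded();
    for id in 0..4 {
        spawn_worker(tx.clone(), id);
    }
    drop(tx); // drop the original sender so drain() can observe disconnection
    drain(rx);
}
```

The `bounded(n)` constructor is the other half of the API mentioned above, for cases where backpressure is wanted.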
Yea, sorry for the confusion, there are two (independent) infinite loops. I tried to get the fix in for the progress bar before nightly was built, but rust-lang/rust hit this bug and we lost 4 hours and missed the window. Yea, I think we'll need to back it out on beta. I was hoping to avoid this by delaying the update until after the beta branch, but someone else updated Cargo. I'm glad you were able to catch it on Linux; I was never able to repro on Linux myself. It is probably very sensitive to timing and threading behavior. I don't think crossbeam_channel would be a good choice as-is. I was testing it a few days ago for #7838, and found the performance to be substantially worse.
Sounds like you have it under control, but in case it helps, here is a sample on macOS.
FWIW, Cargo has been updated by rust-lang/rust#68407 in order to include #7826. Beta promotion has some problems (rust-lang/rust#68595), so maybe there is still time to fix it.
Did what @Mark-Simulacrum suggested, but I'm unsure if it fixes the problem you're seeing.
I can reproduce this in 100% of the cases with the current nightly. Once it hangs, it quickly consumes all the memory and gets killed by the OOM killer. I can push up the crate with modifications if y'all still need to reproduce it.
It affects the current nightly, but the fix will be included in the next nightly.
FWIW I can reliably and quickly reproduce the infinite allocation bug on FreeBSD, just by trying to build almost any crate. Nix, for example, triggers the bug, but libc does not.
Latest nightly
We don't need the complexity of most channels since this is not a performance-sensitive part of Cargo, nor is it likely to be so any time soon. Coupled with recent bugs (rust-lang#7840) that we believe are in `std::sync::mpsc`, let's just not use it and instead use a custom queue type locally, which should be amenable to a blocking push soon too.
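As a rough illustration of that direction, here is a minimal sketch of a local queue built on `Mutex` plus `Condvar`; this is an assumption about what "a custom queue type" could look like, not Cargo's actual implementation.

```rust
// Hedged sketch of a minimal local queue: non-blocking push, blocking pop.
// Illustrative only; not Cargo's real Queue type.
use std::collections::VecDeque;
use std::sync::{Condvar, Mutex};

pub struct Queue<T> {
    items: Mutex<VecDeque<T>>,
    condvar: Condvar,
}

impl<T> Queue<T> {
    pub fn new() -> Queue<T> {
        Queue {
            items: Mutex::new(VecDeque::new()),
            condvar: Condvar::new(),
        }
    }

    /// Push never blocks; the queue is unbounded.
    pub fn push(&self, item: T) {
        self.items.lock().unwrap().push_back(item);
        self.condvar.notify_one();
    }

    /// Pop blocks until an item is available.
    pub fn pop(&self) -> T {
        let mut items = self.items.lock().unwrap();
        loop {
            match items.pop_front() {
                Some(item) => return item,
                // Release the lock and wait for a push to signal us.
                None => items = self.condvar.wait(items).unwrap(),
            }
        }
    }
}
```

The surface area is deliberately tiny, and a bounded or blocking push could later be layered on with the same `Condvar`, which is roughly what the commit message hints at.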
Swap std::sync::mpsc channel with crossbeam_channel. Hoping it closes #7840. r? @Mark-Simulacrum
Cargo is randomly entering an infinite loop in some rare cases. Noticed in CI:
I've been able to reproduce it locally by putting Cargo's own test suite into a loop (`while cargo test; do :; done`; it hits after about 30 minutes on my machine). It seems to randomly affect different tests. Attaching with a debugger, I see it caught in this loop, where dropping an mpsc receiver seems to be confused. I strongly suspect this is caused by some change in #7731. Perhaps there is some odd interaction between crossbeam's scope() and mpsc? The only relevant change that I see is that `DrainState` is now moved into the crossbeam scope, and thus dropped inside the scope, whereas previously the mpsc channels residing inside `JobQueue` were dropped much later. @Mark-Simulacrum Are you very familiar with mpsc or crossbeam's scope? mpsc seems wildly complex, and I don't know where to start trying to figure out what's wrong.
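To make the suspected drop-order difference concrete, here is a minimal sketch (assumed names; `State` stands in for something like `DrainState` holding an mpsc receiver, and this is not Cargo's actual code): a value moved into a crossbeam scope is dropped inside the scope, whereas a value kept outside it lives on until after the scope has joined its threads.

```rust
// Hedged sketch of the drop-order change: moving a receiver-holding value
// into crossbeam's scope means it is dropped before the scope returns.
use std::sync::mpsc;

struct State {
    rx: mpsc::Receiver<u32>,
}

impl Drop for State {
    fn drop(&mut self) {
        println!("State dropped (receiver goes away here)");
    }
}

fn main() {
    let (tx, rx) = mpsc::channel();
    let state = State { rx };

    crossbeam_utils::thread::scope(|scope| {
        scope.spawn(move |_| {
            // The worker may still be sending when the receiver is gone,
            // so ignore the potential SendError.
            let _ = tx.send(1);
        });
        // `state` was moved into this closure, so it is dropped here,
        // inside the scope, before the spawned threads are joined.
        drop(state);
        println!("inside scope, state already gone");
    })
    .unwrap();

    println!("scope joined");
    // Before the change, the receiver lived out here and was dropped only now.
}
```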
I might try removing `FinishOnDrop` and see if I can repro again, as that seems a little fishy.