stable rustc --version hangs forever #56736
Comments
Another user reported the same issue here: rust-lang/cargo#6384. That nightly is the first release that removed jemalloc; I would suspect that is related, but I don't have any ideas on how to reproduce. |
Here is a stack trace of the stuck process. This was captured with gdb on a stuck process running `rustc --version`. |
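For anyone wanting to capture the same kind of trace, a hung process can be inspected roughly like this (a sketch; using `pidof rustc` to find the PID is an assumption about the setup):

```
$ gdb -p "$(pidof rustc)"     # attach to the hung process
(gdb) thread apply all bt     # dump a backtrace of every thread
(gdb) detach                  # release the process without killing it
(gdb) quit
```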
Do you have ESET antivirus installed (or any other security software)? I installed ESET and I'm able to reproduce it. It looks like jemalloc is getting stuck recursively trying to initialize itself. @alexcrichton my guess is that something about how jemalloc 5 initializes has changed, maybe? |
Yes, I have ESET antivirus installed. |
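To make the recursive-initialization theory concrete, here is a toy C model of that kind of self-deadlock (hypothetical, not jemalloc's actual code): init takes a non-recursive lock and calls out to an interposed function, and the interposer re-enters init on the same thread, which then blocks forever with no CPU usage, matching the symptom reported below.

```c
#include <pthread.h>
#include <stdio.h>

static pthread_mutex_t init_lock = PTHREAD_MUTEX_INITIALIZER;

static void allocator_init(void);

/* Stand-in for a libc call (e.g. fcntl) interposed by security software,
 * where the interposer itself needs the allocator. */
static void interposed_call(void)
{
    allocator_init();               /* re-enters init on the same thread */
}

static void allocator_init(void)
{
    pthread_mutex_lock(&init_lock); /* the second entry blocks here forever */
    interposed_call();
    pthread_mutex_unlock(&init_lock);
}

int main(void)
{
    allocator_init();
    puts("never reached");
    return 0;
}
```

Compiled with `cc -pthread`, this never prints and sits in a futex wait, much like the stuck rustc.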
cc @gnzlbg, do you know if jemalloc has a fix for this perhaps? |
It appears to be an issue with the combination of ESET, jemalloc 5, and rustc being built for an old kernel. jemalloc 5 has started to use CLOEXEC. Since Rust is built against a very old Linux kernel, it has to use `fcntl` (here) instead of just passing `O_CLOEXEC` (which requires Linux 2.6.23). That `fcntl` call is intercepted by ESET, which attempts to find the… I have confirmed, by building rustc locally (with jemalloc), that it does not hang, presumably because it is using `O_CLOEXEC`. I don't offhand see any workarounds (other than using a newer kernel). |
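For context, here is a minimal C sketch of those two code paths (a simplification for illustration, not jemalloc's actual code; which file jemalloc opens is not specified in this thread):

```c
#include <fcntl.h>
#include <unistd.h>

/* Kernel >= 2.6.23: the close-on-exec flag is set atomically at open
 * time, with no separate fcntl call for interception to latch onto. */
int open_cloexec_new(const char *path)
{
    return open(path, O_RDONLY | O_CLOEXEC);
}

/* Old-kernel fallback (what a rustc built against an old kernel uses):
 * open first, then mark the fd close-on-exec with fcntl. It is this
 * fcntl call that ESET intercepts. */
int open_cloexec_old(const char *path)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    int flags = fcntl(fd, F_GETFD);
    if (flags < 0 || fcntl(fd, F_SETFD, flags | FD_CLOEXEC) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```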
The PR that started using CLOEXEC was jemalloc/jemalloc#872, which fixed jemalloc/jemalloc#528. We could patch jemalloc to not use CLOEXEC when built for Rust, but... it looks to me like jemalloc is doing the right thing here, and this is a corner case that should be handled on ESET's side. We should open a bug with ESET about their jemalloc 5 / fcntl / Rust support; maybe they can roll out a fix quickly. Depending on their timeline, patching jemalloc to not use CLOEXEC shouldn't be hard. When will the first stable Rust version with this issue land? I think we should consider it a regression. |
Opening a bug (if we can) with ESET sounds good to me for now, but if that doesn't pan out we can probably work around this and just not use CLOEXEC there, as it's a short-lived fd anyway. |
Something I think deserves clarification here: as of PR #55238 (resolving issue #36963), builds of `rustc` do not use jemalloc by default. It's a pretty confusing situation, IMO, since attempts to locally replicate the behavior described here via a local build of `rustc` won't use jemalloc at all. (Also: the CI's opting back into using jemalloc is easy to miss.) |
IIUC the intent was for the `rustc` we distribute to be built with jemalloc, while local builds default to the system allocator. |
Yes, I too thought that was the intent. But...: Lines 399 to 401 in f4b07e0
It could very well be that I am wrong about my expectations, and that if one wants to replicate the CI build product, one should take care to actually run the build with the configure invocation at line 33 in f4b07e0
(or with configure args taken from https://github.com/rust-lang/rust/blob/master/appveyor.yml, as appropriate to one's platform) |
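For anyone trying to build a local `rustc` that matches CI in this respect, a sketch of the relevant setting, assuming the `rust.jemalloc` key that PR #55238 introduced into config.toml:

```toml
# config.toml at the root of a rust-lang/rust checkout
[rust]
# Opt back into jemalloc as rustc's default allocator, as the CI dist
# builds do; plain local builds default to the system allocator.
jemalloc = true
```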
I'm just going to open a separate issue about this discrepancy between the CI vs local builds, rather than continuing to clutter up this issue's comment thread. Sorry for the noise! |
T-compiler triage. This issue is tagged as a regression but has no T-label, so no team has default responsibility for it. Based on the comments in the issue here, I do not think T-compiler is in a position to fix this; it is probably a T-infra problem? (And a problem that T-infra may well choose to close as "wont-fix".) |
FWIW, we started getting this after updating to 1.31 on Ubuntu 14.04.5 LTS, but we are not running ESET. So far it's only on one instance in our AWS stack, but that instance is responsible for a whole feature set in our beta environment. We've tried 1.31.0 and 1.31.1 so far, and have also reinstalled rustup + Rust. So far the behavior is pretty consistent. We don't, however, get this on the same feature set's staging instance, so we're trying to track down the differences. Hopefully we'll find something, as this is currently holding up the dev+QA cycle right before a scheduled app release. |
As of Rust 1.32 (Jan 17th) this now affects the latest Rust stable version. |
We discussed this in the infra team meeting a few weeks ago, and basically decided (given that this appears to be a strange interaction between jemalloc and ESET that we're stuck in the middle of) to wait until beta (and stable!) to see if more people reported the issue. Given that we've not seen more reports, unfortunately this isn't going to be something we prioritise - our hope is that either jemalloc or ESET fixes things. That said - @turboladen, we're interested in your report. Did you manage to track anything down, e.g. via strace? |
@aidanhs unfortunately we didn't really get time to get much info on it. We ended up cloning over the AWS image we had for our same app from a different environment (beta was having the problem described in this ticket, staging was not); I do believe, however, that we had jemalloc installed on the beta instance (trying to help speed up Rails), but did not have jemalloc on the staging instance. |
Just FYI, ESET has fixed the issue in version |
Starting with `nightly-2018-11-04` and anything later, just checking the version or doing anything with rustc causes the process to hang forever. It seems like it is waiting on a lock (no CPU usage). Stable Rust works fine, as well as any nightly version before this one.

OS is Ubuntu 18.04.1 LTS. Rust stable/nightly versions were installed with rustup.

The last few lines from `strace rustc +nightly-2018-11-04 --version` show that it hangs on a `FUTEX_WAIT` call forever.