Multithreaded fork appears flaky on OSX #14232
Comments
It should really just use
@alexcrichton with the runtime changes, is this still relevant?
Unfortunately, yes.
I... don't think there's anything we can do about this, so I think it's just something we're going to have to live with, unfortunately.
Intentionally providing broken functionality rather than using
I'm uncomfortable just closing this outright without some further discussion; I'm going to re-open as needs-decision.
triage: P-backcompat-libs (1.0 beta)
triage: P-backcompat-libs (1.0)
nominating for discussion at next week's triage.
There are many open questions here, and many ways that we might try to resolve or at least work around this in future versions of the stdlib. One fundamental issue is that there are some usage patterns supported by separated We should however:
The former bullet need not happen for 1.0, but the latter bullet definitely should happen for 1.0.
(In particular,
assigning to @aturon to follow through on the 1.0 parts.
FWIW: I modified the test program to call
@rprichard fascinating! I have also been able to reproduce that, but I haven't gotten the assertion to trigger yet; it just deadlocks when the child doesn't receive the signal. I'm getting more and more suspicious of this over time...
Hopefully this is a bug that can be fixed in the fullness of time, because the API is not changing now.
In the notes from triage, I noted that this should happen for 1.0:
Has this documentation actually happened? Should we open a separate issue?
The documentation has not changed yet, but it's not something I'd want to block the release on, so I'm not sure a new bug is worth it (it would still be nice to do though).
I can reproduce the gist of @rprichard's results on an iMac13,2 ("27-inch, Late 2012") running 10.10.3: if I replace main with
Also, if I replace the
Looking at the xnu kernel source for
If I add a 1-millisecond sleep in the parent, the original version of the program (threads and
Can we rename this issue to be something more like "signals immediately after process creation are racy on OS X", since it's not strictly related to multiple threads? I think the problem only appears worse there because more cores are used, which exercises races in the kernel more...
Wow, thanks for the thorough investigation @geofft! I think at this point it's pretty clear that there's not a whole lot we can do about this (if it's a kernel bug of some form), beyond your sleeping suggestion, but unfortunately even that wouldn't be rock-solid. I'm going to nominate this bug for closure at this point, however, as the prospects for solving it seem to be getting bleaker every day.
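The investigation above found that the hang does not require threads and that a short sleep in the parent hides it. The exact replacement `main` from that comment is not preserved here, so the following is only a hedged sketch of that experiment: it spawns an external `sleep` binary through `std::process::Command` (an assumption, not the original test's child), kills it with `Child::kill` (which sends SIGKILL on Unix), and optionally waits 1 ms first. As noted in the thread, the delay only narrows the race window; it is not a fix.

```rust
use std::io;
use std::os::unix::process::ExitStatusExt;
use std::process::Command;
use std::thread;
use std::time::Duration;

/// Spawns `sleep 1000`, optionally pauses in the parent, then kills and reaps
/// the child. Returns the signal that terminated it (SIGKILL is 9).
fn spawn_and_kill(delay: Option<Duration>) -> io::Result<Option<i32>> {
    let mut child = Command::new("sleep").arg("1000").spawn()?;
    if let Some(d) = delay {
        // The mitigation from the thread: give the new process time to start
        // before signalling it. This narrows the race; it does not close it.
        thread::sleep(d);
    }
    // On Unix, Child::kill sends SIGKILL.
    child.kill()?;
    // If the signal were lost, this wait would block forever.
    let status = child.wait()?;
    Ok(status.signal())
}

fn main() -> io::Result<()> {
    // Single-threaded loop: per the investigation, threads are not required.
    for i in 0..100 {
        let sig = spawn_and_kill(Some(Duration::from_millis(1)))?;
        assert_eq!(sig, Some(9), "iteration {}: unexpected termination", i);
    }
    println!("100 children killed");
    Ok(())
}
```

Passing `None` instead of a delay gives the unmitigated version of the same loop.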
This concern of mine (from an earlier comment) has not yet been addressed, AFAICT:
I would argue against closing this until that's addressed.
The thinking earlier in the thread was that this could be resolved by switching to a more reliable OS X API, but as far as I can tell, no such API exists: this is a fundamental bug in how OS X signal delivery works, regardless of whether you're using
We should probably report this to Apple, though. I could do that.
I'm curious why a bug that supposedly only occurs when a new process is killed near-instantly after starting up is showing up as an issue in practice (e.g. on the bots) - why are you starting a process if you don't want it to do anything? Anyway, if it's a child process, a
The investigation into rust-lang#14232 discovered that it's possible that signal delivery to a newly spawned process is racy on OSX. This test has been failing spuriously on the OSX bots for some time now, so ignore it as we don't currently know a solution and it looks like it may be out of our control.
Can someone help me out with what exactly needs to be documented here, and what our plans are? It's been a long time since the discussion happened, and I don't want something incorrect.
OK, I'm going to close this in favor of #27537. With @geofft's investigation it's becoming clearer that there's basically not much we can do here, so the "documentation bug" should just be documenting what we're doing. I don't really want to add a clause to
As a result, this is basically not-a-bug and documentation is covered by #27537
In the following program, a number of threads are created, and then each thread forks off a child that sleeps forever and immediately kills that child. I would expect this program to succeed every time, but on OSX it occasionally wedges: signal delivery is reported as successful, yet the signal apparently never reaches the child.
This is essentially how we fork() in libnative, and it's how we're using fork from libgreen. I'm trying to investigate a solution, but I'm starting to think that multithreaded fork is just fundamentally broken on basically every platform except Linux.
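The program from the original report is not preserved on this page. Below is a minimal Rust sketch of the pattern it describes, written against current Rust rather than the pre-1.0 libnative/libgreen runtimes. Assumptions: the POSIX functions are declared by hand instead of using the `libc` crate, the signal is SIGKILL, and the thread and iteration counts are arbitrary. On an unaffected system every child is reaped; the failure described in this issue would show up as a `waitpid` call that never returns.

```rust
use std::io;
use std::os::raw::c_int;
use std::os::unix::process::ExitStatusExt;
use std::process::ExitStatus;
use std::thread;

// Raw POSIX declarations; pid_t is a 32-bit int on both Linux and macOS.
unsafe extern "C" {
    fn fork() -> c_int;
    fn kill(pid: c_int, sig: c_int) -> c_int;
    fn waitpid(pid: c_int, status: *mut c_int, options: c_int) -> c_int;
    fn pause() -> c_int;
}

// Assumption: the original test's signal is unknown; SIGKILL (9 on Linux and
// macOS) is used because it cannot be caught or blocked.
const SIGKILL: c_int = 9;

/// Forks a child that sleeps forever, kills it immediately, and reaps it.
/// Returns the signal that terminated the child, if any.
fn fork_and_kill() -> Option<i32> {
    let pid = unsafe { fork() };
    assert!(pid >= 0, "fork failed: {}", io::Error::last_os_error());
    if pid == 0 {
        // Child: after fork() in a multithreaded process only async-signal-safe
        // calls are allowed, so just sleep until a signal arrives.
        loop {
            unsafe { pause(); }
        }
    }
    // Parent: kill() returning 0 only means the signal was accepted.
    assert_eq!(unsafe { kill(pid, SIGKILL) }, 0, "kill failed");
    let mut status: c_int = 0;
    loop {
        // Blocks until the child exits; an affected system would hang here.
        let ret = unsafe { waitpid(pid, &mut status, 0) };
        if ret == pid {
            break;
        }
        let err = io::Error::last_os_error();
        assert_eq!(err.kind(), io::ErrorKind::Interrupted, "waitpid failed: {}", err);
    }
    ExitStatus::from_raw(status).signal()
}

/// Runs `iters` fork-and-kill rounds on each of `threads` threads concurrently
/// and returns how many children were confirmed killed by SIGKILL.
fn run_stress(threads: usize, iters: usize) -> usize {
    let handles: Vec<_> = (0..threads)
        .map(|_| thread::spawn(move || (0..iters).filter(|_| fork_and_kill() == Some(SIGKILL)).count()))
        .collect();
    handles.into_iter().map(|h| h.join().unwrap()).sum()
}

fn main() {
    let (threads, iters) = (4, 50);
    let killed = run_stress(threads, iters);
    println!("{} of {} children killed by SIGKILL", killed, threads * iters);
}
```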
This issue has appeared as various forms of flakiness on the bots, which is why I started investigating.