-
-
Notifications
You must be signed in to change notification settings - Fork 5.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Handle waitpid
race condition when SIGCHLD
is set to SIG_IGN
#57241
Conversation
The `fork()` we do here relies on `SIGCHLD` to make sure that we don't race against the child. This is easy to reproduce with a simple embedding application that dynamically links `libjulia`: ```c int main(int argc, char *argv[]) { signal(SIGCHLD, SIG_IGN); void *handle = dlopen("path/to/libjulia.so", RTLD_LAZY); return 0; } ``` Without this change, this fails with an error message: ``` Error during libstdcxxprobe in parent process: waitpid: No child processes ```
7c76f70
to
ead5f86
Compare
perror("Error during libstdcxxprobe in parent process:\nwaitpid"); | ||
exit(1); | ||
if (errno == ECHILD) { | ||
// SIGCHLD is set to SIG_IGN or has flag SA_NOCLDWAIT, so the child |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This can be worked around by e.g. using clone on linux and the raw process APIs on osx (and possibly freebsd) if we want to.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I find this fork()
(or clone, etc.) fairly suspect to begin with TBH
As we try to be embeddable / loadable in larger applications it becomes more likely that (1) libstdc++
has already been loaded and we're too late to do anything, and (2) the address space of the application is large and mutating, so that fork()
can become potentially expensive
I'd prefer to try to remove our dependency on libstdc++
for small, simple Julia-exported libraries TBH and get rid of this altogether
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Even better, we should probably just mmap
the .so
and check its symbol table, rather than trying to do this sandboxed dlopen
.
The hard part is emulating the linker search path though..
I'm still trying to figure out why setting up |
Signal mask and signal disposition are independent. If the signal is blocked, but the handler is setup, the signal will fire when it is unblocked. If the disposition is set to SIG_IGN, the signal is dropped unconditionally. |
Ah, I see now. Thanks for the explainer, everything here makes sense then. |
To check for regressions of JuliaLang#57240
ead5f86
to
0ea3f84
Compare
@vtjnash Any objections to this PR? Happy to make this a signal disposition check + more informative error, if you think that's preferable |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not thrilled with supporting broken systems, but I don't see anything inherently wrong with this
…57241) The `fork()` we do here relies on `SIGCHLD` to make sure that we don't race against the child. This is easy to see in an embedding application that dynamically links `libjulia`: ```c int main(int argc, char *argv[]) { signal(SIGCHLD, SIG_IGN); void *handle = dlopen("path/to/libjulia.so", RTLD_LAZY); return 0; } ``` Without this change, this fails with an error message: ``` Error during libstdcxxprobe in parent process: waitpid: No child processes ``` Resolves #57240 (cherry picked from commit daf865e)
…57241) The `fork()` we do here relies on `SIGCHLD` to make sure that we don't race against the child. This is easy to see in an embedding application that dynamically links `libjulia`: ```c int main(int argc, char *argv[]) { signal(SIGCHLD, SIG_IGN); void *handle = dlopen("path/to/libjulia.so", RTLD_LAZY); return 0; } ``` Without this change, this fails with an error message: ``` Error during libstdcxxprobe in parent process: waitpid: No child processes ``` Resolves #57240 (cherry picked from commit daf865e)
Backported PRs: - [x] #57142 <!-- Add reference to time_ns in time --> - [x] #57241 <!-- Handle `waitpid` race condition when `SIGCHLD` is set to `SIG_IGN` --> - [x] #57249 <!-- restore non-freebsd-unix fix for profiling --> - [x] #57211 <!-- Ensure read/readavailable for BufferStream are threadsafe --> - [x] #57262 <!-- edit NEWS for v1.12 --> - [x] #57226 <!-- cfunction: reimplement, as originally planned, for reliable performance --> - [x] #57253 <!-- bpart: Fully switch to partitioned semantics --> - [x] #57273 <!-- fix "Right arrow autocompletes at line end" implementation --> - [x] #57280 <!-- dep: Update JuliaSyntax --> - [x] #57229 <!-- staticdata: Close data race after backedge insertion --> - [x] #57298 <!-- Updating binding version to fix MMTk CI --> - [x] #57248 <!-- improve concurrency safety for `Compiler.finish!` --> - [x] #57312 <!-- Profile.print: de-focus sleeping frames as gray --> - [x] #57289 <!-- Make `OncePerX` subtype `Function` --> - [x] #57310 <!-- Make ptls allocations at least 128 byte aligned --> - [x] #57311 <!-- Add a warning for auto-import of types --> - [x] #57338 <!-- fix typo in Float32 random number generation --> - [x] #57293 <!-- Fix getfield_tfunc when order or boundscheck is Vararg --> - [x] #57349 <!-- docs: fix-up world-age handling for META access --> - [x] #57344 <!-- Add missing type asserts when taking the queue out of the task struct --> - [x] #57348 <!-- 🤖 [master] Bump the SparseArrays stdlib from 212981b to 72c7cac --> - [x] #55040 <!-- Allow macrocall as function sig --> - [x] #57299 <!-- Add missing latestworld after parameterized type alias -->
The
fork()
we do here relies onSIGCHLD
to make sure that we don't race against the child.This is easy to see in an embedding application that dynamically links
libjulia
:Without this change, this fails with an error message:
Resolves #57240