Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Handle waitpid race condition when SIGCHLD is set to SIG_IGN #57241

Merged
merged 2 commits into from
Feb 4, 2025

Conversation

topolarity
Copy link
Member

The fork() we do here relies on SIGCHLD to make sure that we don't race against the child.

This is easy to see in an embedding application that dynamically links libjulia:

int main(int argc, char *argv[])
{
    signal(SIGCHLD, SIG_IGN);
    void *handle = dlopen("path/to/libjulia.so", RTLD_LAZY);
    return 0;
}

Without this change, this fails with an error message:

Error during libstdcxxprobe in parent process:
waitpid: No child processes

Resolves #57240

@topolarity topolarity requested a review from gbaraldi February 3, 2025 14:41
The `fork()` we do here relies on `SIGCHLD` to make sure that we don't
race against the child. This is easy to reproduce with a simple
embedding application that dynamically links `libjulia`:

```c
int main(int argc, char *argv[])
{
    signal(SIGCHLD, SIG_IGN);
    void *handle = dlopen("path/to/libjulia.so", RTLD_LAZY);
    return 0;
}
```

Without this change, this fails with an error message:
```
Error during libstdcxxprobe in parent process:
waitpid: No child processes
```
perror("Error during libstdcxxprobe in parent process:\nwaitpid");
exit(1);
if (errno == ECHILD) {
// SIGCHLD is set to SIG_IGN or has flag SA_NOCLDWAIT, so the child
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can be worked around by e.g. using clone on linux and the raw process APIs on osx (and possibly freebsd) if we want to.

Copy link
Member Author

@topolarity topolarity Feb 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I find this fork() (or clone, etc.) fairly suspect to begin with TBH

As we try to be embeddable / loadable in larger applications it becomes more likely that (1) libstdc++ has already been loaded and we're too late to do anything, and (2) the address space of the application is large and mutating, so that fork() can become potentially expensive

I'd prefer to try to remove our dependency on libstdc++ for small, simple Julia-exported libraries TBH and get rid of this altogether

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Even better, we should probably just mmap the .so and check its symbol table, rather than trying to do this sandboxed dlopen.

The hard part is emulating the linker search path though..

@topolarity
Copy link
Member Author

I'm still trying to figure out why setting up SIG_BLOCK via pthread_sigmask like we do here is not enough to override the SIG_IGN setup by the embedder.

@Keno
Copy link
Member

Keno commented Feb 3, 2025

I'm still trying to figure out why setting up SIG_BLOCK via pthread_sigmask like we do here is not enough to override the SIG_IGN setup by the embedder.

Signal mask and signal disposition are independent. If the signal is blocked, but the handler is setup, the signal will fire when it is unblocked. If the disposition is set to SIG_IGN, the signal is dropped unconditionally.

@topolarity
Copy link
Member Author

Signal mask and signal disposition are independent.

Ah, I see now. Thanks for the explainer, everything here makes sense then.

@topolarity
Copy link
Member Author

@vtjnash Any objections to this PR? Happy to make this a signal disposition check + more informative error, if you think that's preferable

Copy link
Member

@vtjnash vtjnash left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not thrilled with supporting broken systems, but I don't see anything inherently wrong with this

@topolarity topolarity added the backport 1.12 Change should be backported to release-1.12 label Feb 4, 2025
@topolarity topolarity merged commit daf865e into JuliaLang:master Feb 4, 2025
8 checks passed
KristofferC pushed a commit that referenced this pull request Feb 6, 2025
…57241)

The `fork()` we do here relies on `SIGCHLD` to make sure that we don't
race against the child.

This is easy to see in an embedding application that dynamically links
`libjulia`:
```c
int main(int argc, char *argv[])
{
    signal(SIGCHLD, SIG_IGN);
    void *handle = dlopen("path/to/libjulia.so", RTLD_LAZY);
    return 0;
}
```

Without this change, this fails with an error message:
```
Error during libstdcxxprobe in parent process:
waitpid: No child processes
```

Resolves #57240

(cherry picked from commit daf865e)
KristofferC pushed a commit that referenced this pull request Feb 6, 2025
…57241)

The `fork()` we do here relies on `SIGCHLD` to make sure that we don't
race against the child.

This is easy to see in an embedding application that dynamically links
`libjulia`:
```c
int main(int argc, char *argv[])
{
    signal(SIGCHLD, SIG_IGN);
    void *handle = dlopen("path/to/libjulia.so", RTLD_LAZY);
    return 0;
}
```

Without this change, this fails with an error message:
```
Error during libstdcxxprobe in parent process:
waitpid: No child processes
```

Resolves #57240

(cherry picked from commit daf865e)
@KristofferC KristofferC mentioned this pull request Feb 6, 2025
32 tasks
KristofferC added a commit that referenced this pull request Feb 13, 2025
Backported PRs:
- [x] #57142 <!-- Add reference to time_ns in time -->
- [x] #57241 <!-- Handle `waitpid` race condition when `SIGCHLD` is set
to `SIG_IGN` -->
- [x] #57249 <!-- restore non-freebsd-unix fix for profiling -->
- [x] #57211 <!-- Ensure read/readavailable for BufferStream are
threadsafe -->
- [x] #57262 <!-- edit NEWS for v1.12 -->
- [x] #57226 <!-- cfunction: reimplement, as originally planned, for
reliable performance -->
- [x] #57253 <!-- bpart: Fully switch to partitioned semantics -->
- [x] #57273 <!-- fix "Right arrow autocompletes at line end"
implementation -->
- [x] #57280 <!-- dep: Update JuliaSyntax -->
- [x] #57229 <!-- staticdata: Close data race after backedge insertion
-->
- [x] #57298 <!-- Updating binding version to fix MMTk CI -->
- [x] #57248 <!-- improve concurrency safety for `Compiler.finish!` -->
- [x] #57312 <!-- Profile.print: de-focus sleeping frames as gray -->
- [x] #57289 <!-- Make `OncePerX` subtype `Function` -->
- [x] #57310 <!-- Make ptls allocations at least 128 byte aligned -->
- [x] #57311 <!-- Add a warning for auto-import of types -->
- [x] #57338 <!-- fix typo in Float32 random number generation -->
- [x] #57293 <!-- Fix getfield_tfunc when order or boundscheck is Vararg
-->
- [x] #57349 <!-- docs: fix-up world-age handling for META access -->
- [x] #57344 <!-- Add missing type asserts when taking the queue out of
the task struct -->
- [x] #57348 <!-- 🤖 [master] Bump the SparseArrays stdlib from 212981b
to 72c7cac -->
- [x] #55040 <!-- Allow macrocall as function sig -->
- [x] #57299 <!-- Add missing latestworld after parameterized type alias
-->
@KristofferC KristofferC removed the backport 1.12 Change should be backported to release-1.12 label Feb 13, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Dynamically loading libjulia with SIGCHLD ignored causes error waitpid: No child processes
4 participants