-
-
Notifications
You must be signed in to change notification settings - Fork 16
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Avoid or reduce shutdown costs #26
Comments
You are almost certainly right about io-uring vs mmap, since you said close not unmap in the other issue I misunderstood what you were doing. Then it sounds like the fork and letting a background process do the actual work is the best way to go. (That actually sounds like a trick I myself might have use for in a program of mine. It should be more widely publicised as a trick.) |
Focusing on using the fork of a child method to do the linking work, and having the parent exit as soon as the output file is written: I imagine it would be desirable to have a --fork/--nofork command line option to be able to enable/disable this behaviour, combined with a default of "nofork" until proven reliable and no indesireable side-effects, then switch to "fork" be the default after that? Parent should watch for death of a child signal, so that any unexpected exit (panic) by the child, or uncontrolled exit path, would not leave the parent hanging. Any other thoughts on that particular feature? |
Considering options on how to "fork" the sub-process
After looking at them all, I sent a WIP PR using fork(): #192 |
Thanks for working on this @andrewdavidmackenzie. Regarding the |
That's true, it could "exec" itself... |
Nix is also unsafe (for the same reason as libc: fork when multi-threaded is complex). The ni fork appears to be a thin wrapper around libc. It looks to me like the fork crate should be unsafe but isn't. Fishy. It too is a thin wrapper around libc. Either way, I don't think the unsafe is a big issue for wild in this case. The fork will presumably happen as early as possible, at which point the parent process will still be single threaded. After the point of fork, either process can do whatever it wants with threads. I unfortunately have no clear idea into what might be going on with stdio. It should work from a child, as long as stdio isn't locked at the moment of forking. Does wild lock stdio once early and run with it? That might cause issues, it would be better to do that after the fork. |
I imagine Fork crate just has an unsafe {} inside the function, then public function doesn't need to be unsafe. |
Yeah, but I think what @VorpalBlade is questioning is whether it's legitimate for them to mark the function safe. I can see an argument for it being safe. Forking when threads have been started can lead to deadlock. I'm not aware of a way for it to lead to arbitrary undefined behaviour. But maybe there is. If nothing else, it'd be good to have a comment justifying why it isn't marked unsafe. |
I've discussed this before with others. My thought was that if a low-level bit of code is unsafe (e.g It calls some unsafe function in libc), and the "rule" is to mark the function containing it unsafe....then what about the functions calling that unsafe function? If you apply the rule consistently, unsafe will percolate up and you will end up with main() being unsafe. My approach was to make the unsafe scope as small as possible... But, I think it's debatable.... Maybe the rule could be to percolate up to a crate API? |
It's not that anything that uses unsafe should itself be marked as unsafe. It's more that your function should be unsafe unless you can be sure that you're upholding all of the safety requirements no matter how users of your function call it. |
https://www.man7.org/linux/man-pages/man3/fork.3p.html (note the 3p, indicating this is from the POSIX standard). If you search the page for "undefined" there are several hits. Some of them seem to be about atfork handlers, but not all. https://www.man7.org/linux/man-pages/man2/fork.2.html says
This doesn't say that doing otherwise is undefined, but it also doesn't say what would happen if you do break that rule. https://www.man7.org/linux/man-pages/man7/signal-safety.7.html only uses wording like "unpredictable results". (But this is a Linux man page, not a POSIX one) Given that arbitrary signal handlers are generally considered unsafe in Rust, and that this is adjecent to that minefield I'm not comfortable with the idea of just reexposing fork as safe without explaining why it safe in this specific use. That could be "this application is currently single threaded" (which is a decision a library can never make), or "we only execute async signal safe functions after the fork" (which precludes arbitrary caller provided code from being executed). Note that the standard library also take this stance: CommandExt::pre_exec is unsafe. As such I believe the fork library to be unsound. EDIT: See also https://internals.rust-lang.org/t/why-no-fork-in-std-process/13770/4 and the transitivly linked discussions. |
That sounds good, like there are plenty of cases when fork can have
problems, and a library cannot know if that is the case
Related: I see there's now a "standard" for doc comments on a function to
explain it's safety for such cases (if you can indeed control it...)
…On Sun, Nov 24, 2024, 9:29 AM Arvid Norlander ***@***.***> wrote:
https://www.man7.org/linux/man-pages/man3/fork.3p.html (note the 3p,
indicating this is from the POSIX standard). If you search the page for
"undefined" there are several hits. Some of them seem to be about atfork
handlers, but not all.
https://www.man7.org/linux/man-pages/man2/fork.2.html says
After a fork() in a multithreaded program, the child can safely call only
async-signal-safe functions (see signal-safety(7)) until such time as it
calls execve(2).
This doesn't say that doing otherwise is undefined, but it also doesn't
say what would happen if you do break that rule.
https://www.man7.org/linux/man-pages/man7/signal-safety.7.html only uses
wording like "unpredictable results".
Given that arbitrary signal handlers are generally considered unsafe
<https://docs.rs/signal-hook/latest/signal_hook/low_level/fn.register.html>
in Rust, and that this is adjecent to that minefield I'm not comfortable
with the idea of just reexposing fork as safe without explaining why it
safe in this specific use. That could be "this application is currently
single threaded" (which is a decision a library can never make), or "we
only execute async signal safe functions after the fork" (which precludes
arbitrary caller provided code from being executed).
Note that the standard library also take this stance: CommandExt::pre_exec
<https://doc.rust-lang.org/std/os/unix/process/trait.CommandExt.html#tymethod.pre_exec>
is unsafe.
As such I believe the fork library to be unsound.
—
Reply to this email directly, view it on GitHub
<#26 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/ABKF4LBLIW2CQTQ46ZZA33L2CGE6DAVCNFSM6AAAAABMNDRXA6VHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDIOJVHA3TANJTGY>
.
You are receiving this because you were mentioned.Message ID:
***@***.***>
|
Unrelated to this issue, but have you considered any of:
(Sorry to hijack the issue, but those ideas just came up as solutions to that subproblem.)
Originally posted by @VorpalBlade in #15 (comment)
Let's move discussion to a new issue, since I'd like to keep that other issue as a good first issue for people to work on.
I've tried a couple of those things. In particular I've tried unmapping files from separate threads and also just leaving them and letting the OS clean them up on exit. Neither seemed to make any difference.
In the case of using multiple threads, I think the issue is that munmap acquires a per-process kernel lock, so the threads end up waiting for each other anyway.
In the case of just leaving things for shutdown, I'm less sure, but I think process termination causes the kernel to do the same unmapping. So termination ends up taking longer. This is somewhat surprising and also feels like the kind of thing that could change with future changes in the Linux kernel, so should probably be revisited at some point.
I haven't tried using io-uring to munmap files. Is that possible? Given the above results, I'd be surprised if this were a faster way to close files.
I have considered using io-uring for all file IO. It'd be a pretty big change from how things currently work though and I suspect might make performance worse rather than better, although that's just a hunch. My reason for suspecting that, is that AFAIK, io_uring will copy the data, even if it's in cache, whereas mmap can reuse the pages of the cache and only needs to update the page tables of the process.
The forking trick would likely work. I haven't done it as yet, but there's a moderate chance that I will eventually.
The text was updated successfully, but these errors were encountered: