Document how `os/spawn` should be cleaned up or handle zombie processes... (#1386)
Comments
I remember there was a flag for not collecting processes, turning them into zombies. Also, a quick search showed that there is a configuration for that.
Perhaps the following is not what you were thinking of, but from the docstrings, it doesn't seem like this is covered. On a related note, these lines seemed to be about garbage collection and dealing with zombies:

```c
static int janet_proc_gc(void *p, size_t s) {
    (void) s;
    JanetProc *proc = (JanetProc *) p;
#ifdef JANET_WINDOWS
    if (!(proc->flags & JANET_PROC_CLOSED)) {
        if (!(proc->flags & JANET_PROC_ALLOW_ZOMBIE)) {
            TerminateProcess(proc->pHandle, 1);
        }
        CloseHandle(proc->pHandle);
        CloseHandle(proc->tHandle);
    }
#else
    if (!(proc->flags & (JANET_PROC_WAITED | JANET_PROC_ALLOW_ZOMBIE))) {
        /* Kill and wait to prevent zombies */
        kill(proc->pid, SIGKILL);
        int status;
        if (!(proc->flags & JANET_PROC_WAITING)) {
            waitpid(proc->pid, &status, 0);
        }
    }
#endif
    return 0;
}
```

Also, there is this line:

```c
int pipe_owner_flags = (is_spawn && (flags & 0x8)) ? JANET_PROC_ALLOW_ZOMBIE : 0;
```

and this line:

```c
proc->flags = pipe_owner_flags;
```

in the underlying C implementation.
Perhaps the IO pipe is preventing cleanup.
I'm not seeing pipe closure.
I'm trying to reproduce by modifying the code samples in the first post above and not having much luck. I presume that "the spawned process ends up devouring a CPU core" means something like CPU utilization goes to 100%. That's not something I'm seeing here, so I'm guessing that I'm not using appropriate code. Though if it's something else, please let me know. Would you mind posting code that demonstrates the issue - preferably with programs that are likely to be installed on a typical Linux machine? Possibly if I can reproduce the issue here, there's a chance I might be able to observe what actually happens.
Those are minimal examples that reproduce 100% CPU core usage.

```janet
(defn dump
  []
  (let [proc (os/spawn ["ls"] :p {:out :pipe :err :pipe})]
    (print (ev/read (proc :out) :all nil 5))))

(dump)

(forever
  (ev/sleep 1))
```

```janet
(defn dump
  []
  (let [proc (os/spawn ["ls"] :p {:out :pipe :err :pipe})]
    (os/proc-wait proc)))

(dump)

(forever
  (ev/sleep 1))
```

```janet
(defn dump
  []
  (let [proc (os/spawn ["ls"] :p {:out :pipe :err :pipe})]
    (ev/with-deadline 5
      (os/proc-wait proc))))

(dump)

(forever
  (ev/sleep 1))
```

If you replace `ls` with a long-running program like `sleep`, the usage goes away. I think the combination of a `:pipe` output and a process that exits quickly leads to it.
```janet
(os/spawn ["ls"] :p {:out :pipe})

(forever
  (ev/sleep 1))
```

leads to 100% cpu core usage.

```janet
(os/spawn ["sleep" "100000000"] :p {:out :pipe})

(forever
  (ev/sleep 1))
```

does not.

```janet
(os/spawn ["ls"] :p)

(forever
  (ev/sleep 1))
```

does not, either.
My conclusion is that the unread `:pipe` output is what triggers the busy loop.
Will take more of a look at this soon, but this looks to be more related to the event loop not handling an event that keeps triggering, resulting in a busy loop. Namely, `ls` writes data that you never read, while `sleep` does nothing so there is no data to read. Closing stuff has no relevance here.
Using `strace janet test.janet` should better illustrate what is happening to cause a busy loop.
Here it is: strace.txt
```janet
(def proc (os/spawn ["ls"] :p {:out :pipe}))

(print (ev/read (proc :out) :all))

(forever
  (ev/sleep 1))
```

also leads to 100% cpu core usage.
I see that you are using poll instead of epoll, which is a nonstandard build on Linux. Using epoll should help here - how did you build Janet, and what version are you using?
I'm not using a nonstandard build, as far as I know.
By the way, is the garbage collector going to correctly deal with this?

```janet
(def proc (os/spawn ["ls"] :p {:out :pipe}))

(print (ev/read (proc :out) :all))

(forever
  (ev/sleep 1))
```

or

```janet
(os/spawn ["ls"] :p {:out :pipe})

(forever
  (ev/sleep 1))
```

I just want to use `os/spawn` to run something and forget about it.
It seems like it; I can see it in the strace log. It's a configure build option that for a while was accidentally a default with the meson build. It should show epoll_wait in a loop instead of poll, poll, poll, etc.
Your comment about using os/spawn to just run something and forget about it - that's just not how it works, for a number of reasons. And using `:pipe` as /dev/null is also a bad idea. After the pipe buffer fills up, your program will hang. If you want to run and forget, use `os/execute`. If you want it to wait in the background, wrap it with `ev/spawn`.
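The run-and-forget alternative described here can be sketched outside Janet as well; this is a minimal Python illustration (Python's `subprocess` wraps the same POSIX calls), and the `ls` command is just a placeholder. Output goes to /dev/null instead of an unread pipe, and the child is reaped afterwards:

```python
import subprocess

# Run-and-forget sketch: discard the child's output instead of wiring it
# to a pipe that nobody reads. DEVNULL opens os.devnull, the same idea as
# redirecting a Janet subprocess to an os/open'd /dev/null.
proc = subprocess.Popen(
    ["ls", "/"],                 # placeholder command
    stdout=subprocess.DEVNULL,
    stderr=subprocess.DEVNULL,
)

# Reaping the child (waitpid under the hood) prevents a zombie.
code = proc.wait()
print("exit code:", code)
```

Because nothing buffers, the child can never block on a full pipe, no matter how much it prints.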
But, Janet doesn't come with a built-in /dev/null stream.
You can open /dev/null and use that, or read and discard in a loop. There are some utils here that illustrate it: https://github.com/janet-lang/spork/blob/master/spork/sh.janet Currently having internet trouble so I'm on a phone, but will look into the busy cpu loops later. That is certainly a bug with the poll backend. To use epoll, rebuild Janet with meson and `-Depoll=true` and see if that helps. EDIT: I think the `exec-slurp-all` function might do what you want.
I think the rebuild commands suggested above fixed the 100% cpu core usage.
Perhaps, is it okay to not close the proc or the IO pipe streams?
According to my test, one variant led to 100% cpu usage; however, another didn't.
I also found a weird thing. I discovered that this is possible.
os/execute doesn't work with files - note that devnull uses `os/open` rather than `file/open`. These are what are called Streams in Janet, and they are just wrappers around Unix file descriptors. It corresponds to calling open() in C.
`os/execute` takes all of the same arguments as `os/spawn`.
Tell me which of the following items will lead to problems.
This is an issue tracker, not a chatroom. I think we are getting off topic and I feel like I'm fielding random questions in a thread. Are you trying to fix your problem or understand what is going on with each example program? It's hard to track things when you have posted some 20 different variations of a program in just a short period. Please have a goal in mind. The 100% cpu usage is certainly an issue, but most likely mostly related to the event loop implementation and poll. In either case, your program is not ideal. A few things:

For your example, I would really do something like `(sh/exec-slurp-all "my-program" "arg1")` or just use `(os/execute)`. `os/spawn` is meant for long-running subprocesses that you want to monitor and interact with, for example piping data to the process's stdin.
The issue is confusion around how `os/spawn`ed processes should be cleaned up.
Why would I want to always use `sh/exec-slurp-all`? In my j3blocks cmd module, I need to react to subprocess output as it arrives.
I'm not putting all of the nuance here into the docstring for os/spawn.
Nothing to do with the garbage collector. I don't know why you are so focused on the garbage collector; frankly, it's not relevant to any of this. It will make a best effort to clean up resources, but you don't really know when it will run. The reason you call `os/proc-wait` is to avoid zombies. Same as any scripting language - if you want more info on this, read the man pages for waitpid(2). Also notice how in sh.janet, `os/proc-wait` and `ev/read` run in parallel. As far as race conditions, I was mainly talking about the general case - depending on what program you run, some things will work, some won't. Programs like 'sed' that incrementally read from stdin and then output text in no particular manner can do this quite easily. There are a number of other bugs in the issue tracker where we figured this stuff out and made things work reliably with the patterns in sh.janet. As for "why" it works like this, the answer is simply because it's how POSIX works. `os/spawn` corresponds to posix_spawn and `os/proc-wait` corresponds to waitpid. So is that enough? I think there are a couple of solutions here:
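The waitpid(2) point can be seen outside Janet as well. Here is a short Python sketch (Linux-only, since it inspects /proc) showing that a child which has exited but has not been waited on lingers as a zombie until the parent reaps it:

```python
import subprocess
import time

# Spawn a child that exits almost immediately, and deliberately do not
# wait on it yet.
proc = subprocess.Popen(["true"])
time.sleep(0.5)  # give the child time to exit

# On Linux, /proc/<pid>/stat reports state 'Z' (zombie/defunct) for a
# child that exited but was never reaped via waitpid.
with open(f"/proc/{proc.pid}/stat") as f:
    state = f.read().rsplit(")", 1)[1].split()[0]
print("state before wait:", state)

# proc.wait() calls waitpid(2) under the hood: the kernel can now free
# the process table entry.
code = proc.wait()
print("exit code:", code)
```

The same lifecycle applies to an `os/spawn`ed process: until something calls `os/proc-wait` (or the gc's best-effort cleanup runs), the exit status sits in the kernel's process table.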
As I was writing my own code, I came up with more questions. So, I have a few questions.

If I have answers to those questions, then I think I or someone else can maybe submit a pull request to improve the documentation. The current documentation for `os/spawn` doesn't cover these details.
Yes.
I'm not sure where you get this idea. Nothing is being "stalled". And what is wrong with waiting until the process is complete? `ev/read` doesn't block anything if you wrap it with `ev/spawn`; you can run it on its own fiber. That is what `ev/gather` does. The general issue is that a subprocess that is writing to a pipe will get stuck if the pipe is not emptied and fills up. So anytime you redirect :out or :err to a pipe and don't read it, you can get this hanging issue. Other than that, the race condition I was originally thinking about has more to do with when there is also something piped to stdin of the subprocess. I don't think it applies here if you just want the output. So if you call proc-wait and only afterwards try to read the output, that generally won't work. Here is some more example code that sets up and handles a long-running subprocess. Different from your use case I think, but it has similar structure: https://github.com/janet-lang/spork/blob/7a4eff4bfb9486a6c6079ee8bb12e6789cce4564/spork/tasker.janet#L98 As far as there actually being race conditions in your code, I don't know. I'm just trying to caution you since you seem to be unsure.
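The "pipe fills up" behavior can be observed directly with raw OS pipes. This small Python sketch (assuming a POSIX system; the exact capacity varies, with 64 KiB the usual Linux default) writes non-blockingly until the kernel buffer is full:

```python
import os

# A pipe's kernel buffer is finite. Once it is full, a blocking writer
# gets stuck until someone reads. Make the write end non-blocking so
# this demo raises an error instead of hanging.
r, w = os.pipe()
os.set_blocking(w, False)

written = 0
try:
    while True:
        written += os.write(w, b"x" * 4096)
except BlockingIOError:
    # Buffer full: a subprocess writing here would now be blocked,
    # which is exactly why an unread :pipe can wedge a child process.
    pass

print("pipe capacity in bytes:", written)
os.close(r)
os.close(w)
```

A child redirected to `:pipe` hits this same limit, which is why its output has to be drained if it writes more than the buffer can hold.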
My j3blocks module has to respond to each line from a wireplumber script, which can run for hours. If it had to wait for the wireplumber script to exit, my swaybar would not show me updates to pipewire nodes for hours. I want pipewire updates to be shown to me immediately. Because I want to read lines from a wireplumber script as they arrive, I can't slurp the entire output at once.
First of all, exec-slurp and exec-slurp-all return strings. They don't return pipe streams that can be fed to stdin of another subprocess. If you give the stdout pipe of a subprocess to stdin of another subprocess, then you are not going to read it yourself. I think calling `os/proc-wait` and `ev/read` separately requires working out when each is safe to call. You eliminate this calculation by calling them together, in parallel, as sh.janet does. If you know what you are doing, then you are not required to use the sh.janet patterns.
I think I'm ready to submit a pull request for this issue. As you said, I didn't need threads in most cases. Further discussion will happen in the pull request.
After days of working with the janet subprocess API, I finally understood. I'm now a subprocess master. Because the pipe buffer is limited, if the subprocess output is huge, calling `os/proc-wait` before reading the output to the end can deadlock. I think this should be documented in the core documentation.
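This deadlock is not Janet-specific; the same ordering rule applies to any POSIX pipe. A minimal Python sketch (using the interpreter itself as a child that prints far more than any default pipe buffer) shows the safe order: drain the output to EOF first, then wait:

```python
import subprocess
import sys

# The child prints ~1 MB, much more than a default pipe buffer holds.
# Waiting on it before reading would deadlock: the child blocks on a
# full pipe while the parent blocks in wait.
proc = subprocess.Popen(
    [sys.executable, "-c", "print('x' * 1_000_000)"],
    stdout=subprocess.PIPE,
)

out = proc.stdout.read()  # drain the pipe to EOF first
code = proc.wait()        # then reap the child; it can now exit cleanly
print(len(out), code)
```

Swapping the `read` and `wait` lines hangs once the output exceeds the pipe capacity, which is the failure mode described in the comment above.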
There seems to be a fair number of noteworthy tidbits in this issue. Maybe we can make homes for some of them. As for specifics, some of what's in this comment (maybe there is some overlap with what's mentioned immediately above the present comment) might be nice to have somewhere too.
The janet website could have a section about subprocess management, but for now I improved the documentation on the subprocess API.
Perhaps a page for subprocess management between The Event Loop page and the Multithreading page could work. I'm ok to participate in creating one if that's a good path forward. I don't think I understand all of the relevant details though, so likely I'll need to get up to speed on various things. I'll make an issue at the janet-lang.org repository about creating such a page.
The pull request above explains the details you need to know.
Thanks for pointing that out. I've included mention of it in the newly created issue.
For now, I discovered a few tricks for `os/spawn`, as below.

If an `os/spawn`ed proc is not closed with `with`, the spawned process ends up devouring a CPU core. This behavior is either intended or a bug. If this behavior is intended, `(doc os/spawn)` should mention that `os/spawn`ed processes should be closed with `:close`, `with`, or `os/proc-close`.

To prevent zombie processes from blocking `ev/read` or `os/proc-wait`, I must specify a timeout through `ev/read` or `ev/with-deadline`. This is not documented anywhere. Ideally, a way to deal with zombie processes should be documented in the janet core.