avoid leaving a zombie process for --onready #462

grawity · 2023-10-04T15:30:59Z

subprocess.Popen() needs to be either used with with: or have .wait() called or be destroyed using del – otherwise the child process stays as a zombie process after exiting, until the object is GC'd.

subprocess.run() does the necessary .wait() automatically. (It's a convenience wrapper available from Python 3.6.)

Since shell=True is used and none of the subprocess-specific features are used, it might be even simpler to use os.system() (available everywhere).

Popen() needs to be used with 'with' or have .wait() called or be destroyed, otherwise there is a zombie child that sticks around until the object is GC'd.

LostRuins · 2023-10-05T02:51:30Z

Actually, I don't think this is what you'd want. Subprocess.run requires that the child process has finished executing before returning control back to the script - that is - it is blocking. If the process you started is a persistent one (e.g. Cloudflare Tunnel) then control will never be returned to the main script.

Also, the retention of the process past the life of the server can be desired. If the user used --onready to launch a browser, for example, you don't want to kill the browser process just because the server exited. Rather if the target is well behaved, it should complete and exit on it's own, effectively the same as if you were running it as a command in the shell.

grawity · 2023-10-05T04:35:51Z

Well, since this is specifically what I want – to run a short-lived command synchronously – then how about having two similar options, --onready and a new --onreadywait?

It's probably possible to solve this using signal to handle SIGCHLD but that seems risky in python.

The specific command I want to run is systemd-notify --ready, so I could also add direct Type=notify support using socket module instead, without any side effects.

LostRuins · 2023-10-05T12:41:57Z

Umm help me to understand though, why would you want the launched process to block the main script? Because if that happens, the server will not be able to handle any incoming requests until the script has finished executing, as that process is blocked. So from the perspective of the new subprocess it will be as if the loading has not yet finished.

If you want a script to run before loading is complete, you should execute it before calling the koboldcpp.py script. If you want the server to be functioning when you invoke your --onready process, then it needs to be running independently and not be blocked (i.e. the onready cannot block it).

In the case of systemd-notify --ready, that process will trigger simultaneously with the server start, run, cleanup itself and exit.

grawity · 2023-10-05T13:36:07Z

Umm help me to understand though, why would you want the launched process to block the main script

Well, no, I misspoke, it doesn't need to block necessarily. But 1) the process takes only like a millisecond to run and immediately exits, and 2) I do want it to be "cleaned up" (i.e. be able to exit without leaving a zombie process permanently) – and probably the easiest way to achieve the latter with Python subprocess is to just run it in a blocking manner. (At least as far I can see, anyway.)

Since the command takes just a few ms to run (and the HTTP socket is already listening at that point so the OS can begin queueing connections), that wouldn't really cause any problems for operation in my case.

But like I said, I can do an alternative using Python sockets instead (it's about 5 lines of code) – which honestly would be the preferred method anyway, I just wasn't sure how much Linux-specific code is okay to add.

If you want a script to run before loading is complete,

No, the purpose of the command is precisely to inform that loading is already complete (i.e. that the service is "ready").

LostRuins · 2023-10-05T15:52:09Z

I don't really get the logic behind "cleaned up" though. When you start a new process, there are only 2 possibilities.

The process runs, completes, and then exits
The process runs and stays running background

If we start the subprocess and do not wait for it (non-blocking):

in case 1, server is started at the same time as the subprocess, the subprocess runs, completes and then exits
in case 2, server is started at the same time as the subprocess, the subprocess runs and stays running in the background

If we start the subprocess and wait for it (blocking):

in case 1, the subprocess runs, completes and then exits, then server is started
in case 2, the subprocess runs and stays running background, and the server never gets to start

So if your subprocess was going to exit normally, then starting it asynchronously is fine too since it won't persist.
And if your subprocess was going to persist, then starting it synchronously just leads to the server never getting to start.

Or to put it more simply:
does systemd-notify --ready run and stay running? Then subprocess.run() will stall the script.
does systemd-notify --ready run, complete and exit? Then subprocess.popen() will not leave a zombie process.

grawity · 2023-10-05T17:19:15Z

I don't really get the logic behind "cleaned up" though. When you start a new process, there are only 2 possibilities.

The process runs, completes, and then exits

The process runs and stays running background

[...]

So if your subprocess was going to exit normally, then starting it asynchronously is fine too since it won't persist.

I'm not sure about Windows, but there's a bit more to it on other systems (be it Linux or macOS). When a process "and then exits" it doesn't quite disappear just yet – most of its resources are indeed released, but the process entry remains and its parent must "collect" or "reap" it by calling waitpid() or similar to retrieve its exit code.

Until that's done, the process entry lingers as a "zombie" in the process list, e.g. in ps axf. (Yes, that's literally what the official term is.) It doesn't consume any resources anymore once it's in that state, so with just one process it's just that it irks me in principle to see it... but if you were to run more, they'd accumulate.

(Specifically, it irks me because it's not normal to see a zombie process stick around for long – it immediately makes me think "oh no, the parent is hung/frozen/stuck". Especially if they begin to accumulate underneath init, that means a reboot is imminent.)

So in Python, if a Popen() object is created and never has .wait() called on it, the child process will keep showing up in the process list even though it has already exited (at least until Python garbage-collects the Popen object).

Usually if a program wants to run some subprocess in a non-blocking manner, it needs to react to SIGCHLD signals that it receives every time a direct child exits and call wait() to collect it. Most event loops have handling for that, and you could do it with Python's signal module... but since this is Python joined with highly multithreaded C, I don't feel confident nearly enough to begin touching that area.

LostRuins · 2023-10-06T01:33:12Z

Okay in that case I have a better solution. How about I move the onready code to a separate thread.

That way I can use subprocess.run or subprocess.popen + wait and it won't interrupt the server itself.

Edit: turns out i was already doing this in a new thread. so it wasnt an issue - i just added a wait()

in fact then I think using run directly would be fine too, as it wont block the main thread

avoid leaving a zombie process for --onready

3fd0187

Popen() needs to be used with 'with' or have .wait() called or be destroyed, otherwise there is a zombie child that sticks around until the object is GC'd.

LostRuins added the invalid This doesn't seem right label Oct 5, 2023

LostRuins approved these changes Oct 6, 2023

View reviewed changes

LostRuins merged commit 9d0dd7a into LostRuins:concedo Oct 6, 2023

LostRuins removed the invalid This doesn't seem right label Oct 7, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

avoid leaving a zombie process for --onready #462

avoid leaving a zombie process for --onready #462

grawity commented Oct 4, 2023 •

edited

Loading

LostRuins commented Oct 5, 2023

grawity commented Oct 5, 2023

LostRuins commented Oct 5, 2023

grawity commented Oct 5, 2023

LostRuins commented Oct 5, 2023 •

edited

Loading

grawity commented Oct 5, 2023 •

edited

Loading

LostRuins commented Oct 6, 2023 •

edited

Loading

avoid leaving a zombie process for --onready #462

avoid leaving a zombie process for --onready #462

Conversation

grawity commented Oct 4, 2023 • edited Loading

LostRuins commented Oct 5, 2023

grawity commented Oct 5, 2023

LostRuins commented Oct 5, 2023

grawity commented Oct 5, 2023

LostRuins commented Oct 5, 2023 • edited Loading

grawity commented Oct 5, 2023 • edited Loading

LostRuins commented Oct 6, 2023 • edited Loading

grawity commented Oct 4, 2023 •

edited

Loading

LostRuins commented Oct 5, 2023 •

edited

Loading

grawity commented Oct 5, 2023 •

edited

Loading

LostRuins commented Oct 6, 2023 •

edited

Loading