Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

avoid leaving a zombie process for --onready #462

Merged
merged 1 commit into from
Oct 6, 2023

Conversation

grawity
Copy link

@grawity grawity commented Oct 4, 2023

subprocess.Popen() needs to be either used with with: or have .wait() called or be destroyed using del – otherwise the child process stays as a zombie process after exiting, until the object is GC'd.

subprocess.run() does the necessary .wait() automatically. (It's a convenience wrapper available from Python 3.6.)

Since shell=True is used and none of the subprocess-specific features are used, it might be even simpler to use os.system() (available everywhere).

Popen() needs to be used with 'with' or have .wait() called or be
destroyed, otherwise there is a zombie child that sticks around until
the object is GC'd.
@LostRuins
Copy link
Owner

Actually, I don't think this is what you'd want. Subprocess.run requires that the child process has finished executing before returning control back to the script - that is - it is blocking. If the process you started is a persistent one (e.g. Cloudflare Tunnel) then control will never be returned to the main script.

Also, the retention of the process past the life of the server can be desired. If the user used --onready to launch a browser, for example, you don't want to kill the browser process just because the server exited. Rather if the target is well behaved, it should complete and exit on it's own, effectively the same as if you were running it as a command in the shell.

@LostRuins LostRuins added the invalid This doesn't seem right label Oct 5, 2023
@grawity
Copy link
Author

grawity commented Oct 5, 2023

Well, since this is specifically what I want – to run a short-lived command synchronously – then how about having two similar options, --onready and a new --onreadywait?

It's probably possible to solve this using signal to handle SIGCHLD but that seems risky in python.

The specific command I want to run is systemd-notify --ready, so I could also add direct Type=notify support using socket module instead, without any side effects.

@LostRuins
Copy link
Owner

Umm help me to understand though, why would you want the launched process to block the main script? Because if that happens, the server will not be able to handle any incoming requests until the script has finished executing, as that process is blocked. So from the perspective of the new subprocess it will be as if the loading has not yet finished.

If you want a script to run before loading is complete, you should execute it before calling the koboldcpp.py script. If you want the server to be functioning when you invoke your --onready process, then it needs to be running independently and not be blocked (i.e. the onready cannot block it).

In the case of systemd-notify --ready, that process will trigger simultaneously with the server start, run, cleanup itself and exit.

@grawity
Copy link
Author

grawity commented Oct 5, 2023

Umm help me to understand though, why would you want the launched process to block the main script

Well, no, I misspoke, it doesn't need to block necessarily. But 1) the process takes only like a millisecond to run and immediately exits, and 2) I do want it to be "cleaned up" (i.e. be able to exit without leaving a zombie process permanently) – and probably the easiest way to achieve the latter with Python subprocess is to just run it in a blocking manner. (At least as far I can see, anyway.)

Since the command takes just a few ms to run (and the HTTP socket is already listening at that point so the OS can begin queueing connections), that wouldn't really cause any problems for operation in my case.

But like I said, I can do an alternative using Python sockets instead (it's about 5 lines of code) – which honestly would be the preferred method anyway, I just wasn't sure how much Linux-specific code is okay to add.

If you want a script to run before loading is complete,

No, the purpose of the command is precisely to inform that loading is already complete (i.e. that the service is "ready").

@LostRuins
Copy link
Owner

LostRuins commented Oct 5, 2023

I don't really get the logic behind "cleaned up" though. When you start a new process, there are only 2 possibilities.

  1. The process runs, completes, and then exits
  2. The process runs and stays running background

If we start the subprocess and do not wait for it (non-blocking):

  • in case 1, server is started at the same time as the subprocess, the subprocess runs, completes and then exits
  • in case 2, server is started at the same time as the subprocess, the subprocess runs and stays running in the background

If we start the subprocess and wait for it (blocking):

  • in case 1, the subprocess runs, completes and then exits, then server is started
  • in case 2, the subprocess runs and stays running background, and the server never gets to start

So if your subprocess was going to exit normally, then starting it asynchronously is fine too since it won't persist.
And if your subprocess was going to persist, then starting it synchronously just leads to the server never getting to start.

Or to put it more simply:
does systemd-notify --ready run and stay running? Then subprocess.run() will stall the script.
does systemd-notify --ready run, complete and exit? Then subprocess.popen() will not leave a zombie process.

@grawity
Copy link
Author

grawity commented Oct 5, 2023

I don't really get the logic behind "cleaned up" though. When you start a new process, there are only 2 possibilities.

  1. The process runs, completes, and then exits
  2. The process runs and stays running background

[...]

So if your subprocess was going to exit normally, then starting it asynchronously is fine too since it won't persist.

I'm not sure about Windows, but there's a bit more to it on other systems (be it Linux or macOS). When a process "and then exits" it doesn't quite disappear just yet – most of its resources are indeed released, but the process entry remains and its parent must "collect" or "reap" it by calling waitpid() or similar to retrieve its exit code.

Until that's done, the process entry lingers as a "zombie" in the process list, e.g. in ps axf. (Yes, that's literally what the official term is.) It doesn't consume any resources anymore once it's in that state, so with just one process it's just that it irks me in principle to see it... but if you were to run more, they'd accumulate.

(Specifically, it irks me because it's not normal to see a zombie process stick around for long – it immediately makes me think "oh no, the parent is hung/frozen/stuck". Especially if they begin to accumulate underneath init, that means a reboot is imminent.)

So in Python, if a Popen() object is created and never has .wait() called on it, the child process will keep showing up in the process list even though it has already exited (at least until Python garbage-collects the Popen object).

Usually if a program wants to run some subprocess in a non-blocking manner, it needs to react to SIGCHLD signals that it receives every time a direct child exits and call wait() to collect it. Most event loops have handling for that, and you could do it with Python's signal module... but since this is Python joined with highly multithreaded C, I don't feel confident nearly enough to begin touching that area.

@LostRuins
Copy link
Owner

LostRuins commented Oct 6, 2023

Okay in that case I have a better solution. How about I move the onready code to a separate thread.

That way I can use subprocess.run or subprocess.popen + wait and it won't interrupt the server itself.

Edit: turns out i was already doing this in a new thread. so it wasnt an issue - i just added a wait()

in fact then I think using run directly would be fine too, as it wont block the main thread

@LostRuins LostRuins merged commit 9d0dd7a into LostRuins:concedo Oct 6, 2023
@LostRuins LostRuins removed the invalid This doesn't seem right label Oct 7, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants