Is htop safe from PID reuse? #1441
Comments
That's what we got the "Support Request" tag for … ;-)
That question is not easy to answer in general, but as there are currently no mechanisms in place to track short-lived processes or process terminations, there is a real chance of accidentally sending a signal to a new process. The specifics would need some in-depth investigation, but the pre-conditions are basically:
As those two conditions are not "lockable" from userspace, ALL process monitors are likely to have this issue in one form or another. On some OSes (for example Windows) there are workarounds that could mitigate this (by keeping a process handle, thus forcing the PID to remain "dormant"). On Linux and *BSD this is AFAIK not an option.
To reduce the resource usage most columns that rarely change are somewhat cached or refreshed only at a slower rate. The two most prominent columns this applies to are the process' command line and the shared memory usage. For the command line (plus: thread names, executable name, and current working directory) there's a setting to force refresh in every update cycle. For the shared memory usage things are updated roughly every 2-3 update cycles to reduce load. This was done as processes rarely change their set of loaded libraries drastically over a short time.
A chance for this to happen exists IFF the refresh setting mentioned above is turned off (default AFAIR), the old process exits, AND a new process re-using that PID starts ALL between two refresh cycles of htop (by default 1.5 seconds, minimum 0.1 seconds, maximum infinite). The chance for this to happen is negligible on mostly idle systems and fairly small on busy ones unless there's really high load with many processes starting/stopping each second.
You're welcome.
To be entirely honest and for full disclosure: I'm the author of an htop-like Python library called psutil, which is why I showed up here. I was hoping htop had solved this issue, which I don't know how to solve (see giampaolo/psutil#2396). I think I can return the favor though.
I know. :)
Actually there is a solution to this.
I believe you can implement either solution 1 or 2 in htop as well. Solution 1 works on all platforms including Windows, and it has been battle tested in psutil for years, so I would recommend this one. |
> Actually there is a solution to this.
Hmm, I'm not convinced. In option #1 there remains a race condition between the second PID creation time check and the time when you send the signal. I guess you could check creation time again after sending the signal, and then say "oops, sorry, I may have done the wrong thing, not sure" to the user if the PID changed in between ... but the problem isn't solved AFAICT. And since it's common to be signaling with SIGTERM / SIGKILL, any subsequent check is going to be very unreliable anyway.
Option #2 sounds more feasible, but I still wonder if this is primarily a theoretical issue. The kernel's PID selection strategies make rapid reuse unlikely, so I think this may be a "solution looking for a problem" in system tools like ours. Has anyone ever reported this issue occurring? I can definitely see a rationale for that syscall in other situations, but not so much for system tools that are sampling PIDs frequently.
Theoretically you are correct, there is a (very small) time window during which the PID could be reused, see giampaolo/psutil#2400. I would speculate that the kernel is smart enough not to reuse the same PID that quickly though.
There is a downside to using this option that I didn't mention because I realized it just now. To use solution 2 you have to pre-emptively save
> I would speculate that the kernel is smart enough not to reuse the same PID that quickly though.
100% agreed. And given we're sampling every 1-2 seconds by default, this whole issue is likely a non-problem in practice.
> There is a downside to using this option
+1. It could be solved though. We typically only signal one PID at a time (requires UI selection/interaction), so if we went down this path (I'm definitely not advocating for it!) it could be done in a way that only selected processes have open FDs associated with them.
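A minimal sketch of the "hold an FD only for the selected process" idea, assuming the file-descriptor approach discussed here maps to Linux pidfds (Linux 5.3+, Python 3.9+). The helper names `open_handle` and `signal_via_handle` are hypothetical, and this is not how htop or psutil does it. Note that the race between the last process-list refresh and opening the handle is still there.

```python
import os
import signal


def open_handle(pid: int) -> int:
    """Open a pidfd for whatever process holds `pid` right now.

    From this point on the descriptor refers to that one process,
    even if the numeric PID is later recycled.
    """
    return os.pidfd_open(pid)


def signal_via_handle(pidfd: int, sig: int = signal.SIGTERM) -> bool:
    """Send `sig` through the pidfd; return False if the process is gone."""
    try:
        signal.pidfd_send_signal(pidfd, sig)
        return True
    except ProcessLookupError:
        # The original process has exited and been reaped; nothing was signalled.
        return False
```

The caller is responsible for closing the descriptor with `os.close()` once the process is deselected, which keeps the number of open FDs proportional to the selection rather than to the process list.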
AFAIK, there is no general solution. In Unix-like systems a PID is only reserved for the parent after a process has died, and that's what "zombie processes" refers to. Once the PIDs are freed (i.e. the zombies are reaped by the parent process) another process can be allocated the same PID, even though the OS tries to avoid that whenever possible. Since the PID space is limited, your only chance of minimizing PID collisions is raising the pid_max limit. And by the way, if the OS reserved PIDs for process managers like htop, we could end up with a lot of PIDs reserved and becoming "zombies" whenever a process manager is not responsive.
As mentioned in the comments above, there is a TOCTOU race. Unless your OS has a
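For reference, a small sketch of how the current limit can be inspected on Linux. It assumes the limit is exposed at `/proc/sys/kernel/pid_max`; the function name and the `path` parameter are hypothetical and exist only to make the example testable.

```python
from pathlib import Path


def read_pid_max(path: str = "/proc/sys/kernel/pid_max") -> int:
    """Return the upper bound of the PID space as reported by the kernel."""
    return int(Path(path).read_text().strip())
```

Raising the limit itself needs root privileges (for example through the `kernel.pid_max` sysctl). A larger PID space only makes reuse rarer; it does not remove the race.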
This " |
I think we're all in agreement there's nothing we should change in htop here (if I got that wrong, please reopen & let's discuss further).
While this support question can be closed, I have a feature proposal in case you guys are interested: #1442
FWIW, I think htop should use creation time as I previously described in #1441 (comment) (solution 1). This is racy like any other user-space solution based on PID alone, but it gives a high level of reliability because the race condition is extremely unlikely to occur in practice. Having zero checks in place that try to prevent killing a reused PID may lead to data loss, DoS or have security implications. |
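A minimal sketch of the creation-time check, assuming Linux and reading the start time from field 22 of `/proc/<pid>/stat`. The names `ProcessKey`, `read_key` and `kill_if_same` are hypothetical; this is neither psutil's nor htop's actual code. As already pointed out in the thread, a small window remains between the check and the signal.

```python
import os
import signal
from typing import NamedTuple, Optional


class ProcessKey(NamedTuple):
    pid: int
    start_ticks: int  # process start time in clock ticks since boot


def parse_start_ticks(stat_line: str) -> int:
    """Extract field 22 (starttime) from the contents of /proc/<pid>/stat."""
    # Field 2 (comm) is wrapped in parentheses and may itself contain
    # spaces or ')', so split on the LAST ')' before tokenizing.
    rest = stat_line.rsplit(")", 1)[1].split()
    # rest[0] is field 3 (state), so field 22 sits at index 22 - 3 = 19.
    return int(rest[19])


def read_key(pid: int) -> Optional[ProcessKey]:
    """Return the (pid, start time) pair for a live process, else None."""
    try:
        with open(f"/proc/{pid}/stat") as f:
            return ProcessKey(pid, parse_start_ticks(f.read()))
    except (FileNotFoundError, ProcessLookupError):
        return None


def kill_if_same(key: ProcessKey, sig: int = signal.SIGTERM) -> bool:
    """Signal the process only if the PID still has the recorded start time."""
    if read_key(key.pid) != key:
        return False  # PID is gone or now belongs to a different process
    try:
        # The PID can still be recycled between the check above and this call.
        os.kill(key.pid, sig)
    except ProcessLookupError:
        return False
    return True
```

The key would be recorded when the process first shows up in the list and compared again at the moment the user asks for a signal to be sent.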
@giampaolo No. That's a false sense of security as you didn't eliminate the race totally. You should raise the pid_max limit if you are truly afraid of this. Another mitigation for the issue is to limit the user's privilege when killing a process, so that the user won't accidentally kill a process owned by someone else. By the way, #1442 would also be a partial solution. There is still a race between a process entry being last updated and the process file descriptor being opened. But the usability could be a little better as the user can review again what processes they are killing. |
Discussion is getting split. :) The problem with #1442 is that it checks for process identity very late in the lifetime of the process. If htop has been open for 10 minutes and PID reuse happened 5 minutes ago, you will not know. Detecting PID reuse is closely related to identifying a process uniquely over time. You can't use just the PID, so you have to add something else to the mix. That can be PID creation time or The downside of
htop periodically updates the process list. Unless you pause the update, you would notice it already.
Keep in mind that the "process ID + creation time" combination does not make the process unique. There is a precision issue in time measurements, and a process can spawn and die very quickly between time measurements, so an identifier like this won't be as unique as you think. (Even the v1 UUID format needs to avoid the issue where two ID generation requests happen very quickly.) There is no "truly unique" identifier for processes as far as I can think of. This is a non-issue. It's more of a limitation due to OS design, and a process manager like htop can't do anything about it.
No you won't. htop simply sees that PID X existed before and after the update. It doesn't check whether that PID belongs to a different process now, so from htop's perspective nothing changed. It will even show the old process CMDLINE, since it's cached (which is fine, I'm merely talking about making
Agreed. It's a compromise. Linux provides the creation time with 2 decimal digits of precision (e.g. 432904.78). That means the creation time strategy guarantees that you won't kill the wrong process unless the OS reused the same PID within the same hundredth of a second, i.e. the two start times differ only from the third decimal digit onward (e.g. 432904.785), which is way better than having no check at all IMO.
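To make the precision argument concrete, here is a small arithmetic sketch. It assumes the common case of 100 clock ticks per second (the value of `sysconf(_SC_CLK_TCK)` on typical Linux systems); the function name is hypothetical.

```python
def ms_to_ticks(ms_since_boot: int, clk_tck: int = 100) -> int:
    """Truncate a millisecond timestamp to whole clock ticks."""
    return ms_since_boot * clk_tck // 1000


# Two starts 8 ms apart fall into the same tick and cannot be told apart.
first = ms_to_ticks(432_904_781)
second = ms_to_ticks(432_904_789)
same_tick = first == second

# A start 10 ms later lands in the next tick and is distinguishable.
later = ms_to_ticks(432_904_791)
```

So a false match requires the old process to exit and the same PID to be handed out again within a single tick, roughly 10 ms under this assumption.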
There is some limited way with kprobes/event triggers. We will eventually need to implement these for tracking short-lived processes, but for the feature suggested here they are pure overkill …
As said, you can do this with kernel tracing, but this is overkill if it were just for this one (rare) situation …
See above … |
Would you report this as a bug? I think there should at least be a sanity check to ensure the process CMDLINE or username or whatever is the same as before. The process CMDLINE can change after an (Perhaps we should also detect changes of PGRP (process group ID) and sessions. These are unlikely to collide even when a PID is reused within a short time.) I mentioned user privilege being one possible way to mitigate the issue.
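A rough sketch of what such a sanity check could look like: compare a snapshot taken when the process was first listed with one taken right before signalling. The `Snapshot` type, its fields and `changed_fields` are hypothetical and only illustrate the comparison; how the values are collected is left out.

```python
from dataclasses import dataclass, fields
from typing import List


@dataclass(frozen=True)
class Snapshot:
    pid: int
    uid: int
    pgrp: int      # process group ID
    session: int   # session ID
    cmdline: str


def changed_fields(before: Snapshot, after: Snapshot) -> List[str]:
    """Return the names of all attributes that differ between two snapshots."""
    return [
        f.name
        for f in fields(before)
        if getattr(before, f.name) != getattr(after, f.name)
    ]
```

A caller could refuse to signal, or ask for confirmation, when the owner, process group or session changed, while treating a changed command line as a weaker hint since a command line can change legitimately.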
Hi. This is a question, so sorry in advance if this is not appropriate for the bug tracker.
Is htop safe from PID reuse? E.g. if a PID is reused and I SIGTERM it via htop, is there a risk that I terminate the wrong (new) process?
Furthermore, I'm not sure if some htop columns are cached (COMMAND column?). If they are: is there a risk that htop keeps showing the wrong column for the reused PID / process?
Thanks