For every process yielded by psutil.process_iter(), we internally check whether the process PID has been reused, in which case we return a "fresh" Process instance. To check for PID reuse we are forced to create a new Process instance, retrieve its create_time() and compare it with that of the original process. Performance-wise, it turns out this has a huge (and exponential) cost. This is particularly relevant because process_iter() is typically used to write task-manager-like apps, where the full process list is retrieved every second. I realized this at work, while writing a process monitor agent that runs on small hardware (a cleaning robot).
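The check described above can be sketched roughly as follows. This is a minimal model, not psutil's actual code: `CachedProc`, `is_pid_reused` and `lookup_create_time` are hypothetical names, and a plain dict stands in for the OS process table.

```python
from typing import Callable, NamedTuple, Optional


class CachedProc(NamedTuple):
    """Hypothetical stand-in for a cached Process: PID plus the creation time first observed."""
    pid: int
    create_time: float


def is_pid_reused(cached: CachedProc,
                  lookup_create_time: Callable[[int], Optional[float]]) -> bool:
    """Return True if `cached.pid` now belongs to a different process.

    `lookup_create_time` is an assumed helper that returns the current
    creation time for a PID, or None if no such process exists.
    """
    current = lookup_create_time(cached.pid)
    if current is None:
        return False  # the process is gone, so the PID has not been reused
    return current != cached.create_time


# Simulated process table (pid -> creation time) standing in for the OS.
table = {100: 1000.0, 200: 1500.0}
snapshot = [CachedProc(pid, ctime) for pid, ctime in table.items()]

table[100] = 2000.0  # PID 100 exited and was handed to a new process
del table[200]       # PID 200 simply exited

reused = [p.pid for p in snapshot if is_pid_reused(p, table.get)]
print(reused)
```

With psutil itself, the lookup could presumably wrap `psutil.Process(pid).create_time()` and return `None` on `psutil.NoSuchProcess`. That extra lookup for every process on every iteration is where the cost comes from.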
By removing the PID reuse check I get a 21x speedup on a Linux OS with 481 running PIDs:

    import time
    import psutil

    started = time.monotonic()
    for x in range(1000):
        list(psutil.process_iter())
    print(f"completed in {(time.monotonic() - started):.4f} secs")

Current master: Number of pids: 481. Completed in 5.1079 secs
With PID reuse check removed: Number of pids: 481. Completed in 0.2419 secs
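As a quick sanity check on the figures above, the speedup and the approximate per-process overhead of the check can be derived from the two timings. The variable names are illustrative, and the per-process figure assumes the whole difference is attributable to the PID reuse check.

```python
# Timings reported above: 1000 iterations over 481 PIDs.
iterations = 1000
num_pids = 481
secs_with_check = 5.1079
secs_without_check = 0.2419

# Ratio of the two total run times.
speedup = secs_with_check / secs_without_check

# Extra time spent per yielded process, assuming the whole
# difference comes from the PID reuse check.
overhead_per_process = (secs_with_check - secs_without_check) / (iterations * num_pids)

print(f"speedup: {speedup:.1f}x")
print(f"overhead per process: {overhead_per_process * 1e6:.1f} microseconds")
```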
Repercussions
PID reuse is already pre-emptively checked for "write" Process APIs such as kill(), terminate(), nice() (set), etc., so in that sense it won't make any difference and we'll remain safe.
There are some Process APIs that are cached: exe(), create_time() and name() (Windows only). In this case, if PID has been reused, the Process instance will keep returning the old value, which doesn't happen with the current (slow) implementation, since process_iter() returns a brand new Process instance.
We may clear the Process cache on is_running(), but we cannot clear create_time()'s cache, as the old value is necessary to detect PID reuse. This basically means a PID-reused Process instance should just be discarded by process_iter() somehow (but how?).
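The caching problem can be illustrated with a small model. This is only a sketch under stated assumptions: `FakeProcess` is a hypothetical class (not psutil's Process), and a dict simulates the OS process table. It shows a cached exe() going stale after PID reuse, and why the cached creation time must survive a cache clear.

```python
class FakeProcess:
    """Hypothetical model of a Process object with cached attributes."""

    def __init__(self, pid, table):
        self.pid = pid
        self._table = table                # simulated OS table: pid -> (create_time, exe)
        self._create_time = table[pid][0]  # must stay cached to detect PID reuse
        self._exe = None                   # cached lazily on first call

    def exe(self):
        if self._exe is None:
            self._exe = self._table[self.pid][1]
        return self._exe

    def is_running(self):
        entry = self._table.get(self.pid)
        if entry is None or entry[0] != self._create_time:
            # Gone or reused: other caches can be cleared here,
            # but _create_time is kept, since it is the reference value.
            self._exe = None
            return False
        return True


table = {100: (1000.0, "/usr/bin/old")}
proc = FakeProcess(100, table)
first = proc.exe()                      # caches "/usr/bin/old"

table[100] = (2000.0, "/usr/bin/new")   # PID 100 is reused by another program
stale = proc.exe()                      # still the old, cached value
running = proc.is_running()             # detects the reuse and clears the exe cache
after_clear = proc.exe()                # now reports the new process's exe

print(first, stale, running, after_clear)
```

In this model, clearing the cache makes the instance report the new process's executable even though it was created for the old one, which is one reason discarding the instance looks cleaner than patching its cache.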