Speedup Process methods #799

Closed
giampaolo opened this issue Mar 29, 2016 · 20 comments
giampaolo commented Mar 29, 2016

This is something I've been thinking about for a while. The problem with the current Process class implementation is that if you want to fetch multiple pieces of process info, the underlying (C / Python) implementation may unnecessarily do the same work more than once.

For instance, on Linux we read the /proc/pid/stat file to get terminal, cpu_times and create_time, and each time we invoke one of those methods we open the file and read it in full. We extract the one piece of info we're interested in and discard the rest.
A similar thing happens on basically every OS. For instance, on BSD we use the kinfo_proc sysctl call to get basically 80% of all process info (uids, gids, create_time, ppid, io_counters, status, etc.).
Again, all this info is retrieved in a single shot (in C) but re-requested every time we call a Process method.
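As an illustration (a minimal sketch following proc(5)'s documented field layout, not psutil's actual code), a single read of /proc/<pid>/stat can serve several of those metrics at once:

import os

CLOCK_TICKS = os.sysconf("SC_CLK_TCK")  # kernel jiffies per second

def stat_info(pid):
    # A single open()/read() of /proc/<pid>/stat yields the terminal,
    # the CPU times and the start time all at once.
    with open("/proc/%s/stat" % pid, "rb") as f:
        data = f.read()
    # The process name sits in parentheses and may itself contain
    # spaces, so split on the *last* ")".
    fields = data[data.rfind(b")") + 2:].split()
    return {
        "tty_nr": int(fields[4]),                      # field 7 in proc(5)
        "utime": float(fields[11]) / CLOCK_TICKS,      # field 14
        "stime": float(fields[12]) / CLOCK_TICKS,      # field 15
        "starttime": float(fields[19]) / CLOCK_TICKS,  # field 22 (since boot)
    }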

Since we typically fetch more than one piece of info per process (e.g. think about a top-like app), it appears clear that this could (and should) be done in a single operation. A possible solution would be to provide a context manager which temporarily puts the Process instance in a state such that, internally, the requested metrics are determined in a single shot and then "cached" / "stored" somewhere:

p = psutil.Process()
with p.oneshot():
    p.terminal()  # internally, this retrieves terminal, cpu_times and create time
    p.cpu_times()  # return the cached value
    p.create_time()  # return the cached value

Note: the Process.as_dict() method would use this context manager implicitly.
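A minimal sketch of how such a context manager could work (the _cache attribute and cached decorator below are hypothetical illustrations, not psutil's actual internals):

import contextlib
import functools

def cached(fun):
    # Serve the value from the per-instance cache, but only while the
    # cache is active, i.e. inside a oneshot() block.
    @functools.wraps(fun)
    def wrapper(self):
        if self._cache is None:
            return fun(self)
        try:
            return self._cache[fun.__name__]
        except KeyError:
            ret = self._cache[fun.__name__] = fun(self)
            return ret
    return wrapper

class Process:
    def __init__(self, pid):
        self.pid = pid
        self._cache = None  # caching is off outside of oneshot()

    @contextlib.contextmanager
    def oneshot(self):
        self._cache = {}  # turn caching on
        try:
            yield self
        finally:
            self._cache = None  # discard cached values on exit

    @cached
    def _parse_stat(self):
        # the expensive part: one open()/read() serving terminal(),
        # cpu_times() and create_time()
        with open("/proc/%s/stat" % self.pid) as f:
            return f.read().rpartition(")")[2].split()

    def cpu_times(self):
        fields = self._parse_stat()
        return (int(fields[11]), int(fields[12]))  # utime, stime (jiffies)

Inside the with block the first method call does the expensive read and every later call hits the cache; outside the block each call re-reads the file, preserving the current semantics.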

=== EDITS AFTER COMMENTS BELOW ===

Branch

master...oneshot#files_bucket

Benchmark scripts

Linux (+2.56x speedup)

$ python scripts/internal/bench_oneshot.py 
11 methods involved on platform 'linux2' (1000 iterations):
    cpu_percent
    cpu_times
    create_time
    gids
    name
    num_ctx_switches
    num_threads
    ppid
    status
    terminal
    uids
normal:  0.233 secs
oneshot: 0.091 secs
speedup: +2.56x

Windows (+1.9x or +6.5x speedup)

current user's process:

C:\Python27\python.exe scripts\internal\bench_oneshot.py
13 methods involved on platform 'win32' (1000 iterations, psutil 4.5.0)
    cpu_affinity
    cpu_percent
    cpu_times
    io_counters
    ionice
    memory_info
    memory_percent
    nice
    num_ctx_switches
    num_handles
    num_threads
    parent
    ppid
normal:  1.243 secs
oneshot: 0.655 secs
speedup: +1.90x

other user's process:

C:\Python27\python.exe scripts\internal\bench_oneshot.py
11 methods involved on platform 'win32' (1000 iterations, psutil 4.4.2):
    cpu_percent
    cpu_times
    create_time
    io_counters
    memory_info
    memory_percent
    num_ctx_switches
    num_handles
    num_threads
    parent
    ppid
normal:  5.027 secs
oneshot: 0.765 secs
speedup: +6.57x

FreeBSD (+2.18x speedup)

$ python scripts/internal/bench_oneshot.py 
13 methods involved on platform 'freebsd10' (1000 iterations):
    cpu_percent
    cpu_times
    create_time
    gids
    io_counters
    memory_full_info
    memory_info
    memory_percent
    num_ctx_switches
    ppid
    status
    terminal
    uids
normal:  0.121 secs
oneshot: 0.056 secs
speedup: +2.18x

OSX (+1.92x speedup)

$ python scripts/internal/bench_oneshot.py
14 methods involved on platform 'darwin' (1000 iterations):
    cpu_percent
    cpu_times
    create_time
    gids
    memory_info
    memory_percent
    name
    num_ctx_switches
    num_threads
    parent
    ppid
    terminal
    uids
    username
normal:  0.200 secs
oneshot: 0.104 secs
speedup: +1.92x

SunOS (+1.37x speedup)

$ python scripts/internal/bench_oneshot.py
12 methods involved on platform 'sunos5' (1000 iterations):
    cmdline
    create_time
    gids
    memory_full_info
    memory_info
    memory_percent
    name
    num_threads
    ppid
    status
    terminal
    uids
normal:  0.087 secs
oneshot: 0.064 secs
speedup: +1.37x
nicolargo (Contributor) commented:
+1 for this enhancement request. It will be awesome for the Glances project.

giampaolo commented Apr 30, 2016

I started working on this in a separate branch (master...oneshot#files_bucket) and completed the Linux implementation. The code below runs about twice as fast:

import psutil
import time

attrs = ['ppid', 'uids', 'gids', 'num_ctx_switches', 'num_threads', 'status',
         'name', 'cpu_times', 'terminal']
p = psutil.Process()
t = time.time()
for x in range(1000):
    p.as_dict(attrs)
print(time.time() - t)

nicolargo (Contributor) commented:
Any heads-up on this enhancement?

giampaolo (Owner) commented:
I completed the Linux implementation but I still have to benchmark it properly. All other platform implementations are still missing. It's gonna take a while.

giampaolo (Owner) commented:
Linux benchmark: with this I get a 2x speedup (twice as fast) when involving all the "one shot" methods, i.e. emulating the best possible scenario:

import psutil
import time


def doit(p):
    # these metrics are parsed from /proc/<pid>/stat
    p.name()
    p.terminal()
    p.cpu_times()
    p.create_time()
    p.status()
    p.ppid()
    # ...and these from /proc/<pid>/status
    p.num_ctx_switches()
    p.num_threads()
    p.uids()
    p.gids()


p = psutil.Process()

t = time.time()
for x in range(1000):
    doit(p)
print("normal:  %f" % (time.time() - t))

t = time.time()
for x in range(1000):
    with p.oneshot():
        doit(p)
print("oneshot: %f" % (time.time() - t))

Output:

normal:  0.189042
oneshot: 0.097632

giampaolo commented Aug 2, 2016

On FreeBSD, the impact of fetching multiple (14) pieces of info when only 1 is needed is negligible (0.46 secs vs. 0.42), so even when NOT using oneshot(), retrieving a single piece of process info does not slow things down.

giampaolo (Owner) commented:
Linux speedup went from 1.9x to 2.6x after f851be9.

giampaolo commented Aug 3, 2016

The BSD implementation is complete. On FreeBSD I get a +2.18x speedup.
I also added a benchmark script here: https://github.com/giampaolo/psutil/blob/oneshot/scripts/internal/bench_oneshot.py.

nicolargo commented Aug 4, 2016 via email

giampaolo commented Aug 4, 2016

Yes, this is intended for all OSes, even though Windows is probably gonna be the most difficult platform because it has fewer C APIs which can be used to retrieve multiple pieces of info in one shot.
BSD is the exact opposite: in one shot you get a whole blob of stuff:

#ifdef __FreeBSD__

The only Windows C call I can think of which is used basically all the time is OpenProcess.
We use a wrapper around it:

...which is extensively used in the main C extension module:

~/svn/psutil {master}$ grep psutil_handle psutil/_psutil_windows.c | wc -l
16

What we can do is get the handle once, store it in Python (as an int), then pass it back to the C extension as an argument, and keep doing this for as long as we're in the oneshot context (then on __exit__ we CloseHandle() it). The methods involved should be (at least): cpu_times(), create_time(), memory_info(), nice(), io_counters(), cpu_affinity(), num_handles() and memory_maps(). So yes, also on Windows there's a lot of room for speeding things up quite a bit.
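As a rough Python-level sketch of this idea using ctypes (illustrative only; the real work would happen inside the C extension, and the access mask below is an assumption):

import contextlib
import ctypes
from ctypes import wintypes

PROCESS_QUERY_INFORMATION = 0x0400  # assumed sufficient for query-only APIs

kernel32 = ctypes.WinDLL("kernel32", use_last_error=True)
kernel32.OpenProcess.restype = wintypes.HANDLE
kernel32.CloseHandle.argtypes = [wintypes.HANDLE]

@contextlib.contextmanager
def process_handle(pid):
    # Open the handle once; every query inside the block would reuse it,
    # mirroring what oneshot() would do on __enter__ / __exit__.
    handle = kernel32.OpenProcess(PROCESS_QUERY_INFORMATION, False, pid)
    if not handle:
        raise ctypes.WinError(ctypes.get_last_error())
    try:
        yield handle  # pass this int to each C routine as an argument
    finally:
        kernel32.CloseHandle(handle)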

giampaolo (Owner) commented:
Solaris implementation landed in 630b40d: +1.37x speedup.

giampaolo added a commit that referenced this issue on Aug 6, 2016: "…Handle in order to keep the handle reference at Python level and allow caching"
giampaolo (Owner) commented:
It turns out storing the OpenProcess handle in Python is slower than retrieving it in C every time. I experimented with this here:
oneshot...oneshot-win#files_bucket
...and I get a 1.5x slowdown. As such, Windows is apparently the only platform which cannot take advantage of this.

giampaolo commented Oct 7, 2016

OSX implemented as of 7b2a6b3 and cf21849. The speedup is 1.8x! Unless I'm missing something else, we should be done with all platforms.

nicolargo (Contributor) commented:
Good news @giampaolo!

giampaolo (Owner) commented:
OSX: going from a 1.8x to a 1.9x speedup with 1e8cef9.

giampaolo added a commit that referenced this issue Oct 28, 2016
giampaolo commented Oct 28, 2016

It turns out the apparent slowdown occurring on Windows as per my previous message (#799 (comment)) was due to the benchmark script not being stable enough, so we're good also on Windows.
The https://github.com/giampaolo/psutil/blob/7f51f0074b6d727a01fea0290ed0988dd51ad288/scripts/internal/bench_oneshot_2.py script, which relies on the perf module, shows a +1.2x speedup.
With c10a7aa and 3efb6bf I went from +1.2x to +1.8x.

giampaolo (Owner) commented:
The interesting thing about Windows is that, because some Process methods use a dual implementation (see #304), we can get a way bigger speedup for PIDs owned by other users, for which the first "fast" implementation raises AccessDenied.
On a high-privileged PID, by using oneshot() I am now getting an awesome +6.3x speedup!
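Schematically, the dual implementation looks like this (a hedged sketch with made-up helpers, not psutil's actual code):

class AccessDenied(Exception):
    pass

def fast_impl(pid):
    # hypothetical cheap per-process API; assume it fails for PIDs
    # owned by other users
    raise AccessDenied

def slow_impl(pid):
    # hypothetical expensive fallback, e.g. extracting the value from a
    # system-wide snapshot; works for any PID but costs a lot more
    return {"read_count": 0, "write_count": 0}  # dummy value

def io_counters(pid):
    try:
        return fast_impl(pid)
    except AccessDenied:
        return slow_impl(pid)

With oneshot() the expensive fallback can be paid once and shared across several methods, which is why the win is much larger for other users' processes.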

giampaolo commented Nov 5, 2016

OK, this is now merged into master as of de41bcc.

nicolargo (Contributor) commented:
Great job @giampaolo!

Many thanks.

suzaku commented Nov 7, 2016

Great job!

nlevitt added a commit to nlevitt/psutil that referenced this issue Apr 9, 2019