Measure USS on Linux #744

Closed
wants to merge 2 commits into from

EricRahm
Contributor

This adds the measurement of USS (unique set size) to memory_info_ex on Linux. General logic was adapted from Firefox's memory measurement subsystem.
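For context, here is a minimal sketch of the quantity being proposed. It assumes the Linux `/proc/<pid>/smaps` format, in which every mapping reports `Private_Clean` and `Private_Dirty` sizes in kB; USS is taken as the sum of those two fields over all mappings. The function name `uss_from_smaps` is hypothetical, and this is not the code from the patch or from Firefox.

```python
def uss_from_smaps(smaps_text: str) -> int:
    """Return USS in bytes from the contents of /proc/<pid>/smaps.

    Sketch: sums the Private_Clean and Private_Dirty fields (reported in kB)
    of every mapping, i.e. pages mapped by this process only.
    """
    total_kb = 0
    for line in smaps_text.splitlines():
        # Field lines look like "Private_Dirty:        12 kB"; mapping header
        # lines ("00400000-00452000 r-xp ...") never produce a matching key.
        key, sep, rest = line.partition(":")
        if sep and key in ("Private_Clean", "Private_Dirty"):
            total_kb += int(rest.split()[0])
    return total_kb * 1024
```

On Linux it could be called as `uss_from_smaps(open("/proc/self/smaps").read())`.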

@giampaolo
Owner

Mmmm, I'm skeptical about this because it introduces a considerable slowdown. Also, as things stand this can already be determined using the memory_maps() method (although I guess not on OSX and possibly other platforms).
What is USS used for? Why does Firefox use it? I wonder if we can derive other useful stats from there.
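The memory_maps() route mentioned above can be sketched as a sum over map records. On Linux, psutil's `Process.memory_maps()` records carry `private_clean` and `private_dirty` fields (in bytes); the sketch below only assumes objects with those two attributes, and `MapRecord` is a hypothetical stand-in used so the example runs without psutil.

```python
from collections import namedtuple

# Hypothetical stand-in for a Linux memory-map record (only the fields used here).
MapRecord = namedtuple("MapRecord", ["path", "private_clean", "private_dirty"])


def uss_from_maps(maps) -> int:
    """Sum private_clean + private_dirty (bytes) over memory-map records."""
    return sum(m.private_clean + m.private_dirty for m in maps)
```

With psutil installed on Linux, `uss_from_maps(psutil.Process().memory_maps())` would give the same figure, which is why this is only an option where those fields exist.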

@giampaolo
Owner

...on the other hand, it seems USS is very useful, as it reflects the memory that is unique to the process (i.e. what would be freed if it exited):
http://stackoverflow.com/questions/22372960/is-this-explanation-about-vss-rss-pss-uss-accurately
http://elinux.org/Android_Memory_Usage
Also PSS looks very useful.
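A toy worked example of how the three metrics differ, under simplifying assumptions: every page is resident, and the process has 100 kB of private memory plus one 300 kB library mapping shared by 3 processes in total. RSS counts everything resident (400 kB), PSS charges each sharer an equal slice of the shared pages (100 + 300/3 = 200 kB), and USS counts only the private pages (100 kB). The kernel computes PSS page by page; the single-region model here is only an illustration.

```python
def memory_metrics(private_kb: int, shared_kb: int, sharers: int) -> dict:
    """Return RSS, PSS and USS (kB) for a toy process.

    Assumes all pages are resident and that the shared region is mapped by
    `sharers` processes (this one included), so sharers >= 1.
    """
    return {
        "rss": private_kb + shared_kb,             # everything resident
        "pss": private_kb + shared_kb / sharers,   # proportional share
        "uss": private_kb,                         # unique pages only
    }
```

`memory_metrics(100, 300, 3)` gives RSS 400, PSS 200.0 and USS 100. Summing PSS over the three sharers counts the library exactly once, which is what makes PSS useful for system-wide accounting.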

@giampaolo
Owner

Before the patch:

~/svn/psutil {master}$ python -m timeit -s "import psutil; p = psutil.Process()" "p.memory_info_ex()"
100000 loops, best of 3: 19.9 usec per loop

After the patch:

~/svn/psutil {master}$ python -m timeit -s "import psutil; p = psutil.Process()" "p.memory_info_ex()"
1000 loops, best of 3: 580 usec per loop

That's why I was worried about performance: memory_info_ex becomes roughly 29 times slower.

@EricRahm
Contributor Author

It's possible the implementation could be made more efficient; I haven't put much effort into that. Given how useful the measurement is, and given that memory_info remains available without it (and without the associated overhead), perhaps it's okay to take the performance hit.

Manually iterating over the values from memory_maps is also an option, but it is not cross-platform. As an end user I'd just like to get the USS; I don't care (nor do I want to learn) how it's calculated.

re: PSS, it could be measured through the same method and could certainly be added in a follow-up.
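Measuring PSS "through the same method" could look like the sketch below: one pass over the smaps text that sums the `Private_Clean`/`Private_Dirty` fields into USS and the `Pss` field into PSS (all in kB). The pattern matches only the exact `Pss:` key, so other Pss-prefixed fields that some kernels may report are ignored. The function name is hypothetical.

```python
import re

# "Private_Clean", "Private_Dirty" and "Pss" field lines; values are in kB.
_FIELD_RE = re.compile(r"^(Private_Clean|Private_Dirty|Pss):\s+(\d+) kB",
                       re.MULTILINE)


def uss_and_pss(smaps_text: str) -> tuple:
    """Return (uss, pss) in bytes from the contents of /proc/<pid>/smaps."""
    uss_kb = pss_kb = 0
    for name, value in _FIELD_RE.findall(smaps_text):
        if name == "Pss":
            pss_kb += int(value)
        else:
            uss_kb += int(value)
    return uss_kb * 1024, pss_kb * 1024
```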

@giampaolo
Owner

AFAIK the only way Linux exposes this info is via the /proc/{pid}/smaps file. Perhaps, instead of reading the file line by line, we can try to read it all into memory and run a single regular expression against the resulting string, then do another benchmark and see what happens. Also, another possible speedup is to open the file in binary mode (open_binary() instead of open_text()).
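A sketch of the suggestion, assuming plain `open(path, "rb")` in place of psutil's internal `open_binary()` helper: the whole file is read at once and a single bytes regex runs over it. `re.MULTILINE` anchors `^` at each line start, and `[^:\n]*` keeps a match from spilling across lines. Like the pattern used in this thread, it sums every `Private*` field. Both function names are hypothetical.

```python
import re

# Every "Private*" field at the start of a line (values are in kB). A bytes
# pattern is used because the file is read in binary mode, skipping decoding.
_PRIVATE_RE = re.compile(rb"^Private[^:\n]*:\s+(\d+)", re.MULTILINE)


def uss_from_smaps_bytes(data: bytes) -> int:
    """Return USS in bytes using one findall() over the whole smaps contents."""
    return sum(int(v) for v in _PRIVATE_RE.findall(data)) * 1024


def read_uss(path: str) -> int:
    """Read an smaps file in one go (binary mode) and compute its USS."""
    with open(path, "rb") as f:
        return uss_from_smaps_bytes(f.read())
```

On Linux, `read_uss("/proc/self/smaps")` would measure the calling process.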

@EricRahm
Contributor Author

I tried a regex, reading the whole file, and open_binary; none of them had any impact.

@giampaolo
Owner

Can you post the regex code?

@EricRahm
Contributor Author

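            # Sum every "Private*" field of /proc/<pid>/smaps (values are in kB).
            # Raw string so "\s" and "\d" reach the regex engine unescaped.
            p = re.compile(r"Private.*:\s+(\d+)")
            with open_text("%s/%s/smaps" % (self._procfs_path, self.pid),
                           buffering=BIGGER_FILE_BUFFERING) as f:
                private = 0
                for line in f:
                    m = p.match(line)  # match() anchors at the start of the line
                    if m:
                        private += int(m.group(1))
                return private * 1024  # kB -> bytes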

@giampaolo
Owner

No, I was suggesting running one (single) regex against the whole data (as in data = f.read()).

@EricRahm
Copy link
Contributor Author

That's a bit better:

python -m timeit -s "import psutil; p = psutil.Process()" "p.memory_info_ex()"
1000 loops, best of 3: 272 usec per loop

@giampaolo
Owner

Code?

@EricRahm
Contributor Author

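            # One findall() over the whole file. MULTILINE keeps "^" anchored at
            # line starts, as match() was in the line-by-line version.
            p = re.compile(r"^Private.*:\s+(\d+)", re.MULTILINE)
            with open_text("%s/%s/smaps" % (self._procfs_path, self.pid),
                           buffering=BIGGER_FILE_BUFFERING) as f:
                return sum(map(int, p.findall(f.read()))) * 1024  # kB -> bytes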
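To compare the two variants above independently of psutil, a small micro-benchmark can run both parsing strategies over the same synthetic smaps-like text. The function names and generated data are hypothetical; timings depend on the machine and are not the figures quoted in this thread.

```python
import re
import timeit

_LINE_RE = re.compile(r"Private.*:\s+(\d+)")
_WHOLE_RE = re.compile(r"^Private.*:\s+(\d+)", re.MULTILINE)


def uss_line_by_line(text: str) -> int:
    # One match() call per line, as in the first variant.
    total = 0
    for line in text.splitlines():
        m = _LINE_RE.match(line)
        if m:
            total += int(m.group(1))
    return total * 1024


def uss_single_regex(text: str) -> int:
    # One findall() over the whole text, as in the second variant.
    return sum(map(int, _WHOLE_RE.findall(text))) * 1024


def make_fake_smaps(n_mappings: int) -> str:
    """Build synthetic smaps-like text with n_mappings identical entries."""
    block = (
        "00400000-00452000 r-xp 00000000 08:02 173521 /usr/lib/libfoo.so\n"
        "Size:                328 kB\n"
        "Rss:                  16 kB\n"
        "Pss:                   8 kB\n"
        "Private_Clean:         4 kB\n"
        "Private_Dirty:         4 kB\n"
    )
    return block * n_mappings


if __name__ == "__main__":
    text = make_fake_smaps(500)
    assert uss_line_by_line(text) == uss_single_regex(text)
    for fn in (uss_line_by_line, uss_single_regex):
        secs = timeit.timeit(lambda: fn(text), number=200)
        print(f"{fn.__name__}: {secs / 200 * 1e6:.1f} usec per call")
```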

@giampaolo
Owner

I found this:
https://groups.google.com/a/chromium.org/forum/#!topic/chromium-reviews/_DMLdt3jcTA
...which leads to this patch:
https://chromiumcodereview.appspot.com/9568046/diff/2004/base/process_util_linux.cc
...which calculates USS as rss - shared, but the result I get doesn't match the one we get by summing all the "Private" stats; it's slightly lower.
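The rss - shared shortcut can be sketched as below, assuming both values come from `/proc/<pid>/statm` (fields in pages: size, resident, shared, text, lib, data, dt); this is not the Chromium code. According to proc(5), the `shared` field counts resident file-backed pages (RssFile + RssShmem), including file pages mapped by this process alone, so rss - shared approximates anonymous resident memory rather than private memory. That is one plausible reason it comes out lower than the Private_* sum. The function name is hypothetical.

```python
def uss_estimate_from_statm(statm_text: str, page_size: int) -> int:
    """Estimate USS in bytes as (resident - shared) from /proc/<pid>/statm.

    statm reports sizes in pages: size resident shared text lib data dt.
    """
    fields = statm_text.split()
    resident, shared = int(fields[1]), int(fields[2])
    return (resident - shared) * page_size
```

On Linux the page size can be obtained with `os.sysconf("SC_PAGE_SIZE")`, e.g. `uss_estimate_from_statm(open("/proc/self/statm").read(), os.sysconf("SC_PAGE_SIZE"))`.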

@EricRahm
Contributor Author

See this note from the Firefox source.

@EricRahm
Contributor Author

EricRahm commented Feb 2, 2016

@giampaolo At this point do you want to take this (and the other platforms)? I can do a Mozilla-specific fork if you don't, but I'd prefer not to.

Another option is to split the USS measurement into its own function, although it seems to make the most sense in memory_info_ex.

@giampaolo
Owner

@EricRahm I definitely want this; I'm just unsure how to expose it in terms of API, since it introduces such a huge slowdown. I was thinking that maybe we could control this via a specific parameter (something like deep_inspect=False), but I need some time to think about it (and I'm currently travelling). In the meantime, can you also provide an implementation for BSD* and Solaris? I'll think later about how best to expose this in the API.

Also, in the long term, I would also like to provide PSS.

@giampaolo
Owner

In the end I think it makes sense to integrate this into memory_info_ex. Even though this introduces a big slowdown on all platforms, what it provides is extremely useful.
Also, most of the time users will use memory_info instead, which is (a lot) faster, and rely on memory_info_ex only when they want to do serious memory profiling.

As for this PR: it can be closed, as I've just committed 7f0e093, which also provides PSS. @EricRahm, I reviewed your other two PRs for OSX and Windows, which I would like to merge soon.

@giampaolo
Owner

After a lot of thinking I came to the conclusion that it is not a good idea to calculate these metrics in memory_info_ex. The main reason (other than the big slowdown they introduce in a function which was previously very fast) is that inspecting the process address space requires higher privileges. As such, the "normal" metrics may succeed while USS may not, in which case we would have to set it to 0 (because we don't want to lose the other info). But once a value is set to 0 you have no way to tell whether it's 0 because that's the real value or because you did not have enough privileges.

In summary, to me this screams for a brand new, separate function, so that's what I did in #757. Now we have a new Process.memory_full_info() which returns all memory_info_ex stats plus USS, PSS and swap on Linux, and USS only on OSX and Windows; it raises AccessDenied as expected when privileges are insufficient. At the same time I deprecated memory_info_ex in favour of memory_info, which now returns all the "fast" memory stats.
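The design argument (fail loudly instead of reporting an ambiguous 0) can be illustrated with a small sketch. Everything here is hypothetical: `FullMemInfo`, `full_memory_info` and the local `AccessDenied` class are stand-ins, not psutil's implementation. Cheap stats are passed in, and if the privileged smaps read fails the whole call raises rather than returning `uss=0`.

```python
from collections import namedtuple

# Hypothetical result type: the "fast" fields plus the expensive USS.
FullMemInfo = namedtuple("FullMemInfo", ["rss", "vms", "uss"])


class AccessDenied(Exception):
    """Stand-in for psutil.AccessDenied in this sketch."""


def full_memory_info(basic, read_smaps):
    """Combine cheap stats with USS, failing loudly if smaps is unreadable.

    basic: (rss, vms) tuple obtained the cheap way.
    read_smaps: callable returning the smaps text; may raise PermissionError.
    """
    try:
        text = read_smaps()
    except PermissionError as exc:
        # Do NOT fall back to uss=0: callers could not tell "zero" from "denied".
        raise AccessDenied(str(exc)) from exc
    uss_kb = 0
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if sep and key in ("Private_Clean", "Private_Dirty"):
            uss_kb += int(rest.split()[0])
    return FullMemInfo(basic[0], basic[1], uss_kb * 1024)
```

With the real library, the caller-side pattern is to call `psutil.Process(pid).memory_full_info()` inside `try`/`except psutil.AccessDenied`, falling back to `memory_info()` when only the fast stats are available.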
