Decouple /stat reading from the Process constructor #60

edigaryev · 2019-12-20T01:42:53Z

Reading procfs is a rather slow operation and sometimes when reading it in bulk you only need a specific file like /proc/<pid>/maps, which is currently constructed through Process.maps().

Is there a reason why the Process structure constructor reads and parses /proc/<pid>/stat by default? Also, recently it got coupled even more, see bfd2c86.

It would be nice to have bare Process structure and then query the related resources on an on-demand basis.
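To make the request concrete, here is a minimal sketch of what a "bare" handle could look like. `LazyProcess`, `stat_raw` and `maps_raw` are hypothetical names used only for illustration, not the crate's actual API; the point is that construction performs no I/O and each file is read only when asked for.

```rust
use std::fs;
use std::io;
use std::path::PathBuf;

/// Hypothetical lightweight handle: constructing it performs no I/O.
pub struct LazyProcess {
    root: PathBuf, // e.g. /proc/1234
}

impl LazyProcess {
    /// Build a handle from a procfs root and a pid without touching the filesystem.
    pub fn new(procfs_root: impl Into<PathBuf>, pid: i32) -> Self {
        let mut root = procfs_root.into();
        root.push(pid.to_string());
        LazyProcess { root }
    }

    /// Read the raw contents of <root>/stat, only when requested.
    pub fn stat_raw(&self) -> io::Result<String> {
        fs::read_to_string(self.root.join("stat"))
    }

    /// Read the raw contents of <root>/maps, only when requested.
    pub fn maps_raw(&self) -> io::Result<String> {
        fs::read_to_string(self.root.join("maps"))
    }
}
```

With this shape, a caller that only needs the maps file pays for exactly one read per process, and a missing or unreadable stat file does not prevent the handle from being created.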

eminence · 2019-12-21T05:59:00Z

This is a good question. No, there is no good reason why the constructor always parses the stat info, except that I generally always needed that info in one of the projects that prompted me to write this crate in the first place.

I would definitely consider decoupling this. Do you think there is a cheap way to figure out the process ID? I would be a little sad if bfd2c86 had to be reverted. Normally the constructor is passed a pid, but with the new Process::new_with_root, we might not know the pid. I guess we could try to parse the last path component of the PathBuf, but that seems unreliable.
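For reference, parsing the last path component could look roughly like the sketch below. `pid_from_root` is a hypothetical helper, not something in the crate, and it shows why the approach is fragile: it only works when the root ends in a numeric directory name.

```rust
use std::path::Path;

/// Try to recover a pid from the final component of a process root such as /proc/1234.
/// Returns None when that component is not a number, e.g. for /proc/self.
pub fn pid_from_root(root: &Path) -> Option<i32> {
    root.file_name()?.to_str()?.parse().ok()
}
```

A root like /proc/self yields None, so the caller would still need a fallback (such as reading stat) in that case.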

edigaryev · 2019-12-21T10:26:23Z

Is there a reason why the pid field is cached in the Process structure at all? A new call to stat() won't return cached data, yet a call to pid() will?

Note that I use the term "caching" for pid field. Currently it represents a value semantically similar to gettid(2)'s return value, not getpid(2)'s one. Unfortunately, it's possible for gettid(2)'s value to change during an execve(2) in a multi-threaded process, which will make this field's value stale:

https://github.com/torvalds/linux/blob/f1fd1610cbb6655883d1838ac79e53301596685d/fs/exec.c#L1145-L1150

> Do you think there is a cheap way to figure out the process ID?

Process::myself() can be implemented using std::process::id() instead of reading /proc/self. Also note that /proc/self is not the same thing as /proc/thread-self, which may be important in some multi-threaded scenarios.
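A minimal sketch of that idea, assuming the procfs mount is at /proc. `myself_root` is a hypothetical name; it only builds the path and does not read anything.

```rust
use std::path::PathBuf;

/// Build the current process's procfs root from std::process::id(),
/// without resolving the /proc/self symlink.
pub fn myself_root() -> PathBuf {
    PathBuf::from("/proc").join(std::process::id().to_string())
}
```

The pid is then known up front, so nothing has to be parsed out of stat just to learn it.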

As for Process::new_with_root(), it can be reincarnated as a constructor taking the procfs root (e.g. /proc) instead of the process/thread root (e.g. /proc/1). The new constructor can technically keep the same name, as its signature will change to include the pid argument.

One can even go further and future-proof things a bit by naming the constructor something like Process::new_with_config() and letting it take a structure whose fields are wrapped in Option (analogous to default arguments in other languages).
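Roughly, the config-based variant might look like the sketch below. `ProcessConfig` and `process_root` are hypothetical names, and the defaults (/proc and the current pid) are assumptions made for the example.

```rust
use std::path::PathBuf;

/// Hypothetical configuration; every field is optional and falls back to a default.
#[derive(Default)]
pub struct ProcessConfig {
    /// Root of the procfs mount; assumed to default to /proc.
    pub procfs_root: Option<PathBuf>,
    /// Process (or thread) id; assumed to default to the current process.
    pub pid: Option<u32>,
}

/// Resolve the per-process directory described by the config. No I/O is performed.
pub fn process_root(config: ProcessConfig) -> PathBuf {
    let root = config
        .procfs_root
        .unwrap_or_else(|| PathBuf::from("/proc"));
    let pid = config.pid.unwrap_or_else(std::process::id);
    root.join(pid.to_string())
}
```

A caller would write something like `process_root(ProcessConfig { pid: Some(1), ..Default::default() })`, and new optional fields could be added later without breaking existing call sites that use `..Default::default()`.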

eminence · 2019-12-21T18:30:20Z

Can you explain this part a little more:

> The pid field currently represents a value semantically similar to gettid(2)'s return value, not getpid(2)'s one

Is this because in Process::myself(), the root is saved as /proc/self/, instead of dereferencing the symlink? I hadn't carefully thought about this in the face of execv. I'm not sure the current implementation matches the behavior I have in my head. Would dereferencing /proc/self in the myself constructor bring us closer to getpid() semantics (i.e. the pid field would never go stale)?

But with Process::new(pid), it's my understanding (and expectation) that the pid field will always match stat().unwrap().pid.

edigaryev · 2019-12-23T20:20:49Z

Sorry, I've just realized that this caching problem I've introduced above seems to manifest itself only when the Process is created with non-main thread credentials (using gettid(2) value of a thread for which getpid(2) is different).

Process::new() and Process::new_with_root() are affected, but Process::myself() is not, because /proc/self uses getpid(2) semantics (compared to /proc/thread-self, which uses gettid(2) semantics).
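The difference between the two symlinks can be observed with a small sketch like the one below. It assumes Linux with procfs mounted at /proc and a kernel that provides /proc/thread-self; `self_links_from_thread` is a hypothetical helper and returns None if either link cannot be read.

```rust
use std::fs;
use std::path::PathBuf;
use std::thread;

/// Resolve /proc/self and /proc/thread-self from a freshly spawned (non-main) thread.
/// Returns None if either symlink is unavailable on this system.
pub fn self_links_from_thread() -> Option<(PathBuf, PathBuf)> {
    thread::spawn(|| -> Option<(PathBuf, PathBuf)> {
        let process_link = fs::read_link("/proc/self").ok()?;
        let thread_link = fs::read_link("/proc/thread-self").ok()?;
        Some((process_link, thread_link))
    })
    .join()
    .ok()?
}
```

From a non-main thread, the first link resolves to the process directory, while the second resolves to a task directory underneath it, so the two targets differ.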

eminence · 2020-03-27T02:59:51Z

Hi @edigaryev, I'm coming back to this issue after several months. I confess I am not sure where we stand on this topic, since I am not sure I really understood the issue you were trying to describe.

What's your recommendation?

I am still thinking about your suggestion to remove the pid and stat fields from Process. This would be a breaking change, but it might be the right thing to do.

edigaryev · 2020-03-27T09:40:09Z

Hi 👋!

I'm mostly unaffected by this now, as I currently need some fields from /stat for each process too when scraping procfs.

I suggest that we close this for now, until the issue resurfaces again for someone else.

eminence · 2020-03-27T15:38:32Z

OK, sounds good. Thanks for letting me know. I've created a new tracking issue for this so as not to forget about it. Thanks!

eminence mentioned this issue Mar 27, 2020

Future breaking API changes #69

Closed

eminence closed this as completed Mar 27, 2020
