-
-
Notifications
You must be signed in to change notification settings - Fork 440
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Rework CPU counting #656
Rework CPU counting #656
Conversation
a3b2ac6
to
5408fd7
Compare
37fd370
to
5ff6d27
Compare
@@ -81,12 +81,15 @@ typedef struct ProcessList_ { | |||
memory_t usedSwap; | |||
memory_t cachedSwap; | |||
|
|||
unsigned int cpuCount; | |||
unsigned int activeCPUs; | |||
unsigned int existingCPUs; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No big deal but I find the naming slightly unclear - AIUI 'active' means 'online' and 'existing' means the CPU exists but is either online or offline? An alternative might be to use 'onlineCPUs' and something like 'actualCPUs' or 'realCPUs' here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd prefer actualCPUs
over realCPUs
to not clash with virtualization stuff …
@@ -158,30 +158,85 @@ static void LinuxProcessList_initNetlinkSocket(LinuxProcessList* this) { | |||
|
|||
#endif | |||
|
|||
static void LinuxProcessList_updateCPUcount(ProcessList* super, FILE* stream) { | |||
static void LinuxProcessList_updateCPUcount(ProcessList* super) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm. This sysfs-walk looks like a relatively expensive approach - going from what is currently a handful of syscalls at most, per sample, to potentially many thousands of additional syscalls with each sample on large CPU count machines.
Does an offline CPU appear at all in /proc/stat output? If not, we should be able to do this operation using a combination of a/ one-time call to sysconf(_SC_NPROCESSORS_CONF) at startup and, b/ CPU state-management based on presence (or zero values?) in /proc/stat (while reading stats) at sample time... i.e. if cpu22 of 32 is missing in /proc/stat mark it as !online ... and update activeCPU count accordingly - would that work?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reporter of the issue that probably got this started here. I have no other authority on the matter. :)
Another alternative in case of Linux would be to parse the list of online CPU nodes at /sys/devices/system/cpu/online
on update. That would be single read from sysfs rather than one read per CPU.
The sysfs files are listed in the Linux sysfs documentation. The format for the CPU-listing files seems to be e.g. "0-3,5,7", which in case of the aggregate online
file would mean that CPU nodes 0 to 3, and additionally 5 and 7 are online.
That would give explicit and authoritative information on which CPUs are currently being scheduled. It might of course be simpler to just assume that the ones not appearing in /proc/stat
are offline. That seems to be the case on my machine but I have no broader knowledge on how the file lists the CPUs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The original proposal used sysconf(_SC_NPROCESSORS_CONF)
, which is implemented as a sysfs-walk, see https://code.woboq.org/userspace/glibc/sysdeps/unix/sysv/linux/getsysstats.c.html#236.
By doing the walk on our own we can search specific for the data we need.
Maybe at RC stage we can take a more serious look on performance.
} else { | ||
unsigned int cpuid; | ||
(void) sscanf(buffer, "cpu%4u %16llu %16llu %16llu %16llu %16llu %16llu %16llu %16llu %16llu %16llu", &cpuid, &usertime, &nicetime, &systemtime, &idletime, &ioWait, &irq, &softIrq, &steal, &guest, &guestnice); | ||
assert(cpuid == i - 1); | ||
adjCpuId = cpuid + 1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, this is what I was looking for in that previous comment. :) So an alternate approach to finding offline CPUs might be to mark all offline at start of this function, and then mark each individually as 'online' as we parse the file and find a CPU (and update counts as we go if we need total online vs actual).
Currently htop does not support offline CPUs and hot-swapping, e.g. via echo 0 > /sys/devices/system/cpu/cpu2/online Split the current single cpuCount variable into activeCPUs and existingCPUs. Supersedes: htop-dev#650 Related: htop-dev#580
Bogus uptime measurements can result in wrap-arounds, leading to negative garbage values printed.
Kernel threads do not have an executable and the check can result in garbage values as unprivileged user.
Wait until it has been decided what kind of task the entry actually is.
sched_getaffinity() and sched_setaffinity() are also available on BSDs. Remove the Linux restraint.
Wait until it has been decided what kind of task the entry actually is.
Example hot-swapping: psradm -F -f 2
openbsd/OpenBSDProcessList.c:176:56: error: no member named 'ki_pid' in 'struct kinfo_proc'; did you mean 'p_pid'? const int mib[] = { CTL_KERN, KERN_PROC_CWD, kproc->ki_pid }; ^~~~~~ p_pid /usr/include/sys/sysctl.h:375:10: note: 'p_pid' declared here int32_t p_pid; /* PID_T: Process identifier. */ ^ openbsd/OpenBSDProcessList.c:458:33: error: comparison of integers of different signs: 'int' and 'unsigned int' [-Werror,-Wsign-compare] if (opl->cpus[i].cpuIndex == id) ~~~~~~~~~~~~~~~~~~~~~ ^ ~~
Works on MacOSX 11.3.1 (Intel CPU) |
Currently htop does not support offline CPUs and hot-swapping, e.g. via
echo 0 > /sys/devices/system/cpu/cpu2/online
Split the current single
cpuCount
variable intoactiveCPUs
andexistingCPUs
.Supersedes: #650
Related: #580
/cc @mwahlroos @sthen
@natoscott please take a look at the PCP code
TODO: