Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rework CPU counting #656

Merged
merged 12 commits into from
Aug 2, 2021
Merged

Rework CPU counting #656

merged 12 commits into from
Aug 2, 2021

Conversation

cgzones
Copy link
Member

@cgzones cgzones commented Jun 12, 2021

Currently htop does not support offline CPUs and hot-swapping, e.g. via
echo 0 > /sys/devices/system/cpu/cpu2/online

Split the current single cpuCount variable into activeCPUs and existingCPUs.

Supersedes: #650
Related: #580

/cc @mwahlroos @sthen

@natoscott please take a look at the PCP code

TODO:

  • Check on Darwin
  • Check on DragonFlyBSD
  • Check on FreeBSD
  • Check on NetBSD (Add NetBSD support in htop(1) #603)
  • Check on OpenBSD
  • Check on Solaris
  • Check with PCP
  • Hide offline CPUs in affinity panel

@cgzones cgzones force-pushed the cpu_count branch 4 times, most recently from a3b2ac6 to 5408fd7 Compare June 12, 2021 21:07
@BenBE BenBE added the enhancement Extension or improvement to existing feature label Jun 13, 2021
@cgzones cgzones force-pushed the cpu_count branch 12 times, most recently from 37fd370 to 5ff6d27 Compare June 13, 2021 12:41
@cgzones cgzones marked this pull request as ready for review June 13, 2021 12:42
@@ -81,12 +81,15 @@ typedef struct ProcessList_ {
memory_t usedSwap;
memory_t cachedSwap;

unsigned int cpuCount;
unsigned int activeCPUs;
unsigned int existingCPUs;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No big deal but I find the naming slightly unclear - AIUI 'active' means 'online' and 'existing' means the CPU exists but is either online or offline? An alternative might be to use 'onlineCPUs' and something like 'actualCPUs' or 'realCPUs' here?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd prefer actualCPUs over realCPUs to not clash with virtualization stuff …

@@ -158,30 +158,85 @@ static void LinuxProcessList_initNetlinkSocket(LinuxProcessList* this) {

#endif

static void LinuxProcessList_updateCPUcount(ProcessList* super, FILE* stream) {
static void LinuxProcessList_updateCPUcount(ProcessList* super) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm. This sysfs-walk looks like a relatively expensive approach - going from what is currently a handful of syscalls at most, per sample, to potentially many thousands of additional syscalls with each sample on large CPU count machines.

Does an offline CPU appear at all in /proc/stat output? If not, we should be able to do this operation using a combination of a/ one-time call to sysconf(_SC_NPROCESSORS_CONF) at startup and, b/ CPU state-management based on presence (or zero values?) in /proc/stat (while reading stats) at sample time... i.e. if cpu22 of 32 is missing in /proc/stat mark it as !online ... and update activeCPU count accordingly - would that work?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reporter of the issue that probably got this started here. I have no other authority on the matter. :)

Another alternative in case of Linux would be to parse the list of online CPU nodes at /sys/devices/system/cpu/online on update. That would be single read from sysfs rather than one read per CPU.

The sysfs files are listed in the Linux sysfs documentation. The format for the CPU-listing files seems to be e.g. "0-3,5,7", which in case of the aggregate online file would mean that CPU nodes 0 to 3, and additionally 5 and 7 are online.

That would give explicit and authoritative information on which CPUs are currently being scheduled. It might of course be simpler to just assume that the ones not appearing in /proc/stat are offline. That seems to be the case on my machine but I have no broader knowledge on how the file lists the CPUs.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The original proposal used sysconf(_SC_NPROCESSORS_CONF), which is implemented as a sysfs-walk, see https://code.woboq.org/userspace/glibc/sysdeps/unix/sysv/linux/getsysstats.c.html#236.
By doing the walk on our own we can search specific for the data we need.
Maybe at RC stage we can take a more serious look on performance.

} else {
unsigned int cpuid;
(void) sscanf(buffer, "cpu%4u %16llu %16llu %16llu %16llu %16llu %16llu %16llu %16llu %16llu %16llu", &cpuid, &usertime, &nicetime, &systemtime, &idletime, &ioWait, &irq, &softIrq, &steal, &guest, &guestnice);
assert(cpuid == i - 1);
adjCpuId = cpuid + 1;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, this is what I was looking for in that previous comment. :) So an alternate approach to finding offline CPUs might be to mark all offline at start of this function, and then mark each individually as 'online' as we parse the file and find a CPU (and update counts as we go if we need total online vs actual).

Currently htop does not support offline CPUs and hot-swapping, e.g. via
    echo 0 > /sys/devices/system/cpu/cpu2/online

Split the current single cpuCount variable into activeCPUs and
existingCPUs.

Supersedes: htop-dev#650
Related: htop-dev#580
Bogus uptime measurements can result in wrap-arounds, leading to
negative garbage values printed.
Kernel threads do not have an executable and the check can result in
garbage values as unprivileged user.
Wait until it has been decided what kind of task the entry actually is.
sched_getaffinity() and sched_setaffinity() are also available on BSDs.
Remove the Linux restraint.
Wait until it has been decided what kind of task the entry actually is.
Example hot-swapping:
    psradm -F -f 2
openbsd/OpenBSDProcessList.c:176:56: error: no member named 'ki_pid' in 'struct kinfo_proc'; did you mean 'p_pid'?
   const int mib[] = { CTL_KERN, KERN_PROC_CWD, kproc->ki_pid };
                                                       ^~~~~~
                                                       p_pid
/usr/include/sys/sysctl.h:375:10: note: 'p_pid' declared here
        int32_t p_pid;                  /* PID_T: Process identifier. */
                ^
openbsd/OpenBSDProcessList.c:458:33: error: comparison of integers of different signs: 'int' and 'unsigned int' [-Werror,-Wsign-compare]
      if (opl->cpus[i].cpuIndex == id)
          ~~~~~~~~~~~~~~~~~~~~~ ^  ~~
@BenBE BenBE added this to the 3.1.0 milestone Jul 18, 2021
@fasterit
Copy link
Member

fasterit commented Aug 2, 2021

Works on MacOSX 11.3.1 (Intel CPU)
https://discussions.apple.com/thread/250559322 on how to offline CPUs / Hypertreading on MacOS

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement Extension or improvement to existing feature
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants