system monitoring horizon charts for terminal

cubestat is a command-line utility to monitor system metrics in horizon chart format. It was originally created for Apple M1/M2 devices, but supports Linux with NVIDIA GPU as well, including Google Colab environment. Numerous tools exist for tracking system metrics, yet horizon charts stand out due to their good information density which enables the display of many time-series data on a single screen.

Let's start with an example:

cubestat_high.mp4

In the clip above we see Mixtral-8x7b inference on MacBook Air with FF layers offloaded to SSD. We can notice somewhat low GPU util, 2Gb/s+ of data read from disk, as we have to fetch the weights, but plenty of free RAM (And we are actually able to serve almost 100Gb model on 24Gb machine with fp16 precision, even if very slow).

We can also clearly see moment of change from model loading (cpu util, disk writes for model preprocessing) to model inference (disk reads, gpu util going up, cpu going down)

Currently cubestat reports:

CPU utilization - configurable per core ('by_core'), cluster of cores on Apple M1+: Efficiency/Performance ('by_cluster') or all. Is shown as percentage.
GPU utilization per card/chip. Is shown in percentage. Works for Apple's M1/M2 SoC and NVIDIA GPUs. For NVIDIA GPU can show VRAM usage as well. In case of multi-GPU can show individual GPUs or aggregated average.
ANE (Neural Engine) power consumption. According to man powermetrics it is an estimate, but seems working good enough as a proxy to ANE utilization. Is shown as percentage.
Disk and network IO; Is shown as rate (KB/s, MB/s, GB/s).
Memory usage in %
Swap usage. Is shown as absolute value (KB, MB, GB)

Known limitations:

On MacOS cubestat needs to run powermetrics with sudo. You don't need to run cubestat itself with sudo, but you'll be asked sudo password when cubestat launches powermetrics. If you are comfortable doing that, you can add powermetrics to /etc/sudoers (your_user_name ALL=(ALL) NOPASSWD: /usr/bin/powermetrics) and avoid this.
Neural engine utilization is an estimate based on power usage, more on that below.
Needs 256 colors terminal

Installation:

% pip install cubestat

or

% pip install cubestat[cuda] # for instances with NVIDIA

Usage

usage: cubestat [-h] [--refresh_ms REFRESH_MS] [--buffer_size BUFFER_SIZE] [--view {off,one,all}] [--cpu {all,by_cluster,by_core}] [--gpu {collapsed,load_only,load_and_vram}] [--swap {show,hide}] [--network {show,hide}] [--disk {show,hide}]
                [--power {combined,all,off}]

options:
  -h, --help            show this help message and exit
  --refresh_ms REFRESH_MS, -i REFRESH_MS
                        Update frequency, milliseconds
  --buffer_size BUFFER_SIZE
                        How many datapoints to store. Having it larger than screen width is a good idea as terminal window can be resized
  --view {off,one,all}  legend/values/time mode. Can be toggled by pressing v.
  --cpu {all,by_cluster,by_core}
                        CPU mode - showing all cores, only cumulative by cluster or both. Can be toggled by pressing c.
  --gpu {collapsed,load_only,load_and_vram}
                        GPU mode - hidden, showing all GPUs load, or showing load and vram usage. Can be toggled by pressing g.
  --swap {show,hide}    Show swap . Can be toggled by pressing s.
  --network {show,hide}
                        Show network io. Can be toggled by pressing n.
  --disk {show,hide}    Show disk read/write. Can be toggled by pressing d.
  --power {combined,all,off}
                        Power mode - off, showing breakdown CPU/GPU/ANE load, or showing combined usage. Can be toggled by pressing p.

Interactive commands:

q - quit
v - toggle view mode
c - change cpu display mode (individual cores, aggregated or both)
g - change gpu display mode (individual gpus, aggregated and optionally VRAM usage)
d - show/hide disk reads/writes
n - show/hide network utilization
s - show/hide swap
p - show/hide power usage if available
UP/DOWN - scroll the lines in case there are more cores;
LEFT/RIGHT - scroll left/right. Autorefresh is paused when user scrolled to non-latest position. To resume autorefresh either scroll back to the right or press '0';
0 - reset horizontal scroll, continue autorefresh.

Notes and examples

Multi-gpu example

cubestat_4GPU.mov

We see a workload with uneven distribution between 4 GPUs installed. By pressing 'g' we can toggle the view mode to either show aggregate load, per GPU load or per GPU load and VRAM usage.

Apple Neural Engine utilization

A few notes on 'what does this even represent?'. Utilization we show is essentially current power consumption reported by powermetrics. To convert it to % we divide it by some maximum value observed in experimentation. When reading this metric, be aware:

The concept of 'utilization' overall it pretty ambiguous, e.g. when CPU is wasting cycles on a cache miss, is it 'utilized' or not? If CPU is doing scalar instructions on 1 execution port rather than vectorized instructions on several ports, is it 'utilized' or not?
It is unclear if power consumption is a decent proxy for utilization;
The upper bound must be different for different models (M1, M1 Max, M2, etc.). I tested it on M1, M2 and M1 Pro only;
It is unclear if my tests are actually hitting upperbound. The highest I could achieve was multiple layers of convolutions with no non-linearities between them;

Running on Google Colab

We can run cubestat on Google Colab instances to monitor GPU/CPU/IO usage.

First cell:

!pip install cubestat[cuda]
!pip install colab-xterm
%load_ext colabxterm
# export TERM=xterm-256color <---- RUN THIS IN TERMINAL
# cubestat                   <---- RUN THIS IN TERMINAL

Start xterm:

%xterm

In the terminal, configure 256 colors and start cubestat:

# export TERM=xterm-256color
# cubestat

Example notebook: colab example

Dependencies

Python 3.?+
psutil 5.9.5+
[optional] pynvml for NVIDIA cards monitoring

TODO

add 'help' for each metric
unit tests
memory modes - more details (cache/etc), absolute values rather than %, mmap handling
optional joint scale within metric group
nvidia GPU - probing vs momentary load to avoid missing spikes
support AMD GPUs
rent on vast.ai
support remote monitoring
NUMA grouping
per interface network utilization
perf on weak machine (e.g. pi zero)

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

README.md

README.md

system monitoring horizon charts for terminal

Installation:

Usage

Notes and examples

Multi-gpu example

Apple Neural Engine utilization

Running on Google Colab

Dependencies

TODO

Files

README.md

Latest commit

History

README.md

File metadata and controls

system monitoring horizon charts for terminal

Installation:

Usage

Notes and examples

Multi-gpu example

Apple Neural Engine utilization

Running on Google Colab

Dependencies

TODO