Merge pull request snabbco#1119 from Igalia/ptree-counter-docs
Add ptree docs on counter collection and aggregation
wingo authored Jun 22, 2018
2 parents ed902c4 + a533ed4 commit 25a5e5d
Showing 2 changed files with 32 additions and 1 deletion.
19 changes: 18 additions & 1 deletion src/lib/ptree/README.md
@@ -32,7 +32,7 @@ configuration of a network function as a whole, and uses "app graph" to
refer to the network of Snabb apps that runs in a single worker
data-plane process.

The high-level design is that a manager from `lib.ptree.manager` is
The high-level design is that a manager from `lib.ptree.ptree` is
responsible for knowing the state and configuration of a data plane.
The manager also offers an interface to allow the outside world to query
the configuration and state, and to request configuration updates.
@@ -48,6 +48,23 @@ messages sent to it from the manager. Checking for update availability
requires just a memory access, not a system call, so the overhead of the
message channel on the data plane is very low.

The ptree manager also periodically reads counter values from the
data-plane processes that it manages, and aggregates them into
corresponding counters associated with the manager process. For
example, if two workers have an `apps/if/drops.counter` file, then the
manager will also expose an `apps/if/drops.counter`, whose value is the
sum of the counters from the individual workers, plus an archived
counter value that's the sum of counters from workers before they shut
down.
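
The aggregation rule described above can be sketched in a few lines. This
is only an illustration of the arithmetic, not the manager's actual code:
the function names, the dictionary layout, and the worker PIDs are all
assumptions made for the example.

```python
from collections import defaultdict

def aggregate_counters(live_workers, archived):
    """Sum each counter across live workers and add the archived total.

    live_workers: hypothetical map of worker PID -> {counter name: value}
    archived: hypothetical map of counter name -> sum from stopped workers
    """
    totals = defaultdict(int, archived)
    for counters in live_workers.values():
        for name, value in counters.items():
            totals[name] += value
    return dict(totals)

def archive_worker(live_workers, archived, pid):
    """Fold a stopping worker's final counter values into the archive."""
    for name, value in live_workers.pop(pid).items():
        archived[name] = archived.get(name, 0) + value

live = {
    1234: {"apps/if/drops.counter": 10},
    1235: {"apps/if/drops.counter": 5},
}
archived = {"apps/if/drops.counter": 7}
print(aggregate_counters(live, archived))
```

Because a stopping worker's values move into the archive rather than
disappearing, the aggregate in this sketch does not drop when a worker
shuts down.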

Finally, all of these periodically sampled counters from the workers as
well as the aggregate counters from the manager are also written into
[RRD files](../README.rrd.md), as a kind of "flight recorder" black-box
record of past counter change rates. This facility, limited by default
to the last 7 days, complements a more long-term statistics database,
and is mostly useful as a debugging and troubleshooting resource. To
view this historical data, use [`snabb top`](../../program/top/README).
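
To make the "flight recorder" idea concrete, here is a minimal sketch of
a bounded record of counter change rates. It is not the RRD file format
and not Snabb's implementation; the `RateRecorder` class and its
parameters are hypothetical, and the sketch keeps a single fixed-size
window rather than the multiple resolutions an RRD file typically holds.

```python
from collections import deque

class RateRecorder:
    """Keep only the most recent `capacity` change rates of one counter."""

    def __init__(self, capacity):
        # A deque with maxlen silently discards the oldest entry when full.
        self.rates = deque(maxlen=capacity)
        self.last = None  # (timestamp, value) of the previous sample

    def sample(self, timestamp, value):
        """Record the change rate since the previous sample, if any."""
        if self.last is not None:
            last_time, last_value = self.last
            elapsed = timestamp - last_time
            if elapsed > 0:
                rate = (value - last_value) / elapsed
                self.rates.append((timestamp, rate))
        self.last = (timestamp, value)

recorder = RateRecorder(capacity=3)
for t, v in [(0, 0), (1, 100), (2, 250), (3, 250), (4, 1000)]:
    recorder.sample(t, v)
print(list(recorder.rates))
```

Storing rates in a fixed-size window is what keeps such a record bounded
in size no matter how long the process runs.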

## Example

See [the example `snabb ptree` program](../../program/ptree/README.md)
14 changes: 14 additions & 0 deletions src/program/top/README
@@ -65,3 +65,17 @@ tree view, there are also keys to show and hide these subtrees.

Again, all of these commands are shown in the status bar, when
available.

Finally, it's possible to use `snabb top` to read a snapshot of counters
taken from some other machine. To take a sample of a machine's counters
and RRD files, do:

( cd /var/run; sudo tar czf /tmp/snabb-state-snapshot.tar.gz snabb )

Then copy /tmp/snabb-state-snapshot.tar.gz to a technician, who can unpack it:

mkdir /tmp/forensics; cd /tmp/forensics; tar xvf snabb-state-snapshot.tar.gz

Then to investigate this snapshot:

SNABB_SHM_ROOT=/tmp/forensics/snabb snabb top
