Motivation and Context
Part of my ongoing quest to understand what's happening inside the box (previously).
This time, it's counters showing what vdev_queue is up to.
Description
Adds a bunch of wmsum_t counters to every vdev_queue instance for a real device. These show the current count of IOs queued and in-flight (total and broken down by class), total IOs in/out over the lifetime of the queue, and basic aggregation counters.
The counters are exposed under /proc/spl/kstat/zfs/<pool>/vdev/<guid>/queue on Linux, or kstat.zfs.<pool>.vdev.<guid>.misc.queue on FreeBSD.
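To make the shape of these counters concrete, here is a minimal sketch of the bookkeeping described above. It is not the code from this PR: plain uint64_t fields stand in for wmsum_t, the I/O class list and every name (queue_stats_t, qs_enqueue, qs_issue, qs_done) are hypothetical, and it assumes "out" is counted when an IO completes and leaves the queue.

```c
#include <stdint.h>

/* Hypothetical I/O classes; the real set is defined by ZFS. */
enum {
	CLASS_SYNC_READ,
	CLASS_SYNC_WRITE,
	CLASS_ASYNC_READ,
	CLASS_ASYNC_WRITE,
	CLASS_COUNT
};

/* Plain integers stand in for wmsum_t; field names are illustrative. */
typedef struct queue_stats {
	uint64_t io_queued;			/* waiting in the queue now */
	uint64_t io_active;			/* in flight now */
	uint64_t io_class_queued[CLASS_COUNT];	/* waiting, per class */
	uint64_t io_class_active[CLASS_COUNT];	/* in flight, per class */
	uint64_t io_enqueue_total;		/* lifetime IOs in */
	uint64_t io_dequeue_total;		/* lifetime IOs out (assumed: on completion) */
} queue_stats_t;

/* An IO arrives and waits to be issued. */
static void
qs_enqueue(queue_stats_t *qs, int cls)
{
	qs->io_queued++;
	qs->io_class_queued[cls]++;
	qs->io_enqueue_total++;
}

/* A waiting IO is issued to the device. */
static void
qs_issue(queue_stats_t *qs, int cls)
{
	qs->io_queued--;
	qs->io_class_queued[cls]--;
	qs->io_active++;
	qs->io_class_active[cls]++;
}

/* An in-flight IO completes and leaves the queue. */
static void
qs_done(queue_stats_t *qs, int cls)
{
	qs->io_active--;
	qs->io_class_active[cls]--;
	qs->io_dequeue_total++;
}
```

In this sketch the "current" counters go up and down while the lifetime totals only ever increase, so at any moment io_enqueue_total minus io_dequeue_total equals io_queued plus io_active.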
Notes
The actual stats part is pretty unremarkable, being little more than the normal "sums & stats" boilerplate. They perhaps don't technically need to be wmsum_t, since all the changes are made under vq_lock anyway, but it's following a common pattern, and part of why I want this is to assist with removing or greatly reducing the scope of vq_lock, so this is where they'll need to be anyway.
The more interesting part of the PR is in the SPL kstats changes. These could be a separate PR, even two, but since they have no other application (yet) it seems fair to leave them here so at least there's something to test with. (I will, however, make them separate PRs on request.)
The main part is allowing for multi-level kstat module names. I want this so I can bolt sub-object stats (like individual vdevs) under the pool stats, as you see. For Linux it's not really complex, just a little more housekeeping. For FreeBSD, every kstat has its own "view" of the tree anyway, attached to the sysctl context, so it's quite trivial as no cleanup code is required.
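As a rough illustration of the extra housekeeping on Linux, a multi-level module name has to be broken into its path components before each level can be found or created. The sketch below is not the SPL implementation: the function name module_split, the PART_MAX limit, and the assumption that levels are separated by '/' (as in the /proc path above) are all hypothetical.

```c
#include <stddef.h>
#include <string.h>

#define	PART_MAX	64	/* hypothetical per-component length limit */

/*
 * Split a module name such as "zfs/tank/vdev/1234" into components.
 * Returns the number of components, or -1 if a component is empty or
 * too long, or if there are more than max_parts of them.
 */
static int
module_split(const char *module, char parts[][PART_MAX], int max_parts)
{
	int n = 0;
	const char *p = module;

	for (;;) {
		const char *slash = strchr(p, '/');
		size_t len = slash ? (size_t)(slash - p) : strlen(p);

		if (len == 0 || len >= PART_MAX || n == max_parts)
			return (-1);
		memcpy(parts[n], p, len);
		parts[n][len] = '\0';
		n++;
		if (slash == NULL)
			return (n);
		p = slash + 1;
	}
}
```

A caller would then walk the components in order, creating any missing directory level and removing levels again once their last kstat goes away; that create-and-clean-up bookkeeping is the part FreeBSD gets for free from the sysctl context.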
The name reuse thing, meanwhile, is the least invasive solution I could find to an annoying structural problem that came up. Every vdev_t has a vdev_queue_t that isn't easily decoupled, and now every vdev_queue_t creates some stats. During import, a tree of vdev_ts is created with the untrusted config, and then a second tree with the trusted config. Both of these register kstats with the same names. The effective policy that fell out of the implementations was that the first to claim the name wins, so the untrusted vdev tree gets them. Once the pool is imported, though, that tree is discarded. The trusted tree remains and becomes the active pool, but it never got to register its kstats, and the original ones are gone.
Reordering the import is not really possible, as the two trees briefly coexist to copy "updated" values from the untrusted tree to the trusted one (e.g. device paths that have changed since the last import). There's no comfortable way I could find to know where in the process we are and hold off creating stats until the live tree comes up. There are other options, like delaying kstats creation until first use, but in all these cases it felt dangerous to be mucking about in pool and vdev initialisation just to satisfy a quirk of the kstats system.
So instead, I effectively just changed the policy from first-wins to last-wins, and it all works out ok. There's probably a better structured "correct" way to sort it out, but I'll leave that for the eventual stats subsystem rewrite that of course is now buzzing in the back of my head 😇.
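The difference between the two policies can be sketched with a toy name registry. This is only an illustration of last-wins under assumed names (registry_t, reg_install, reg_remove, reg_lookup are all hypothetical), not the actual SPL kstat code: a later registration of the same name replaces the earlier owner, and a stale owner's removal is ignored.

```c
#include <stddef.h>
#include <string.h>

#define	REG_MAX	16	/* hypothetical fixed capacity */

typedef struct entry {
	const char *name;	/* registered kstat name */
	void *owner;		/* whoever currently holds the name */
} entry_t;

typedef struct registry {
	entry_t e[REG_MAX];
	int n;
} registry_t;

/*
 * Last-wins install: if the name is already taken, the new owner
 * replaces the old one. Returns the displaced owner, or NULL if the
 * name was new (or the registry is full).
 */
static void *
reg_install(registry_t *r, const char *name, void *owner)
{
	for (int i = 0; i < r->n; i++) {
		if (strcmp(r->e[i].name, name) == 0) {
			void *old = r->e[i].owner;
			r->e[i].owner = owner;
			return (old);
		}
	}
	if (r->n < REG_MAX) {
		r->e[r->n].name = name;
		r->e[r->n].owner = owner;
		r->n++;
	}
	return (NULL);
}

/* Remove a name, but only if this owner still holds it. */
static void
reg_remove(registry_t *r, const char *name, void *owner)
{
	for (int i = 0; i < r->n; i++) {
		if (strcmp(r->e[i].name, name) == 0 &&
		    r->e[i].owner == owner) {
			r->e[i] = r->e[r->n - 1];
			r->n--;
			return;
		}
	}
}

/* Return the current owner of a name, or NULL if unregistered. */
static void *
reg_lookup(const registry_t *r, const char *name)
{
	for (int i = 0; i < r->n; i++) {
		if (strcmp(r->e[i].name, name) == 0)
			return (r->e[i].owner);
	}
	return (NULL);
}
```

Mapped onto the import sequence: the untrusted tree installs "queue", the trusted tree installs the same name and displaces it, and when the untrusted tree is later torn down its reg_remove is a no-op, so the live pool's stats stay registered.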
How Has This Been Tested?
Mostly through repeated pool create -> IO -> scrub -> export -> import -> IO -> export -> unload cycles, on both Linux and FreeBSD. Once the numbers looked good and things stopped complaining about replacement names and/or panicking, I declared it good.