add zpool_influxdb command #10786
Codecov Report
@@ Coverage Diff @@
## master #10786 +/- ##
===========================================
- Coverage 79.76% 42.50% -37.27%
===========================================
Files 395 365 -30
Lines 125039 116223 -8816
===========================================
- Hits 99742 49402 -50340
- Misses 25297 66821 +41524
This seems like a cool idea. I take it the output of this command is InfluxDB line format data that could be piped directly into a curl command. Could you add a command line argument for extra tags? For example, if I want to also add a hostname tag, something like --extra-tags hostname=$(hostname).
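To make the request above concrete, here is a minimal sketch of how caller-supplied tags could be spliced into a line-protocol record. It is not the PR's implementation: the function name `format_pool_line`, the measurement name `zpool_stats`, and the `read_ops` field are assumptions chosen for illustration, and the sketch does not escape spaces or commas in tag values as the line protocol requires.

```c
#include <stdio.h>
#include <stdint.h>

/*
 * Hypothetical helper (not from the PR): format one line-protocol record as
 *   measurement,tag=value[,extra tags] field=value timestamp
 * extra_tags is passed through verbatim, e.g. "hostname=web1".
 * Returns what snprintf returns (bytes needed, excluding the NUL).
 */
int
format_pool_line(char *buf, size_t len, const char *pool,
    const char *extra_tags, uint64_t read_ops, uint64_t ts_ns)
{
	/* Emit the comma separator only when extra tags were supplied. */
	int have_extra = (extra_tags != NULL && extra_tags[0] != '\0');
	const char *sep = have_extra ? "," : "";
	const char *tags = have_extra ? extra_tags : "";

	/* The trailing "i" marks read_ops as an integer field. */
	return (snprintf(buf, len,
	    "zpool_stats,name=%s%s%s read_ops=%llui %llu",
	    pool, sep, tags,
	    (unsigned long long)read_ops, (unsigned long long)ts_ns));
}
```

Called with the pool `tank`, the extra tag `hostname=web1`, a value of 42 and a timestamp of 1000, this produces `zpool_stats,name=tank,hostname=web1 read_ops=42i 1000`.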
cmd/zpool_influxdb/zpool_influxdb.h
Outdated
}
#endif

#endif /* ZFS_ZPOOL_INFLUXDB_H */
Can't we do without this header?
I'm using it as a convenient place to define SUPPORT_UINT64. But perhaps it is better to reverse that logic and allow override for not supporting uint64. Thoughts?
I think either way can be done without a header.
If you see this being a regularly used configuration option, I would add something to ./configure for it instead. Something that detects the influxdb version by default and can be outsmarted by --with-influxdb=2 or the like.
This doesn't actually have a build dependency on influxdb, so falling back to the newest version if not present would seem appropriate. With that in mind, inverting the logic to be an opt-in for compat with the older version makes sense.
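The inverted logic discussed above could look roughly like the following sketch, which defaults to unsigned output and makes the older signed-only behavior an opt-in. The macro name `LEGACY_INT64_ONLY` and the function `format_u64_field` are hypothetical, and clamping to `INT64_MAX` is one possible fallback policy rather than what the PR does. The `u` and `i` suffixes are the line protocol's markers for unsigned and signed integer fields.

```c
#include <stdio.h>
#include <stdint.h>
#include <inttypes.h>

/*
 * Hypothetical sketch: write "key=value" as a line-protocol integer field.
 * Building with -DLEGACY_INT64_ONLY (an illustrative macro name) opts in
 * to signed-only output for servers without unsigned integer support.
 */
int
format_u64_field(char *buf, size_t len, const char *key, uint64_t value)
{
#ifdef LEGACY_INT64_ONLY
	/* Clamp so the value can never be read back as negative. */
	if (value > (uint64_t)INT64_MAX)
		value = (uint64_t)INT64_MAX;
	return (snprintf(buf, len, "%s=%" PRIu64 "i", key, value));
#else
	/* Default: unsigned integer field, marked by the "u" suffix. */
	return (snprintf(buf, len, "%s=%" PRIu64 "u", key, value));
#endif
}
```

With this arrangement, a plain build needs no header or configure switch, and only users targeting an older server have to pass the extra define.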
The change for unsigned support is actually several years old now. When using telegraf, which is the preferred method, unsigned ints are handled properly by telegraf and we don't really need a recompile. I'd like to treat this as a "don't go back to the ice age".
Excellent idea! I've used that on other collectors I've written. I'll add it.
This is pretty cool. It'll be nice to have a better way to collect this kind of data.
@richardelling would it make sense to rename this monitoring utility something a little more generic? That would allow us to extend it to other possible output formats without confusion. Or make it slightly less confusing if things other than influxdb find it useful. Maybe zstat or zmonitor?
tests/runfiles/common.run
Outdated

[tests/functional/zpool_influxdb]
tests = 'zpool_influxdb'
tags = ['functional', 'metrics']
It'd probably make sense to move these tests under tests/functional/cli_user/ and run them as a normal unprivileged user. (user = )
good idea, will do
@behlendorf checkstyle/deploy is failing, would a rebase fix?
@richardelling I've restarted it manually
@richardelling yep, rebase may fix it with
At first glance, there is some merit to that idea. However, the two prevailing metrics styles, prometheus and influx, are very different in how they print metrics. Approximately half of the code is devoted to printing, far more than the actual data collection, which is almost trivial by comparison. They also differ in that influx is most often used in a push model (metrics are pushed to a database HTTP endpoint) while prometheus is used in a pull model (the database collects metrics from an HTTP endpoint on the ZFS node). Obviously, including an HTTP API service is more tedious to get right, especially in C. Today, both zpool_influxdb and zpool_prometheus are available from my public repo. My plan is to get zpool_influxdb in and then update zpool_prometheus with a builtin HTTP server and contribute that separately. That said, a better architecture on the ZFS side is to relocate the spa config and spa stats into kstats. That will expose silly limitations in last century's kstat design. So maybe that is a task for the future.
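To illustrate how differently the two styles print the same value, the sketch below formats one pool statistic both ways. The function names, the measurement/metric names (`zpool_stats`, `zpool_stats_alloc`) and the `alloc` field are assumptions for illustration and are not taken from zpool_influxdb or zpool_prometheus. Only the general shapes are standard: line protocol is `measurement,tag=value field=value timestamp`, while the Prometheus text format is `metric{label="value"} value`.

```c
#include <stdio.h>
#include <stdint.h>
#include <inttypes.h>

/* Influx line protocol: tags and fields on one line, explicit timestamp. */
int
print_influx(char *buf, size_t len, const char *pool, uint64_t bytes,
    uint64_t ts_ns)
{
	return (snprintf(buf, len,
	    "zpool_stats,name=%s alloc=%" PRIu64 "u %" PRIu64,
	    pool, bytes, ts_ns));
}

/* Prometheus text format: one metric per line, labels in braces. */
int
print_prometheus(char *buf, size_t len, const char *pool, uint64_t bytes)
{
	return (snprintf(buf, len,
	    "zpool_stats_alloc{name=\"%s\"} %" PRIu64, pool, bytes));
}
```

Influx groups many fields under one measurement line and typically carries its own timestamp, whereas the Prometheus format names each metric separately and usually leaves timestamping to the scraper, so a shared printing layer would have little in common beyond the data collection.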
That makes sense, I just wanted to get your thoughts. If the need arises we can always do this in the future.
Yes, or perhaps something a bit more flexible than kstats. But I agree that's a job for another day. If you can rebase this and resolve that one last bit of feedback this looks ready to merge.
You'll want to add cmd/zpool_influxdb/.gitignore containing /zpool_influxdb as well.
Is it possible to add a screenshot to the pull request to show what the result will look like? I did something similar for netdata (a data collector which shows space distribution in the pool between filesystems/snapshots). I wonder if it is much superior to my experiments.
@IvanVolosyuk the size of datasets is not available in the pool configuration we read here. For dataset information, various data collectors and agents already exist to deliver that info via kstats. A future project that would be useful is to convert the pool stats into kstats. Until then, this approach works better than screen-scraping zpool command output.
A zpool_influxdb command is introduced to ease the collection of zpool statistics into the InfluxDB time-series database. Examples are given on how to integrate with the telegraf statistics aggregator, a companion to influxdb. Finally, a grafana dashboard template is included to show how pool latency distributions can be visualized in a ZFS + telegraf + influxdb + grafana environment. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Signed-off-by: Richard Elling <Richard.Elling@RichardElling.com> Closes openzfs#10786
This was requested but forgotten in openzfs#10786. Reviewed-by: Richard Elling <Richard.Elling@RichardElling.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Ryan Moeller <freqlabs@FreeBSD.org> Closes openzfs#11071
A zpool_influxdb command is introduced to ease the collection of zpool statistics into the InfluxDB
time-series database. Examples are given on how to integrate with the telegraf statistics aggregator,
a companion to influxdb. Finally, a grafana dashboard template is included to show how pool latency
distributions can be visualized in a ZFS + telegraf + influxdb + grafana environment.
Motivation and Context
InfluxDB is one of the premier open-source time-series databases. There exist methods to get
simple zpool properties and zfs performance data from /proc into influxdb via telegraf. However,
the pool specifics are not readily available in /proc. Instead, ZFS admins have relied on the zpool
command. Unfortunately, the zpool command is intended for humans and cannot be parsed easily.
zpool_influxdb can be considered a replacement for zpool whose output is intended to be parsed by influxdb.
Description
In many ways, zpool_influxdb can be considered a userland replacement for parseable zpool output.
Unlike the zpool command which reads all of the pool configuration, health, and performance data
and then only shows a very small subset of the information, zpool_influxdb comprehensively presents
all of the information in one pass.
It is also possible to look at the output of the zpool_influxdb command directly. It just isn't intended to be
human-friendly, so if you are a human, use the zpool command instead.
How Has This Been Tested?
This PR includes new ZTS tests.