Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add flag to ignore network speed if it is unknown #1989

Merged
merged 1 commit into from
Mar 18, 2021
Merged

Conversation

SuperQ
Copy link
Member

@SuperQ SuperQ commented Mar 4, 2021

Some devices (ex virtual) don't have a speed and report -1 as the
speed value. Add a flag to allow ignoring speed on these devices.

Fixes: #1967

Signed-off-by: Ben Kochie superq@gmail.com

@SuperQ
Copy link
Member Author

SuperQ commented Mar 4, 2021

CC @svalouch

Copy link
Member

@roidelapluie roidelapluie left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, do you think it is test-worthy?

CHANGELOG.md Outdated Show resolved Hide resolved
@SuperQ SuperQ force-pushed the superq/net_speed branch 2 times, most recently from dbb92f9 to 60bb55c Compare March 4, 2021 09:43
@SuperQ
Copy link
Member Author

SuperQ commented Mar 4, 2021

Added to the end-to-end test.

@roidelapluie
Copy link
Member

LGTM

@svalouch
Copy link

svalouch commented Mar 4, 2021

This will cause metrics to disappear once the cable has been unplugged from a bare-metal server, which means that existing alerts that (ab)use this metric to report both wrong speed (e.g. drop from 10→1Gbit/s or lower due to faulty cable) as well as cable being unplugged / transceiver becoming faulty (value <= 0) need to be adjusted, I'm fine with that (just stating it in case someone does this like we did previously and finds this PR).

However, this is not consistent with what ethtool reports, which returns Unknown! for both 0 and -1. I'd prefer to have a consistent result from node_exporter if it starts to interpret the value (as opposed to simply applying math as before) that is consistent with what standard-tools report. So, either omit the metric or report a value that can be easily identified as "unknown" for value <= 0. The value was consistently 0 before release 1.0.0-rc.0, but I'm fine with omitting it for both values.

I've just found a couple of VMs that report 0 and I have no clue as to why (comparing them to others uncovered no obvious (virtual) hardware differences), so there indeed are cases where it does report 0 in the wild. So with this patch applied in its current state, we'd have the majority of VMs without that metric and some with that metric being 0.

@SuperQ
Copy link
Member Author

SuperQ commented Mar 4, 2021

We already have node_network_carrier for explicit link checks. By making the metric vanish, you can now do queries like node_network_carrier unless node_network_speed_bytes to find interfaces that have an "unknown" value.

@SuperQ
Copy link
Member Author

SuperQ commented Mar 4, 2021

A good example of this is lo. When you try and access the speed info, you get an error.

$ cat /sys/class/net/lo/speed 
cat: /sys/class/net/lo/speed: Invalid argument

The node_exporter sees this as a nil from the procfs library and doesn't expose a speed metric value.

@svalouch
Copy link

svalouch commented Mar 4, 2021

By making the metric vanish, you can now do queries like node_network_carrier unless node_network_speed_bytes to find interfaces that have an "unknown" value.

Ah, sounds like making it vanish is the superior solution, cool! Still, I'd like to see 0 treated the same for reasons stated above:

-if ifaceInfo.Speed != nil && *ifaceInfo.Speed >= 0 {
+if ifaceInfo.Speed != nil && *ifaceInfo.Speed > 0 {

@SuperQ
Copy link
Member Author

SuperQ commented Mar 4, 2021

Alright, let me look over a few things. You're probably right that we should treat 0 as unknown.

@discordianfish
Copy link
Member

LGTM but technically, this again is a breaking change.. As @svalouch said, I can definitely see people alerting on this.

@SuperQ
Copy link
Member Author

SuperQ commented Mar 16, 2021

So, this sucks a bit. We can't fix this without breaking existing use of the output. The only way to make it possible is to either cut this as part of node_exporter 2.0. Or add a feature flag to enable this change.

@SuperQ SuperQ force-pushed the superq/net_speed branch 2 times, most recently from bd3be5d to 00bc49a Compare March 17, 2021 13:04
@SuperQ SuperQ changed the title Don't expose network speed if it is unknown Add flag to ignore network speed if it is unknown Mar 17, 2021
@SuperQ SuperQ requested a review from roidelapluie March 17, 2021 13:05
@SuperQ SuperQ force-pushed the superq/net_speed branch from 00bc49a to bf8b419 Compare March 17, 2021 13:06
Copy link
Member

@discordianfish discordianfish left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds like a good compromise!

Some devices (ex virtual) don't have a speed and report `-1` as the
speed value. Add a flag to allow ignoring speed on these devices.

Fixes: #1967

Signed-off-by: Ben Kochie <superq@gmail.com>
@SuperQ SuperQ force-pushed the superq/net_speed branch from bf8b419 to 9893fca Compare March 18, 2021 10:36
@SuperQ SuperQ merged commit 857a91c into master Mar 18, 2021
@SuperQ SuperQ deleted the superq/net_speed branch March 18, 2021 21:58
SuperQ added a commit that referenced this pull request Jul 12, 2021
NOTE: Ignoring invalid network speed will be the default in 2.x
NOTE: Filesystem collector flags have been renamed. `--collector.filesystem.ignored-mount-points` is now `--collector.filesystem.mount-points-exclude` and `--collector.filesystem.ignored-fs-types` is now `--collector.filesystem.fs-types-exclude`. The old flags will be removed in 2.x.

* [CHANGE] Rename filesystem collector flags to match other collectors #2012
* [CHANGE] Make node_exporter print usage to STDOUT #2039
* [FEATURE] Add conntrack statistics metrics #1155
* [FEATURE] Add ethtool stats collector #1832
* [FEATURE] Add flag to ignore network speed if it is unknown #1989
* [FEATURE] Add tapestats collector for Linux #2044
* [ENHANCEMENT] Add ErrorLog plumbing to promhttp #1887
* [ENHANCEMENT] Add time zone offset metric #2060
* [BUGFIX] Add ErrorLog plumbing to promhttp #1887
* [BUGFIX] Handle errors from disabled PSI subsystem #1983
* [BUGFIX] Fix panic when using backwards compatible flags #2000
* [BUGFIX] Only initiate collectors once #2048
* [BUGFIX] Handle small backwards jumps in CPU idle #2067

Signed-off-by: Ben Kochie <superq@gmail.com>
@SuperQ SuperQ mentioned this pull request Jul 12, 2021
SuperQ added a commit that referenced this pull request Jul 15, 2021
NOTE: Ignoring invalid network speed will be the default in 2.x
NOTE: Filesystem collector flags have been renamed. `--collector.filesystem.ignored-mount-points` is now `--collector.filesystem.mount-points-exclude` and `--collector.filesystem.ignored-fs-types` is now `--collector.filesystem.fs-types-exclude`. The old flags will be removed in 2.x.

* [CHANGE] Rename filesystem collector flags to match other collectors #2012
* [CHANGE] Make node_exporter print usage to STDOUT #2039
* [FEATURE] Add conntrack statistics metrics #1155
* [FEATURE] Add ethtool stats collector #1832
* [FEATURE] Add flag to ignore network speed if it is unknown #1989
* [FEATURE] Add tapestats collector for Linux #2044
* [FEATURE] Add nvme collector #2062
* [ENHANCEMENT] Add ErrorLog plumbing to promhttp #1887
* [ENHANCEMENT] Add more Infiniband counters #2019
* [ENHANCEMENT] netclass: retrieve interface names and filter before parsing #2033
* [ENHANCEMENT] Add time zone offset metric #2060
* [BUGFIX] Handle errors from disabled PSI subsystem #1983
* [BUGFIX] Fix panic when using backwards compatible flags #2000
* [BUGFIX] Fix wrong value for OpenBSD memory buffer cache #2015
* [BUGFIX] Only initiate collectors once #2048
* [BUGFIX] Handle small backwards jumps in CPU idle #2067

Signed-off-by: Ben Kochie <superq@gmail.com>
oblitorum pushed a commit to shatteredsilicon/node_exporter that referenced this pull request Apr 9, 2024
NOTE: Ignoring invalid network speed will be the default in 2.x
NOTE: Filesystem collector flags have been renamed. `--collector.filesystem.ignored-mount-points` is now `--collector.filesystem.mount-points-exclude` and `--collector.filesystem.ignored-fs-types` is now `--collector.filesystem.fs-types-exclude`. The old flags will be removed in 2.x.

* [CHANGE] Rename filesystem collector flags to match other collectors prometheus#2012
* [CHANGE] Make node_exporter print usage to STDOUT prometheus#2039
* [FEATURE] Add conntrack statistics metrics prometheus#1155
* [FEATURE] Add ethtool stats collector prometheus#1832
* [FEATURE] Add flag to ignore network speed if it is unknown prometheus#1989
* [FEATURE] Add tapestats collector for Linux prometheus#2044
* [FEATURE] Add nvme collector prometheus#2062
* [ENHANCEMENT] Add ErrorLog plumbing to promhttp prometheus#1887
* [ENHANCEMENT] Add more Infiniband counters prometheus#2019
* [ENHANCEMENT] netclass: retrieve interface names and filter before parsing prometheus#2033
* [ENHANCEMENT] Add time zone offset metric prometheus#2060
* [BUGFIX] Handle errors from disabled PSI subsystem prometheus#1983
* [BUGFIX] Fix panic when using backwards compatible flags prometheus#2000
* [BUGFIX] Fix wrong value for OpenBSD memory buffer cache prometheus#2015
* [BUGFIX] Only initiate collectors once prometheus#2048
* [BUGFIX] Handle small backwards jumps in CPU idle prometheus#2067

Signed-off-by: Ben Kochie <superq@gmail.com>
oblitorum pushed a commit to shatteredsilicon/node_exporter that referenced this pull request Apr 9, 2024
NOTE: Ignoring invalid network speed will be the default in 2.x
NOTE: Filesystem collector flags have been renamed. `--collector.filesystem.ignored-mount-points` is now `--collector.filesystem.mount-points-exclude` and `--collector.filesystem.ignored-fs-types` is now `--collector.filesystem.fs-types-exclude`. The old flags will be removed in 2.x.

* [CHANGE] Rename filesystem collector flags to match other collectors prometheus#2012
* [CHANGE] Make node_exporter print usage to STDOUT prometheus#2039
* [FEATURE] Add conntrack statistics metrics prometheus#1155
* [FEATURE] Add ethtool stats collector prometheus#1832
* [FEATURE] Add flag to ignore network speed if it is unknown prometheus#1989
* [FEATURE] Add tapestats collector for Linux prometheus#2044
* [FEATURE] Add nvme collector prometheus#2062
* [ENHANCEMENT] Add ErrorLog plumbing to promhttp prometheus#1887
* [ENHANCEMENT] Add more Infiniband counters prometheus#2019
* [ENHANCEMENT] netclass: retrieve interface names and filter before parsing prometheus#2033
* [ENHANCEMENT] Add time zone offset metric prometheus#2060
* [BUGFIX] Handle errors from disabled PSI subsystem prometheus#1983
* [BUGFIX] Fix panic when using backwards compatible flags prometheus#2000
* [BUGFIX] Fix wrong value for OpenBSD memory buffer cache prometheus#2015
* [BUGFIX] Only initiate collectors once prometheus#2048
* [BUGFIX] Handle small backwards jumps in CPU idle prometheus#2067

Signed-off-by: Ben Kochie <superq@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Invalid node_network_speed_bytes value -125000 for disconnected or virtualized NICs
4 participants