[BUG] exporter not noticing write errors #44

Closed
Eldiabolo21 opened this issue Sep 19, 2024 · 1 comment
Labels
duplicate This issue or pull request already exists

Comments


Eldiabolo21 commented Sep 19, 2024

Hello!

First of all, thank you for your work and time; it's really the best (and best-maintained) ZFS exporter out there!

One thing I noticed is that the exporter doesn't pick up the health warning when an unrecoverable error has occurred.
For example:

# zpool status -x
  pool: tank3
 state: ONLINE
status: One or more devices has experienced an unrecoverable error.  An
        attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
   see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-9P
  scan: scrub repaired 0B in 12:14:12 with 0 errors on Sun Sep  8 12:38:13 2024
config:

        NAME                        STATE     READ WRITE CKSUM
        tank3                       ONLINE       0     0     0
          raidz2-0                  ONLINE       0     0     0
            wwn-0x5000cca232273cd8  ONLINE       0     0     0
            wwn-0x5000cca2310cafe8  ONLINE       0     0     0
            wwn-0x5000cca232293c78  ONLINE       1     0     0
            wwn-0x5000cca2310c9184  ONLINE       0     0     0
            wwn-0x5000cca2310c9160  ONLINE       0     0     0
            wwn-0x5000cca2310c82e0  ONLINE       0     0     0
            wwn-0x5000cca2310c775c  ONLINE       0     0     0
            wwn-0x5000cca2322937c8  ONLINE       0     0     0

errors: No known data errors

Prometheus metrics:

zfs_pool_health{instance="192.168.16.12:9134", job="zfs", pool="tank3"} 0
zfs_pool_health{instance="192.168.16.12:9134", job="zfs", pool="virt"} 0
zfs_pool_health{instance="192.168.16.12:9134", job="zfs", pool="virt2"} 0

That's not exactly a problem, but it is still something that should be caught and possibly notified on. Any chance of setting the health value to 1 in these cases as well?

Cheers!


Edit: exporter version 2.3.1

pdf added the duplicate label on Sep 19, 2024

pdf (Owner) commented Sep 19, 2024

The zfs_pool_health metric has a specific meaning that maps to the status as reported by ZFS, per the HELP text on the metric:

# HELP zfs_pool_health Health status code for the pool [0: ONLINE, 1: DEGRADED, 2: FAULTED, 3: OFFLINE, 4: UNAVAIL, 5: REMOVED, 6: SUSPENDED].
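
In other words, the mapping is roughly the following (a minimal Go sketch for illustration; the map name is an assumption, not taken from the exporter's source):

// Illustrative only: the health codes from the HELP text above,
// expressed as a Go map. The name zpoolHealthCode is hypothetical.
var zpoolHealthCode = map[string]float64{
        "ONLINE":    0,
        "DEGRADED":  1,
        "FAULTED":   2,
        "OFFLINE":   3,
        "UNAVAIL":   4,
        "REMOVED":   5,
        "SUSPENDED": 6,
}

A pool that reports ONLINE therefore exports 0 by design, even when an individual device shows a non-zero READ counter as in your output above.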

It would be nice to publish detailed per-vdev status, but unfortunately zpool status is one of the few commands that doesn't provide a machine-parseable output flag, so parsing its output would be somewhat brittle.
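
For reference, here is a rough sketch of what that parsing might look like (purely hypothetical, not code that ships in the exporter); it leans entirely on the whitespace column layout of the config table shown above, which is exactly the brittle part:

// Hypothetical sketch only: scrape the per-vdev READ/WRITE/CKSUM counters
// out of `zpool status` by parsing the config table. It depends entirely
// on column order and whitespace, which is what makes it fragile.
package main

import (
        "bufio"
        "fmt"
        "os/exec"
        "strconv"
        "strings"
)

type vdevErrors struct {
        Name               string
        Read, Write, Cksum uint64
}

func parseStatus(out string) []vdevErrors {
        var devs []vdevErrors
        inConfig := false
        sc := bufio.NewScanner(strings.NewReader(out))
        for sc.Scan() {
                line := strings.TrimSpace(sc.Text())
                if strings.HasPrefix(line, "NAME") { // header row of the config table
                        inConfig = true
                        continue
                }
                if line == "" || strings.HasPrefix(line, "errors:") {
                        inConfig = false
                }
                if !inConfig {
                        continue
                }
                f := strings.Fields(line)
                if len(f) < 5 {
                        continue
                }
                rd, errR := strconv.ParseUint(f[2], 10, 64)
                wr, errW := strconv.ParseUint(f[3], 10, 64)
                ck, errC := strconv.ParseUint(f[4], 10, 64)
                if errR != nil || errW != nil || errC != nil {
                        continue // abbreviated counters like "1.2K" would need extra handling
                }
                devs = append(devs, vdevErrors{Name: f[0], Read: rd, Write: wr, Cksum: ck})
        }
        return devs
}

func main() {
        out, err := exec.Command("zpool", "status").Output()
        if err != nil {
                panic(err)
        }
        for _, d := range parseStatus(string(out)) {
                fmt.Printf("%s read=%d write=%d cksum=%d\n", d.Name, d.Read, d.Write, d.Cksum)
        }
}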

Duplicate of #5

pdf closed this as completed on Sep 19, 2024