Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

zfs collector - panic when parse.Uint on a negative value #1158

Closed
sevagh opened this issue Nov 15, 2018 · 5 comments
Closed

zfs collector - panic when parse.Uint on a negative value #1158

sevagh opened this issue Nov 15, 2018 · 5 comments

Comments

@sevagh
Copy link
Contributor

sevagh commented Nov 15, 2018

Just got some negative values in my ZFS kstat:

$ cat /proc/spl/kstat/zfs/ztank/io
20 3 0x00 1 80 16996408425 20115450070576800
nread    nwritten reads    writes   wtime    wlentime wupdate  rtime    rlentime rupdate  wcnt     rcnt
496595193856 326928208207872 28635533 1225798681 4589967829674802 -4516494086296980576 20115449382517651 4693467351523302 -4516494086296980576 20115449392170545 0        0

I'm trying to find out why that's happening. However, this crashes node_exporter (for obvious reasons, parse.Uint):

{"level":"error","msg":"ERROR: zfs collector failed after 0.001139s: could not parse expected integer value for \"kstat.zfs.misc.io.wlentime\": strconv.ParseUint: parsing \"-4516539354805799303\": invalid syntax","source":"collector.go:132","time":"2018-11-15T10:27:01-08:00"}

Should node-exporter be able to scrape these values?

@discordianfish
Copy link
Member

Not familiar with zfs. Is a negative value here something valid?
I found this definition:

hrtime_t wlentime; /* cumulative wait lengthtime product/

Doesn't sound like this should be ever negative? So maybe it's actually an kernel/upstream issue?
Either way, it probably is reasonable to change it to use ParseInt.

@sevagh
Copy link
Contributor Author

sevagh commented Nov 19, 2018

I agree that it's weird - could be any number of problems (bad hardware clock, etc.) I've filed a bug with zol: openzfs/zfs#8131

Should I make a PR here?

@discordianfish
Copy link
Member

Yes, I think it's reasonable to be more liberal in what we accept here.

@sevagh
Copy link
Contributor Author

sevagh commented Nov 19, 2018

Interestingly, I came across this: d7348a5

By changing Uint64 to Int64, I failed the zil fixture test due to overflow in Int64 - this value was added in this commit which references a similar issue:

  • added e2e test for negative values represented by uint64 that can result from ZFS bugs

The zfsonlinux version referenced there is 0.7.5. We're running:

$ sudo modinfo zfs
filename:       /lib/modules/3.16.0-4-amd64/updates/dkms/zfs.ko
version:        0.6.5.2-2

It's a pretty old version of zfs.

I was curious if the behavior changed for handling negative values in zfs-on-linux - I followed the chain here: openzfs/zfs#6988

Not sure it's a good idea to patch node_exporter for a bizarre bug on an old version of ZFS that's missing many bug fixes.

@discordianfish
Copy link
Member

Ah, thanks for investigating! I agree, in think in this case there is nothing we can/should do here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants