Disk R/W Data (node_disk_read_bytes_total) shows incorrect values for NVMe formatted with 4KiB sector size, or HDD with 4K native #2310
Can you share the results of …? For example, on my laptop: …

Also the output of … and the block size for these devices.
I was running a benchmark testing lots of different drive models, and I noticed Grafana was accurate for some drives but not others for disk bandwidth, which is how I found this.
I wonder if I got this wrong when I refactored diskstats_linux.go. Currently it takes … (see node_exporter/collector/diskstats_linux.go, line 213 at e3a18fd).
@ventifus Good question. I wonder how iostat accounts for different sector sizes.
I found the answer in https://www.kernel.org/doc/Documentation/block/stat.txt
I was incorrect in assuming that ReadSectors / WriteSectors were in hardware sector units; the kernel reports them in fixed 512-byte units regardless of the device's sector size. I'll prepare a PR to use 512-byte "sectors" in all cases.
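The fix described above can be sketched as follows. This is a minimal illustration, not the actual node_exporter code; the sample `/proc/diskstats` line is invented, with its sector count back-calculated from the byte figure later in this report.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// The kernel always reports the "sectors read/written" fields of
// /proc/diskstats in fixed 512-byte units, regardless of the device's
// logical or physical block size (Documentation/block/stat.txt).
const unixSectorSize = 512

// readBytes extracts field 6 (sectors read) from a /proc/diskstats line
// and converts it to bytes using the fixed 512-byte unit.
func readBytes(line string) (uint64, error) {
	fields := strings.Fields(line)
	if len(fields) < 7 {
		return 0, fmt.Errorf("malformed diskstats line: %q", line)
	}
	sectors, err := strconv.ParseUint(fields[5], 10, 64)
	if err != nil {
		return 0, err
	}
	return sectors * unixSectorSize, nil
}

func main() {
	// Illustrative line for a 4 KiB-formatted NVMe device; all counter
	// values here are hypothetical.
	line := "259 0 nvme0n1 81633 0 2638774200 425619 0 0 0 0 0 0 0"
	b, err := readBytes(line)
	if err != nil {
		panic(err)
	}
	fmt.Println(b) // 2638774200 sectors * 512 = 1351052390400 bytes
}
```

The key point is that the multiplier is a constant: the device's logical block size must not enter this conversion at all.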
Yea, that decision fits with the "don't break userspace" philosophy. The kernel interface would not change out from under any version of iostat.
This is a regression from #2141.
v1.3.1, the most up-to-date released version, has a bug that inflates the bytes written by ~8x for NVMe drives (which in particular includes the default drives for our GCE roachprod machines). Fundamentally this is caused by the fact that these devices use a 4K sector size, whereas the kernel always reports in 512-byte sector units. This took us a while to figure out, and to avoid repeating this exercise periodically, downgrade node_exporter to 1.2.2, which pre-dates the refactor that introduced the regression. See: prometheus/node_exporter#2310 Release note: None
Host operating system: output of `uname -a`:
Linux msiz590 5.13.0-30-generic #33-Ubuntu SMP Fri Feb 4 17:03:31 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
node_exporter version: output of `node_exporter --version`:
node_exporter, version 1.3.1 (branch: HEAD, revision: a2321e7)
node_exporter command line flags: default
Are you running node_exporter in Docker? No.
What did you do that produced an error?
node_disk_read_bytes_total does not work correctly for disks with a 4KiB sector size, whether NVMe SSD or 4K-native SATA HDD. It overestimates the bytes read by 8x, since it converts the kernel's 512-byte sector counts to bytes using the 4KiB sector size.
What did you expect to see?
iostat, dstat, and /proc/diskstats all show the correct amount of data read and written.
What did you see instead?
node_disk_read_bytes_total reported 10808419123200 bytes, 8x the actual figure. This breaks the Disk R/W Data panel of the Grafana Node Exporter dashboard.
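The 8x inflation follows directly from the ratio of the two units: the collector multiplied the kernel's sector counts by the device's 4096-byte logical block size instead of the fixed 512-byte unit, and 4096 / 512 = 8. A small sketch of the arithmetic (the sector count is back-calculated from the byte figure above and is illustrative):

```go
package main

import "fmt"

// reportedBytes converts a /proc/diskstats sector count to bytes using
// the given multiplier. The kernel's unit is always 512 bytes; using the
// device's 4096-byte logical block size instead inflates the result 8x.
func reportedBytes(sectors, bytesPerSector uint64) uint64 {
	return sectors * bytesPerSector
}

func main() {
	const sectors = 2638774200 // illustrative 512-byte sector count

	wrong := reportedBytes(sectors, 4096) // buggy conversion
	right := reportedBytes(sectors, 512)  // actual bytes read

	fmt.Println(wrong)         // 10808419123200, the value in this report
	fmt.Println(right)         // 1351052390400
	fmt.Println(wrong / right) // inflation factor: 8
}
```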
This is very easy to reproduce. Take any modern NVMe drive and:

Install nvme-cli:
sudo apt install nvme-cli

In the identify-namespace output, find the LBA format with a metadata size of 0 and a data size of 4096 bytes:
sudo nvme id-ns /dev/nvme0n1 -H
LBA Format 2 : Metadata Size: 0 bytes - Data Size: 4096 bytes - Relative Performance: 0x2 Good (in use)

Format the drive with that LBA format to change the sector size (this also wipes all data; on most NVMe drives it performs a cryptographic erase):
sudo nvme format /dev/nvme0n1 -l 2