Fix 32-bit rollovers and add rounding in iostat metrics #30679

fearful-symmetry · 2022-03-04T00:25:46Z

What does this PR do?

Not assigning any reviewers yet, still a bit paranoid and testing this one.

This is a fix for: #30480

After wading through some conflicting kernel docs, I discovered that certain fields reported by /proc/diskstats are in fact unsigned 32-bit integers, which means they roll over at a relatively low 4,294,967,295 (about 4.29 billion). If we're not careful, the current - last subtraction in the iostat math wraps around to a huge unsigned value whenever one of these counters rolls over between samples. This adds a little wrapper that tries to "fix" a rolled-over 32-bit value based on a prior good value. This also adds some rounding for the float values, just to clean up the math.
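To make the rollover arithmetic concrete, here is a minimal sketch of the idea. It is not the code in this PR: `deltaWithRollover32` is a made-up name, it assumes the counter wrapped at most once between samples, and returning 0 for a `prev` that does not fit in 32 bits is an arbitrary choice for the sketch.

```go
package main

import (
	"fmt"
	"math"
)

// deltaWithRollover32 is a hypothetical helper (not the PR's implementation).
// It returns current - prev for a counter the kernel keeps as an unsigned
// 32-bit value, treating current < prev as exactly one wrap past MaxUint32.
func deltaWithRollover32(current, prev uint64) uint64 {
	if current >= prev {
		return current - prev
	}
	// Assumption for this sketch: a prev above MaxUint32 cannot come from a
	// 32-bit counter, so no rollover correction is attempted.
	if prev > math.MaxUint32 {
		return 0
	}
	// Steps from prev up to the wrap point (2^32), plus steps taken after it.
	return (math.MaxUint32 - prev) + 1 + current
}

func main() {
	// No rollover: plain subtraction.
	fmt.Println(deltaWithRollover32(1500, 1000)) // 500
	// prev was 96 steps short of 2^32, and the counter is now at 100.
	fmt.Println(deltaWithRollover32(100, 4294967200)) // 196
}
```

Without the correction, the second call would compute 100 - 4294967200 in uint64 arithmetic and report a delta close to 2^64.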

Why is it important?

This bug can result in sporadic bad data on systems with high IO load or long uptimes.

Checklist

My code follows the style guidelines of this project
I have commented my code, particularly in hard-to-understand areas
I have added tests that prove my fix is effective or that my feature works
I have added an entry in CHANGELOG.next.asciidoc or CHANGELOG-developer.next.asciidoc.

elasticmachine · 2022-03-04T00:25:48Z

Pinging @elastic/elastic-agent-data-plane (Team:Elastic-Agent-Data-Plane)

elasticmachine · 2022-03-04T01:50:41Z

💚 Build Succeeded

Build stats

Start Time: 2022-03-04T22:46:54.119+0000
Duration: 132 min 1 sec

Test stats 🧪

Test	Results
Failed	0
Passed	42426
Skipped	3714
Total	46140

💚 Flaky test report

Tests succeeded.

🤖 GitHub comments

To re-run your PR in the CI, just comment with:

/test : Re-trigger the build.
/package : Generate the packages and run the E2E tests.
/beats-tester : Run the installation tests with beats-tester.
run elasticsearch-ci/docs : Re-trigger the docs validation. (use unformatted text in the comment!)

libbeat/metric/system/diskio/diskstat_linux.go

belimawr

I read you're still testing this one, so sorry if I'm stating the obvious.

What about adding a test to CalcIOStatistics that simulates the rollover?
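One way such a simulation could look, sketched outside the real test file: a `uint32` counter is allowed to wrap naturally, each sample is widened to `uint64` as if parsed from /proc/diskstats, and the summed deltas are compared with the total of the increments. The names `deltaWithRollover32` and `sumDeltas` are hypothetical, and nothing here calls CalcIOStatistics, whose signature is not shown in this thread.

```go
package main

import (
	"fmt"
	"math"
)

// deltaWithRollover32 is a hypothetical stand-in for the rollover-aware
// subtraction; it assumes prev fits in 32 bits and at most one wrap.
func deltaWithRollover32(current, prev uint64) uint64 {
	if current >= prev {
		return current - prev
	}
	return (math.MaxUint32 - prev) + 1 + current
}

// sumDeltas drives a wrapping 32-bit counter through the given increments
// and returns the sum of the deltas computed between consecutive samples.
func sumDeltas(start uint32, increments []uint32) uint64 {
	counter := start
	prev := uint64(start)
	var total uint64
	for _, inc := range increments {
		counter += inc            // wraps modulo 2^32, like the kernel field
		sample := uint64(counter) // what a parser would hand to the math
		total += deltaWithRollover32(sample, prev)
		prev = sample
	}
	return total
}

func main() {
	// Start 50 below the maximum so the second increment forces a rollover.
	start := uint32(math.MaxUint32 - 50)
	fmt.Println(sumDeltas(start, []uint32{30, 40, 1000})) // 1070
}
```

In a real `_test.go` file the same inputs would sit in a table-driven test and go through the actual statistics function.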

cmacknz · 2022-03-04T18:53:21Z

libbeat/metric/system/diskio/diskstat_linux.go

+		return current - prev
+	}
+	// we're at a uint64 if we hit this
+	if prev > maxUint32 {


If the underlying value is 32 bits, this will never happen?

If it's 64 bits, wouldn't you want to do the same math with math.MaxUint64?

The only time you will need to do any math that depends on the actual integer width is when rollover occurs. Aren't you guaranteed that prev > maxUint32 in that case, and you can switch the max for the calculation to use maxUint64 instead?

fearful-symmetry

Yah, that felt like a paranoid edge case that I should in theory cover, but I wasn't sure if it made sense to actually bother. I mean, if we actually want to "fix" 64-bit rollover, that function should be on all fields, not just a few.

cmacknz

Fair enough, in that case let's put 32 in the function name so that it's obvious that we are only trying to fix this for 32-bit counters.

returnOrFix32BitRollover or something like that.

efd6 · 2022-03-04T21:54:39Z

The safest way to address this would be to change the signature to func(curr, prev uint32) uint64, bit widening within the function. This would also have the advantage of statically removing the possibility of this case.

cmacknz

++ That is much better than just changing the name of the function, or hoping this doesn't happen.

efd6

It does have consequences that concern me; if the kernel changes the type for this in the future, the call will still work but with silently corrupted results. What you want to cover that case is non-wrap-around integer arithmetic, which would require the full width to be known at call time. This would either be (with ugliness) passing both the 32-bit truncation and the 64-bit original, or with a type conversion helper that signals the unexpected high bits somehow; the calling functions return an error, so this is possible, but with four calls it starts to get unwieldy.

Just thought I should bring these up to avoid sending you down the wrong rabbit hole.
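The two ideas in this thread can be sketched together, under stated assumptions. `delta32` follows the suggested `func(curr, prev uint32) uint64` shape and relies on Go's unsigned subtraction wrapping modulo 2^32. `narrow32` is a hypothetical conversion helper that reports unexpected high bits as an error rather than truncating silently. Neither name comes from the PR.

```go
package main

import (
	"errors"
	"fmt"
	"math"
)

// errNot32Bit is a hypothetical sentinel for values that do not fit in 32 bits.
var errNot32Bit = errors.New("counter value exceeds 32 bits")

// narrow32 converts a parsed uint64 to uint32, refusing values with any of
// the high 32 bits set so that a kernel type change would surface as an error.
func narrow32(v uint64) (uint32, error) {
	if v > math.MaxUint32 {
		return 0, errNot32Bit
	}
	return uint32(v), nil
}

// delta32 subtracts in 32-bit arithmetic, where a single rollover cancels
// out, and widens the result afterwards.
func delta32(curr, prev uint32) uint64 {
	return uint64(curr - prev)
}

func main() {
	prev, _ := narrow32(4294967200)
	curr, _ := narrow32(100)
	fmt.Println(delta32(curr, prev)) // 196

	if _, err := narrow32(1 << 32); err != nil {
		fmt.Println("rejected:", err)
	}
}
```

With the `uint32` parameters the `prev > maxUint32` branch cannot exist at all; the cost is that every call site has to narrow its inputs and handle the error, which is the unwieldiness mentioned above.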

efd6 · 2022-03-04T21:02:29Z

libbeat/metric/system/diskio/diskstat_linux.go

+
+// See https://docs.kernel.org/admin-guide/iostats.html and https://github.com/torvalds/linux/blob/master/block/genhd.c diskstats_show()
+func returnOrFixRollover(current, prev uint64) uint64 {
+	var maxUint32 uint64 = math.MaxUint32 //4_294_967_295 Max value in uint32/unsigned int


math.MaxUint32 is an untyped constant, so this should not be necessary.

fearful-symmetry

Yah, did it as a separate variable more in the hopes of making the logic a little easier to follow.

efd6

I'm not sure that it does; the point of having these as untyped was exactly to allow this kind of use.
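A small illustration of the point about untyped constants, as a sketch with a made-up function name (`exceeds32Bits`): `math.MaxUint32` takes on the type of the operand it is compared with, so no typed intermediate variable or conversion is needed.

```go
package main

import (
	"fmt"
	"math"
)

// exceeds32Bits reports whether v cannot be held in an unsigned 32-bit
// integer. The untyped constant math.MaxUint32 becomes a uint64 here.
func exceeds32Bits(v uint64) bool {
	return v > math.MaxUint32
}

func main() {
	fmt.Println(exceeds32Bits(4_294_967_295)) // false
	fmt.Println(exceeds32Bits(4_294_967_296)) // true
}
```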

efd6 · 2022-03-04T21:56:26Z

libbeat/metric/system/diskio/diskstat_linux.go

-	result.AvgRequestSize = size
-	result.AvgQueueSize = queue
-	result.AvgAwaitTime = wait
+	result.AvgRequestSize = common.Round(size, common.DefaultDecimalPlacesCount)


Not necessarily a concern, but this article does a nice job of explaining the pitfalls of rounding floats.

fearful-symmetry

Yep. In this case, that common.Round idiom is everywhere in beats, and it was bugging me to not have it here.

efd6

Not for here, but it's probably worth looking at changing that; float rounding should really not happen until render time.
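A short sketch of the kind of pitfall being referred to, and of the render-time alternative. `roundTo` and `render` are hypothetical helpers written for this example; `roundTo` is not the beats common.Round implementation and may behave differently from it.

```go
package main

import (
	"fmt"
	"math"
)

// roundTo is a naive scale-round-unscale helper (hypothetical).
func roundTo(v float64, places int) float64 {
	scale := math.Pow(10, float64(places))
	return math.Round(v*scale) / scale
}

// render leaves the stored value untouched and rounds only when formatting.
func render(v float64) string {
	return fmt.Sprintf("%.2f", v)
}

func main() {
	fmt.Println(roundTo(3.14159, 2)) // 3.14
	// 1.005 has no exact binary representation and is stored slightly below
	// 1.005, so the scaled value lands under 100.5 and rounds down.
	fmt.Println(roundTo(1.005, 2)) // 1
	fmt.Println(render(3.14159))   // 3.14
}
```

Rounding the stored value bakes this representation error into every later calculation, whereas formatting at render time keeps the full-precision value available.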

thekofimensah · 2022-03-10T18:11:37Z

Which version will this be available in? 7.17.0?

thekofimensah · 2022-03-22T18:51:50Z

@fearful-symmetry I looked around and couldn't find this change in the source code of any of the new releases: https://github.com/elastic/beats/releases. What version can I expect to find this change in?

fearful-symmetry · 2022-03-22T22:25:14Z

@thekofimensah it should be available in 7.17.2, which will be released at the end of the month.

fearful-symmetry added 10 commits January 5, 2022 14:42

PoC for optional json encoding

3550969

Merge remote-tracking branch 'upstream/master'

8489770

Merge remote-tracking branch 'upstream/main'

098d69e

Merge remote-tracking branch 'upstream/main'

2389c3f

Merge remote-tracking branch 'upstream/main'

505eb57

Merge remote-tracking branch 'upstream/main'

aaf7da3

Revert "PoC for optional json encoding"

107cafc

This reverts commit 3550969.

Merge remote-tracking branch 'upstream/main'

395dcd5

Merge remote-tracking branch 'upstream/main'

ae6a850

try to fix rolled-over values in diskio, add rounding

6b880cf

fearful-symmetry added the bug, Team:Elastic-Agent-Data-Plane (Label for the Agent Data Plane team), and backport-7.17 (Automated backport to the 7.17 branch with mergify) labels Mar 4, 2022

fearful-symmetry self-assigned this Mar 4, 2022

botelastic bot added and removed the needs_team label (Indicates that the issue/PR needs a Team:* label) Mar 4, 2022

cmacknz reviewed Mar 4, 2022

libbeat/metric/system/diskio/diskstat_linux.go (outdated, resolved)

belimawr reviewed Mar 4, 2022

libbeat/metric/system/diskio/diskstat_linux.go (outdated, resolved)

libbeat/metric/system/diskio/diskstat_linux_test.go (outdated, resolved)

libbeat/metric/system/diskio/diskstat_linux.go (outdated, resolved)

use math package, add docs

b59b43e

fearful-symmetry requested review from a team, cmacknz and belimawr March 4, 2022 18:40

cmacknz reviewed Mar 4, 2022

libbeat/metric/system/diskio/diskstat_linux.go (outdated, resolved)

cmacknz reviewed Mar 4, 2022

libbeat/metric/system/diskio/diskstat_linux.go (resolved)

name change

30f0ed2

cmacknz approved these changes Mar 4, 2022

efd6 reviewed Mar 4, 2022

fearful-symmetry added 2 commits March 4, 2022 14:45

Merge remote-tracking branch 'upstream/main' into fix-diskstat-32-rol…

8a915d8

…lover

change name, add changelog

217381a

fearful-symmetry merged commit ff32f15 into elastic:main Mar 7, 2022

mergify bot mentioned this pull request Mar 7, 2022

[7.17](backport #30679) Fix 32-bit rollovers and add rounding in iostat metrics #30718

Merged

jlind23 mentioned this pull request Mar 8, 2022

Bug in iostat-await calculation in Metricbeat #30480

Closed
