-
Notifications
You must be signed in to change notification settings - Fork 4.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix 32-bit rollovers and add rounding in iostat metrics #30679
Fix 32-bit rollovers and add rounding in iostat metrics #30679
Conversation
This reverts commit 3550969.
Pinging @elastic/elastic-agent-data-plane (Team:Elastic-Agent-Data-Plane) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I read you're still testing this one, so sorry if I'm stating the obvious.
What about adding a test to CalcIOStatistics
that simulates the rollover?
return current - prev | ||
} | ||
// we're at a uint64 if we hit this | ||
if prev > maxUint32 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the underlying value is 32 bits, this will never happen?
If it's 64 bits, wouldn't you want to do the same math with math.MaxUint64
?
The only time you will need to do any math that depends on the actual integer width is when rollover occurs. Aren't you guaranteed that prev > maxUint32 in that case, and you can switch the max for the calculation to use maxUint64 instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yah, that felt like a paranoid edge case that I should in theory cover, but I wasn't sure if it made sense to actually bother. I mean, if we actually want to "fix" 64-bit rollover, that function should be on all fields, not just a few.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fair enough, in that case let's put 32 in the function name so that it's obvious that we are only trying to fix this for 32 bit counters.
returnOrFix32BitRollover
or something like that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The safest way to address this would be to change the signature to func(curr, prev uint32) uint64
, bit widening within the function. This would also have the advantage of statically removing the possibility of this case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
++ That is much better than just changing the name of the function, or hoping this doesn't happen.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It does have consequences that concern me; if the kernel changes type for this in the future, the call will still work but with silently corrupted results. What you want to cover that case is non-wrap-around integer arithmetic which would require the full width to be known at call time. This would either be (with ugliness) passing both the 32-bit truncation and the 64-bit original, or with a type conversion helper that signals the unexpected high bits somehow — the calling functions return an error, so this is possible, but with four calls it starts to get unwieldy.
Just thought I should bring these up to avoid sending you down the wrong rabbit hole.
|
||
// See https://docs.kernel.org/admin-guide/iostats.html and https://github.com/torvalds/linux/blob/master/block/genhd.c diskstats_show() | ||
func returnOrFixRollover(current, prev uint64) uint64 { | ||
var maxUint32 uint64 = math.MaxUint32 //4_294_967_295 Max value in uint32/unsigned int |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
math.MaxUint32
is an untyped constant, so this should not be necessary.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yah, did it as a separate variable more in the hopes of making the logic a little easier to follow.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure that it does; the point of having these as untyped was exactly to allow this kind of use.
return current - prev | ||
} | ||
// we're at a uint64 if we hit this | ||
if prev > maxUint32 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The safest way to address this would be to change the signature to func(curr, prev uint32) uint64
, bit widening within the function. This would also have the advantage of statically removing the possibility of this case.
result.AvgRequestSize = size | ||
result.AvgQueueSize = queue | ||
result.AvgAwaitTime = wait | ||
result.AvgRequestSize = common.Round(size, common.DefaultDecimalPlacesCount) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not necessarily a concern, but this article does a nice job of explaining the pitfalls of rounding floats.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep. In this case, that common.Round
idiom is everywhere in beats, and it was bugging me to not have it here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not for here, but it's probably worth looking at changing that; float rounding should really not happen until render time.
) * PoC for optional json encoding * Revert "PoC for optional json encoding" This reverts commit 3550969. * try to fix rolled-over values in diskio, add rounding * use math package, add docs * name change * change name, add changelog (cherry picked from commit ff32f15) Co-authored-by: Alex K <8418476+fearful-symmetry@users.noreply.github.com>
Which version will this be available to? 7.17.0? |
@fearful-symmetry I looked around and I couldn't find this change in any of the new releases' source codes: https://github.com/elastic/beats/releases what version can I expect to find this change? |
@thekofimensah it should be available in 7.17.2, which will be released at the end of the month. |
What does this PR do?
Not assigning any reviewers yet, still a bit paranoid and testing this one.
This is a fix for: #30480
After wading through some conflicting kernel docs, I discovered that certain fields reported by
/proc/diskstats
are in fact unsigned 32-bit integers, which means the point at which they roll over is a relatively low 4.2 billion. If we're not careful, this will result in us overflowing a bunch of unsigned values when we docurrent - last
on theiostat
math. This adds a little wrapper that tries to "fix" a rolled-over 32-bit value based on a prior good value. This also adds some rounding for the float values, just to clean up the math.Why is it important?
This bug can result in sporadic bad data on systems with high IO load or long uptimes.
Checklist
CHANGELOG.next.asciidoc
orCHANGELOG-developer.next.asciidoc
.