Fix 32-bit rollovers and add rounding in iostat metrics #30679
Changes from 12 commits
3550969
8489770
098d69e
2389c3f
505eb57
aaf7da3
107cafc
395dcd5
ae6a850
6b880cf
b59b43e
30f0ed2
8a915d8
217381a
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -21,9 +21,12 @@ | |
package diskio | ||
|
||
import ( | ||
"math" | ||
|
||
"github.com/pkg/errors" | ||
"github.com/shirou/gopsutil/v3/disk" | ||
|
||
"github.com/elastic/beats/v7/libbeat/common" | ||
"github.com/elastic/beats/v7/libbeat/metric/system/numcpu" | ||
) | ||
|
||
|
@@ -52,6 +55,28 @@ func (stat *IOStat) OpenSampling() error { | |
return stat.curCPU.Get() | ||
} | ||
|
||
// A few of the diskio counters are 32-bit on the kernel side, which means they can roll over fairly easily. | ||
// Here we try to reconstruct the delta by computing the distance from prev to the uint32 max, then adding current. | ||
// Strictly speaking this is a little unsafe: we have no way of knowing whether the word size changes in a future kernel, in which case the rollover would happen at UINT64_MAX instead. | ||
|
||
// See https://docs.kernel.org/admin-guide/iostats.html and https://github.com/torvalds/linux/blob/master/block/genhd.c diskstats_show() | ||
func returnOrFixRollover(current, prev uint64) uint64 { | ||
var maxUint32 uint64 = math.MaxUint32 //4_294_967_295 Max value in uint32/unsigned int | ||
|
||
if current >= prev { | ||
return current - prev | ||
} | ||
// we're at a uint64 if we hit this | ||
if prev > maxUint32 { | ||
If the underlying value is 32 bits, this will never happen? If it's 64 bits, wouldn't you want to do the same math with [...]? The only time you will need to do any math that depends on the actual integer width is when rollover occurs. Aren't you guaranteed that prev > maxUint32 in that case, and you can switch the max for the calculation to use maxUint64 instead?
Yah, that felt like a paranoid edge case that I should in theory cover, but I wasn't sure if it made sense to actually bother. I mean, if we actually want to "fix" 64-bit rollover, that function should be on all fields, not just a few.
Fair enough, in that case let's put 32 in the function name so that it's obvious that we are only trying to fix this for 32-bit counters.
The safest way to address this would be to change the signature to [...]
++ That is much better than just changing the name of the function, or hoping this doesn't happen.
It does have consequences that concern me; if the kernel changes type for this in the future, the call will still work but with silently corrupted results. What you want to cover that case is non-wrap-around integer arithmetic, which would require the full width to be known at call time. This would either be (with ugliness) passing both the 32-bit truncation and the 64-bit original, or a type conversion helper that signals the unexpected high bits somehow. The calling functions return an error, so this is possible, but with four calls it starts to get unwieldy. Just thought I should bring these up to avoid sending you down the wrong rabbit hole. |
||
return 0 | ||
} | ||
|
||
delta := maxUint32 - prev | ||
|
||
return delta + current | ||
|
||
} | ||
|
||
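The review thread on this function discusses two ideas: doing the subtraction at a known 32-bit width, and signalling unexpected high bits through an error. The following is a minimal sketch of how those ideas could fit together, not the code that was merged. The names `delta32`, `checkedDelta32`, and `errNot32Bit` are hypothetical. It relies on the fact that unsigned subtraction in Go is modular, so a `uint32` difference already accounts for one wrap. Note that the modular result counts the wrap step itself, so it is one larger than `maxUint32 - prev + current`.

```go
package main

import (
	"errors"
	"fmt"
	"math"
)

// errNot32Bit is a hypothetical error returned when a counter that is
// assumed to be 32 bits wide carries bits above bit 31.
var errNot32Bit = errors.New("counter value does not fit in 32 bits")

// delta32 returns the number of increments between prev and current for a
// counter that wraps at 2^32. Unsigned subtraction in Go wraps modulo 2^32
// for uint32, so a single rollover needs no special branch.
func delta32(current, prev uint32) uint32 {
	return current - prev
}

// checkedDelta32 is a hypothetical wrapper for callers that hold uint64
// values. It rejects inputs with high bits set instead of silently
// truncating them, then defers to delta32.
func checkedDelta32(current, prev uint64) (uint64, error) {
	if current > math.MaxUint32 || prev > math.MaxUint32 {
		return 0, errNot32Bit
	}
	return uint64(delta32(uint32(current), uint32(prev))), nil
}

func main() {
	// prev is 5 below the uint32 maximum; current has wrapped around to 5.
	d, err := checkedDelta32(5, math.MaxUint32-5)
	fmt.Println(d, err)

	// A value wider than 32 bits is reported rather than truncated.
	_, err = checkedDelta32(5, math.MaxUint32+1)
	fmt.Println(err)
}
```

The trade-off raised in the thread still applies: every call site has to handle the extra error, which gets unwieldy with four counters.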
// CalcIOStatistics calculates IO statistics. | ||
func (stat *IOStat) CalcIOStatistics(counter disk.IOCountersStat) (IOMetric, error) { | ||
var last disk.IOCountersStat | ||
|
@@ -72,13 +97,14 @@ func (stat *IOStat) CalcIOStatistics(counter disk.IOCountersStat) (IOMetric, err | |
rdIOs := counter.ReadCount - last.ReadCount | ||
rdMerges := counter.MergedReadCount - last.MergedReadCount | ||
rdBytes := counter.ReadBytes - last.ReadBytes | ||
rdTicks := counter.ReadTime - last.ReadTime | ||
rdTicks := returnOrFixRollover(counter.ReadTime, last.ReadTime) | ||
wrIOs := counter.WriteCount - last.WriteCount | ||
wrMerges := counter.MergedWriteCount - last.MergedWriteCount | ||
wrBytes := counter.WriteBytes - last.WriteBytes | ||
wrTicks := counter.WriteTime - last.WriteTime | ||
ticks := counter.IoTime - last.IoTime | ||
aveq := counter.WeightedIO - last.WeightedIO | ||
wrTicks := returnOrFixRollover(counter.WriteTime, last.WriteTime) | ||
ticks := returnOrFixRollover(counter.IoTime, last.IoTime) | ||
aveq := returnOrFixRollover(counter.WeightedIO, last.WeightedIO) | ||
|
||
nIOs := rdIOs + wrIOs | ||
nTicks := rdTicks + wrTicks | ||
nBytes := rdBytes + wrBytes | ||
|
@@ -94,7 +120,7 @@ func (stat *IOStat) CalcIOStatistics(counter disk.IOCountersStat) (IOMetric, err | |
|
||
queue := float64(aveq) / deltams | ||
perSec := func(x uint64) float64 { | ||
return 1000.0 * float64(x) / deltams | ||
return common.Round(1000.0*float64(x)/deltams, common.DefaultDecimalPlacesCount) | ||
} | ||
|
||
result := IOMetric{} | ||
|
@@ -104,17 +130,17 @@ func (stat *IOStat) CalcIOStatistics(counter disk.IOCountersStat) (IOMetric, err | |
result.WriteRequestCountPerSec = perSec(wrIOs) | ||
result.ReadBytesPerSec = perSec(rdBytes) | ||
result.WriteBytesPerSec = perSec(wrBytes) | ||
result.AvgRequestSize = size | ||
result.AvgQueueSize = queue | ||
result.AvgAwaitTime = wait | ||
result.AvgRequestSize = common.Round(size, common.DefaultDecimalPlacesCount) | ||
Not necessarily a concern, but this article does a nice job of explaining the pitfalls of rounding floats.
Yep. In this case, that [...]
Not for here, but it's probably worth looking at changing that; float rounding should really not happen until render time. |
||
result.AvgQueueSize = common.Round(queue, common.DefaultDecimalPlacesCount) | ||
result.AvgAwaitTime = common.Round(wait, common.DefaultDecimalPlacesCount) | ||
if rdIOs > 0 { | ||
result.AvgReadAwaitTime = float64(rdTicks) / float64(rdIOs) | ||
result.AvgReadAwaitTime = common.Round(float64(rdTicks)/float64(rdIOs), common.DefaultDecimalPlacesCount) | ||
} | ||
if wrIOs > 0 { | ||
result.AvgWriteAwaitTime = float64(wrTicks) / float64(wrIOs) | ||
result.AvgWriteAwaitTime = common.Round(float64(wrTicks)/float64(wrIOs), common.DefaultDecimalPlacesCount) | ||
} | ||
result.AvgServiceTime = svct | ||
result.BusyPct = 100.0 * float64(ticks) / deltams | ||
result.AvgServiceTime = common.Round(svct, common.DefaultDecimalPlacesCount) | ||
cmacknz marked this conversation as resolved.
|
||
result.BusyPct = common.Round(100.0*float64(ticks)/deltams, common.DefaultDecimalPlacesCount) | ||
if result.BusyPct > 100.0 { | ||
result.BusyPct = 100.0 | ||
} | ||
|
math.MaxUint32 is an untyped constant, so this should not be necessary.
Yah, did it as a separate variable more in the hopes of making the logic a little easier to follow.
I'm not sure that it does; the point of having these as untyped was exactly to allow this kind of use.
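To illustrate the point about untyped constants: in Go, `math.MaxUint32` takes on the type of the expression it appears in, so it can be mixed with `uint64` operands without a typed intermediate variable. The function name `headroom32` below is hypothetical and exists only for this sketch.

```go
package main

import (
	"fmt"
	"math"
)

// headroom32 returns how far prev is from the uint32 maximum.
// math.MaxUint32 is an untyped constant, so it is converted to uint64
// implicitly by the subtraction; no typed variable is required.
func headroom32(prev uint64) uint64 {
	return math.MaxUint32 - prev
}

func main() {
	// Equivalent to the typed-variable form used in the diff.
	var maxUint32 uint64 = math.MaxUint32
	prev := uint64(4294967290)

	fmt.Println(headroom32(prev) == maxUint32-prev)
}
```

Either form compiles to the same arithmetic; the choice is purely about readability.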