Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix 32-bit rollovers and add rounding in iostat metrics #30679

Merged
Merged
Show file tree
Hide file tree
Changes from 12 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
50 changes: 38 additions & 12 deletions libbeat/metric/system/diskio/diskstat_linux.go
Original file line number Diff line number Diff line change
Expand Up @@ -21,9 +21,12 @@
package diskio

import (
"math"

"github.com/pkg/errors"
"github.com/shirou/gopsutil/v3/disk"

"github.com/elastic/beats/v7/libbeat/common"
"github.com/elastic/beats/v7/libbeat/metric/system/numcpu"
)

Expand Down Expand Up @@ -52,6 +55,28 @@ func (stat *IOStat) OpenSampling() error {
return stat.curCPU.Get()
}

// a few of the diskio counters are actually 32-bit on the kernel side, which means they can roll over fairly easily.
// Here we try to reconstruct the values by calculating the pre-rollover delta from unt32 max, then adding.
// If you want to get technical, this could be a tad unsafe, as we don't actually have any way of knowing if the word size changes in a future kernel, and we've rolled over at UINT64_MAX

// See https://docs.kernel.org/admin-guide/iostats.html and https://github.com/torvalds/linux/blob/master/block/genhd.c diskstats_show()
func returnOrFixRollover(current, prev uint64) uint64 {
var maxUint32 uint64 = math.MaxUint32 //4_294_967_295 Max value in uint32/unsigned int
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

math.MaxUint32 is an untyped constant, so this should not be necessary.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yah, did it as a separate variable more in the hopes of making the logic a little easier to follow.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure that it does; the point of having these as untyped was exactly to allow this kind of use.


if current >= prev {
return current - prev
}
// we're at a uint64 if we hit this
if prev > maxUint32 {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the underlying value is 32 bits, this will never happen?

If it's 64 bits, wouldn't you want to do the same math with math.MaxUint64?

The only time you will need to do any math that depends on the actual integer width is when rollover occurs. Aren't you guaranteed that prev > maxUint32 in that case, and you can switch the max for the calculation to use maxUint64 instead?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yah, that felt like a paranoid edge case that I should in theory cover, but I wasn't sure if it made sense to actually bother. I mean, if we actually want to "fix" 64-bit rollover, that function should be on all fields, not just a few.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair enough, in that case let's put 32 in the function name so that it's obvious that we are only trying to fix this for 32 bit counters.

returnOrFix32BitRollover or something like that.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The safest way to address this would be to change the signature to func(curr, prev uint32) uint64, bit widening within the function. This would also have the advantage of statically removing the possibility of this case.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

++ That is much better than just changing the name of the function, or hoping this doesn't happen.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It does have consequences that concern me; if the kernel changes type for this in the future, the call will still work but with silently corrupted results. What you want to cover that case is non-wrap-around integer arithmetic which would require the full width to be known at call time. This would either be (with ugliness) passing both the 32-bit truncation and the 64-bit original, or with a type conversion helper that signals the unexpected high bits somehow — the calling functions return an error, so this is possible, but with four calls it starts to get unwieldy.

Just thought I should bring these up to avoid sending you down the wrong rabbit hole.

return 0
}

delta := maxUint32 - prev

return delta + current

}

// CalcIOStatistics calculates IO statistics.
func (stat *IOStat) CalcIOStatistics(counter disk.IOCountersStat) (IOMetric, error) {
var last disk.IOCountersStat
Expand All @@ -72,13 +97,14 @@ func (stat *IOStat) CalcIOStatistics(counter disk.IOCountersStat) (IOMetric, err
rdIOs := counter.ReadCount - last.ReadCount
rdMerges := counter.MergedReadCount - last.MergedReadCount
rdBytes := counter.ReadBytes - last.ReadBytes
rdTicks := counter.ReadTime - last.ReadTime
rdTicks := returnOrFixRollover(counter.ReadTime, last.ReadTime)
wrIOs := counter.WriteCount - last.WriteCount
wrMerges := counter.MergedWriteCount - last.MergedWriteCount
wrBytes := counter.WriteBytes - last.WriteBytes
wrTicks := counter.WriteTime - last.WriteTime
ticks := counter.IoTime - last.IoTime
aveq := counter.WeightedIO - last.WeightedIO
wrTicks := returnOrFixRollover(counter.WriteTime, last.WriteTime)
ticks := returnOrFixRollover(counter.IoTime, last.IoTime)
aveq := returnOrFixRollover(counter.WeightedIO, last.WeightedIO)

nIOs := rdIOs + wrIOs
nTicks := rdTicks + wrTicks
nBytes := rdBytes + wrBytes
Expand All @@ -94,7 +120,7 @@ func (stat *IOStat) CalcIOStatistics(counter disk.IOCountersStat) (IOMetric, err

queue := float64(aveq) / deltams
perSec := func(x uint64) float64 {
return 1000.0 * float64(x) / deltams
return common.Round(1000.0*float64(x)/deltams, common.DefaultDecimalPlacesCount)
}

result := IOMetric{}
Expand All @@ -104,17 +130,17 @@ func (stat *IOStat) CalcIOStatistics(counter disk.IOCountersStat) (IOMetric, err
result.WriteRequestCountPerSec = perSec(wrIOs)
result.ReadBytesPerSec = perSec(rdBytes)
result.WriteBytesPerSec = perSec(wrBytes)
result.AvgRequestSize = size
result.AvgQueueSize = queue
result.AvgAwaitTime = wait
result.AvgRequestSize = common.Round(size, common.DefaultDecimalPlacesCount)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not necessarily a concern, but this article does a nice job of explaining the pitfalls of rounding floats.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep. In this case, that common.Round idiom is everywhere in beats, and it was bugging me to not have it here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not for here, but it's probably worth looking at changing that; float rounding should really not happen until render time.

result.AvgQueueSize = common.Round(queue, common.DefaultDecimalPlacesCount)
result.AvgAwaitTime = common.Round(wait, common.DefaultDecimalPlacesCount)
if rdIOs > 0 {
result.AvgReadAwaitTime = float64(rdTicks) / float64(rdIOs)
result.AvgReadAwaitTime = common.Round(float64(rdTicks)/float64(rdIOs), common.DefaultDecimalPlacesCount)
}
if wrIOs > 0 {
result.AvgWriteAwaitTime = float64(wrTicks) / float64(wrIOs)
result.AvgWriteAwaitTime = common.Round(float64(wrTicks)/float64(wrIOs), common.DefaultDecimalPlacesCount)
}
result.AvgServiceTime = svct
result.BusyPct = 100.0 * float64(ticks) / deltams
result.AvgServiceTime = common.Round(svct, common.DefaultDecimalPlacesCount)
cmacknz marked this conversation as resolved.
Show resolved Hide resolved
result.BusyPct = common.Round(100.0*float64(ticks)/deltams, common.DefaultDecimalPlacesCount)
if result.BusyPct > 100.0 {
result.BusyPct = 100.0
}
Expand Down
21 changes: 20 additions & 1 deletion libbeat/metric/system/diskio/diskstat_linux_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -21,11 +21,13 @@
package diskio

import (
"math"
"testing"

"github.com/shirou/gopsutil/v3/disk"
"github.com/stretchr/testify/assert"

"github.com/elastic/beats/v7/libbeat/common"
sigar "github.com/elastic/gosigar"
)

Expand All @@ -34,6 +36,23 @@ func Test_GetCLKTCK(t *testing.T) {
assert.Equal(t, uint32(100), GetCLKTCK())
}

func Test32BitRollover(t *testing.T) {
var maxUint32 uint64 = math.MaxUint32 // 4_294_967_295

var prev = maxUint32 - 100_000

// A rolled-over value
var current32 uint64 = 1000
// Theoretical un-rolled over value
var current64 = (maxUint32 + current32)

var correct = current64 - prev
assert.Equal(t, returnOrFixRollover(current32, prev), returnOrFixRollover(current64, prev))
assert.Equal(t, correct, returnOrFixRollover(current32, prev))

assert.Equal(t, uint64(0), returnOrFixRollover(current32, current32))
}

func TestDiskIOStat_CalIOStatistics(t *testing.T) {
counter := disk.IOCountersStat{
ReadCount: 13,
Expand All @@ -58,7 +77,7 @@ func TestDiskIOStat_CalIOStatistics(t *testing.T) {
}

expected := IOMetric{
AvgAwaitTime: 24.0 / 22.0,
AvgAwaitTime: common.Round(24.0/22.0, common.DefaultDecimalPlacesCount),
AvgReadAwaitTime: 1.2,
AvgWriteAwaitTime: 1,
}
Expand Down
14 changes: 7 additions & 7 deletions metricbeat/module/linux/iostat/_meta/data.json
Original file line number Diff line number Diff line change
Expand Up @@ -8,10 +8,10 @@
"linux": {
"iostat": {
"await": 0,
"busy": 0,
"name": "sr0",
"busy": 0.1503,
"name": "sda",
"queue": {
"avg_size": 0
"avg_size": 0.0005
},
"read": {
"await": 0,
Expand All @@ -24,17 +24,17 @@
}
},
"request": {
"avg_size": 0
"avg_size": 2867.2
},
"service_time": 0,
"service_time": 0.3,
"write": {
"await": 0,
"per_sec": {
"bytes": 0
"bytes": 14365.929
},
"request": {
"merges_per_sec": 0,
"per_sec": 0
"per_sec": 5.0104
}
}
}
Expand Down