This repository was archived by the owner on Aug 23, 2023. It is now read-only.

Add cache overhead accounting #1090

Merged 5 commits on Oct 19, 2018

Conversation

robert-milan
Contributor

Prior to this change we were only counting the number of bytes
that the actual data occupied. However, there is a lot of overhead
in our cache system that also needs to be tracked. We were seeing
large differences between the total cache size used and live heap
allocations.

These changes attempt to track most of the accounting overhead.
The numbers used to track overhead are estimates based on
data type sizes.

  • Added stats for Flat Accounting, the LRU, and Chunks
  • Added benchmark for LRU allocations

Resolves: #1086
See also: #963
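
As a rough illustration of what "estimates based on data type sizes" can mean, the sketch below sums the key and value sizes of one `map[uint32]uint64` entry, which is how the `k: 4 bytes + v: 8 bytes` comment further down arrives at 12. The function name `entrySize` is hypothetical and not part of the PR, and the figure ignores the map's own bucket overhead, so it should be read as a lower bound rather than a measurement.

```go
package main

import (
	"fmt"
	"unsafe"
)

// entrySize estimates the payload of one map[uint32]uint64 entry by adding
// the size of the key type to the size of the value type. It deliberately
// ignores bucket and hashing overhead inside the Go map implementation.
// (Hypothetical helper, not taken from the PR.)
func entrySize() uintptr {
	var k uint32
	var v uint64
	return unsafe.Sizeof(k) + unsafe.Sizeof(v)
}

func main() {
	fmt.Println(entrySize()) // prints 12 (4-byte key + 8-byte value)
}
```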

@Dieterbe changed the title from "WIP: Add cache overhead accounting" to "Add cache overhead accounting" on Oct 12, 2018
@robert-milan requested a review from Dieterbe on October 12, 2018 14:02
@@ -231,6 +254,11 @@ func (a *FlatAccnt) eventLoop() {
}
}

// // totalUsed returns the sum of cacheSizeUsed, cacheOverheadChunk, cacheOverheadFlat, and cacheOverheadLru
// func (a *FlatAccnt) totalUsed() uint64 {
Contributor

i suspect that's not supposed to get merged, right?

Contributor Author

Yeah, I forgot that was in there. I had some plans for it, but then ended up not using it. I will take care of that.

lenChunks := len(met.chunks)
totalFlat := uint64((lenChunks * flatChunkSize) + flatAccntMetSize)
totalChunk := uint64((lenChunks * ccacheMetChunkSize) + ccacheCacheMetSize)
totalLru := uint64((lenChunks * lruItemSize))
Contributor

@replay, Oct 17, 2018

double brackets are not necessary here
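
For reference, the three per-metric totals from the excerpt can be written with a single pair of parentheses each. This is only a sketch: `lruItemSize` and `flatChunkSize` use the values shown in the PR, while the other three constants are placeholder values chosen for illustration because the excerpt does not show them. The `overhead` struct and `metricOverhead` function are hypothetical names.

```go
package main

import "fmt"

const (
	lruItemSize   = 76 // value shown in the PR
	flatChunkSize = 12 // value shown in the PR: 4-byte key + 8-byte value

	// Placeholder values for illustration only; the real values are not
	// visible in the reviewed excerpt.
	flatAccntMetSize   = 100
	ccacheMetChunkSize = 100
	ccacheCacheMetSize = 100
)

// overhead groups the three estimated overhead totals for one metric.
type overhead struct {
	flat, chunk, lru uint64
}

// metricOverhead estimates the accounting overhead of a metric that holds
// lenChunks chunks, using one conversion and no redundant parentheses.
func metricOverhead(lenChunks int) overhead {
	return overhead{
		flat:  uint64(lenChunks*flatChunkSize + flatAccntMetSize),
		chunk: uint64(lenChunks*ccacheMetChunkSize + ccacheCacheMetSize),
		lru:   uint64(lenChunks * lruItemSize),
	}
}

func main() {
	o := metricOverhead(3)
	fmt.Println(o.flat, o.chunk, o.lru) // prints 136 400 228
}
```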

@@ -251,12 +284,16 @@ func (a *FlatAccnt) delMet(metric schema.AMKey) {
}

cacheSizeUsed.DecUint64(met.total)
cacheOverheadFlat.DecUint64(totalFlat)
cacheOverheadLru.DecUint64(totalLru)
cacheOverheadChunk.DecUint64(totalChunk)
Contributor

this is not important at all, but it seems kind of mixed up to me that this function first deletes all the chunks of the metric, then updates the related stats about those chunks, and then deletes the metric from the metrics map. i think it should either do all the deletes first and then update the stats, or the other way around.

Contributor Author

Looking at it now, it does look a little weird. I will change it to update the stats and then delete everything.
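
A minimal sketch of the agreed ordering (update every stat first, then perform the deletes) might look like the following. It is not the code from the PR: the `gauge`, `metAccnt`, and `accnt` types are simplified, hypothetical stand-ins for the stats gauges and `FlatAccnt`, and the real function also has to remove the metric's chunks from the LRU.

```go
package main

import "fmt"

const lruItemSize = 76 // value shown in the PR

// gauge is a hypothetical stand-in for the stats gauges used in the PR.
type gauge struct{ v uint64 }

func (g *gauge) AddUint64(n uint64) { g.v += n }
func (g *gauge) DecUint64(n uint64) { g.v -= n }

// metAccnt loosely mirrors a per-metric record: total data bytes plus the
// size of each chunk keyed by timestamp.
type metAccnt struct {
	total  uint64
	chunks map[uint32]uint64
}

// accnt is a simplified, hypothetical stand-in for FlatAccnt.
type accnt struct {
	metrics     map[string]*metAccnt
	sizeUsed    gauge
	overheadLru gauge
}

// delMet updates all stats first and only then removes the metric.
func (a *accnt) delMet(key string) {
	met, ok := a.metrics[key]
	if !ok {
		return
	}

	// Step 1: update every stat while the metric record is still intact.
	a.sizeUsed.DecUint64(met.total)
	a.overheadLru.DecUint64(uint64(len(met.chunks) * lruItemSize))

	// Step 2: perform the deletes.
	delete(a.metrics, key)
}

func main() {
	a := &accnt{metrics: map[string]*metAccnt{
		"m1": {total: 300, chunks: map[uint32]uint64{10: 100, 20: 200}},
	}}
	a.sizeUsed.AddUint64(300)
	a.overheadLru.AddUint64(2 * lruItemSize)

	a.delMet("m1")
	fmt.Println(a.sizeUsed.v, a.overheadLru.v, len(a.metrics)) // prints 0 0 0
}
```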

@@ -33,6 +33,9 @@ var (

cacheSizeMax = stats.NewGauge64("cache.size.max")
cacheSizeUsed = stats.NewGauge64("cache.size.used")
cacheOverheadChunk = stats.NewGauge64("cache.overhead.chunk")
Contributor

it would be helpful to have a comment describing what each of those new metrics measure.

lruItemSize = 76

// k: 4 bytes + v: 8 bytes (map[uint32]uint64)
flatChunkSize = 12
Contributor

isn't this naming pattern inconsistent? I think if the name of the variable below is flatAccntMetSize, then the name of flatChunkSize should actually be flatAccntChunkSize, then their prefix would match

Contributor Author

It would be flatAccntMetChunkSize which is long. All of those names are actually pretty long, open to suggestions for abbreviations.
What about something like famChunkSize, famSize, ccmSize, and ccmChunkSize?
I think lruItemSize is already short enough.

Contributor

yeah, your suggestions look good, and they're consistent

totalChunk += ccacheMetChunkSize
// this func is called from the event loop so lru will be touched with new EvictTarget
totalLru += lruItemSize

Contributor

unnecessary empty line

reorder stats updates
add comments to stats variables
Contributor

@replay left a comment

LGTM 👍
Better let @Dieterbe take a look before merging, as he was involved in the discussions this was based on

@Dieterbe
Contributor

you should run

go get -u github.com/Dieterbe/metrics2docs
metrics2docs . > docs/metrics.md

see https://github.com/grafana/metrictank/blob/master/docs/CONTRIBUTING.md

@Dieterbe
Contributor

also, add the new metrics to the dashboard

@@ -89,3 +91,28 @@ func TestLRUDelete(t *testing.T) {
t.Fatalf("Expected lru to contain %d items, but have %d / %d", expectedSize, len(lru.items), lru.list.Len())
}
}

func BenchmarkLRUGrowth(b *testing.B) {
Contributor

a comment describing the goal of this benchmark would be good
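
To illustrate the kind of doc comment being asked for, here is a sketch of an allocation benchmark with its goal stated up front. It does not reproduce the PR's `BenchmarkLRUGrowth`: the `lru` type below is a simplified stand-in (a `container/list` plus a map, matching the `lru.items` and `lru.list` fields seen in the test excerpt), and `newLRU`, `touch`, and `benchmarkLRUGrowth` are hypothetical names. It uses `testing.Benchmark` so it can run outside `go test`.

```go
package main

import (
	"container/list"
	"fmt"
	"testing"
)

// lru is a simplified stand-in for the cache's LRU: a list that keeps
// recency order plus a map for O(1) lookup of list elements.
type lru struct {
	list  *list.List
	items map[int]*list.Element
}

func newLRU() *lru {
	return &lru{list: list.New(), items: make(map[int]*list.Element)}
}

// touch moves an existing key to the front or inserts a new one.
func (l *lru) touch(key int) {
	if e, ok := l.items[key]; ok {
		l.list.MoveToFront(e)
		return
	}
	l.items[key] = l.list.PushFront(key)
}

// benchmarkLRUGrowth measures how many bytes and allocations each newly
// inserted LRU item costs, so that a per-item overhead constant can be
// sanity-checked against real allocation numbers.
func benchmarkLRUGrowth(b *testing.B) {
	b.ReportAllocs()
	l := newLRU()
	for i := 0; i < b.N; i++ {
		l.touch(i)
	}
}

func main() {
	res := testing.Benchmark(benchmarkLRUGrowth)
	fmt.Printf("%d allocs/op, %d B/op\n", res.AllocsPerOp(), res.AllocedBytesPerOp())
}
```

The reported bytes per operation include map growth as well as list elements, so the result is an average over the run rather than the exact size of a single item.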

Contributor

@Dieterbe left a comment

tweaks needed as mentioned.
i'm not going to dig too deep into this. if it looks good to both robert and mauro, then it's good for me

{
"refId": "F",
"target": "alias(sumSeries(metrictank.stats.$environment.$instance.cache.overhead.lru.gauge64), 'lru overhead')",
"refCount": 1
Contributor

you can combine these 3 into one query and use aliasByNode() to set the right alias for all of them

@Dieterbe merged commit 8512071 into master on Oct 19, 2018
@Dieterbe deleted the add-cache-overhead-accounting branch on October 29, 2018 09:07