[WIP] Add method to cache many chunks at once #940
Conversation
mdata/cache/ccache_metric.go
Outdated
}

// if the previous chunk is cached, link in both directions
if _, ok := mc.chunks[prev]; ok {
in all but the first case, we know there was a prev
in all but the last case, we know there is a next.
we can do most of the inserts such that we immediately set the right Prev and Next fields
I think we can also pre-allocate the &CCacheChunk structs in a slice of len = len(itergens), for two benefits:
- 1 allocation instead of N (faster Adds)
- at read time, we're more likely to have a sequential read (prefetched from RAM)

obviously, needs benchmarking to see if this hunch is correct. Especially number 1 is interesting. I wouldn't spend time on 2 at this point.
mdata/cache/ccache_metric.go
Outdated
@@ -69,6 +69,66 @@ func (mc *CCacheMetric) Del(ts uint32) int {
	return len(mc.chunks)
}

func (mc *CCacheMetric) AddSlice(prev uint32, itergens []chunk.IterGen) {
this function should document its assumptions/requirements, which afaik are two-fold (correct me if i'm wrong):
- the sequence of chunks should be in ascending timestamp order
- the sequence should be complete (no gaps)

also, I don't like the name AddSlice; "slice" is just an implementation detail and doesn't describe the concept well enough. Perhaps AddSequence, AddWindow or AddRange is better.
great. looks like adding 50k at once is then about 30ms. acceptable.
mdata/cache/ccache_metric.go
Outdated
	mc.chunks[prev].Next = ts
	mc.chunks[ts].Prev = prev
}
prev = ts
this is buggy: higher up we continue if a chunk already exists, but then we don't update the prev value, so the next chunk we insert will set up its links in a way that skips over the pre-existing chunks.
BEFORE:
~/g/s/g/g/m/m/cache ❯❯❯ taskset --cpu-list 5 go test -bench . -benchmem
goos: linux
goarch: amd64
pkg: github.com/grafana/metrictank/mdata/cache
BenchmarkAddingManyChunksOneByOne      10000   1225701 ns/op   21688 B/op   5 allocs/op
BenchmarkAddingManyChunksAtOnce      1000000      1146 ns/op     118 B/op   1 allocs/op
PASS
ok      github.com/grafana/metrictank/mdata/cache       14.713s

AFTER:
~/g/s/g/g/m/m/cache ❯❯❯ taskset --cpu-list 5 go test -bench . -benchmem
goos: linux
goarch: amd64
pkg: github.com/grafana/metrictank/mdata/cache
BenchmarkAddingManyChunksOneByOne      10000   1226088 ns/op   21688 B/op   5 allocs/op
BenchmarkAddingManyChunksAtOnce      1000000      1043 ns/op     118 B/op   1 allocs/op
PASS
ok      github.com/grafana/metrictank/mdata/cache       14.620s
taskset --cpu-list 5 go test -bench . -benchmem
goos: linux
goarch: amd64
pkg: github.com/grafana/metrictank/mdata/cache
BenchmarkAddingManyChunksOneByOne    1000000      1057 ns/op     145 B/op   3 allocs/op
BenchmarkAddingManyChunksAtOnce      5000000       448 ns/op     123 B/op   1 allocs/op
PASS
ok      github.com/grafana/metrictank/mdata/cache       15.433s
5dc80e5 to 387c401
taskset --cpu-list 5 go test -bench . -benchmem
goos: linux
goarch: amd64
pkg: github.com/grafana/metrictank/mdata/cache
BenchmarkAddingManyChunksOneByOne    1000000      1054 ns/op     145 B/op   3 allocs/op
BenchmarkAddingManyChunksAtOnce      5000000       427 ns/op     123 B/op   0 allocs/op
PASS
ok      github.com/grafana/metrictank/mdata/cache       12.755s
should have been part of #830
da3a59c to a675396
it, err := itgen.Get()
if err != nil {
	// TODO(replay) figure out what to do if one piece is corrupt
	tracing.Failure(span)
	tracing.Errorf(span, "itergen: error getting iter from store slice %+v", err)
	if i > 0 {
		// add all the iterators that are in good shape
		s.Cache.AddRange(ctx.AMKey, prevts, storeIterGens[:i])
that's a nice idea, to still add those that were not corrupted
}

for _, itergen := range itergens {
	c.accnt.AddChunk(metric, itergen.Ts, itergen.Size())
@Dieterbe when you looked at the cpu profiles, did this method ever show up? If so, we could batch-add those as well. The benefit is probably going to be smaller, because this function just directly pushes the data into a channel. We could also push a whole batch into the channel instead of one-by-one and then optimize the accounting to handle batches, but I'm not sure there is any benefit to that, because in the request-handling thread this would only result in fewer function calls and fewer pushes into that channel.
i was wondering the same thing. it would definitely help with "Failed to submit event to accounting, channel was blocked". that's a different issue, but it shouldn't be possible to hit that situation in normal scenarios, so from that perspective we probably want to batch-add, and also batch other things that could overload the channel (like deletes, perhaps?)
mdata/cache/ccache_metric.go
Outdated
}

// add chunk if we don't have it yet (most likely)
if _, ok := mc.chunks[ts]; !ok {
in the exceptional case when this ok == true, then mc.chunks[ts].Next should still get set to itergens[1].Ts
mdata/cache/ccache_metric.go
Outdated
// handle the 2nd until the last-but-one
for i := 1; i < len(itergens)-1; i++ {
	itergen := itergens[i]
is there a need to allocate a new variable for itergen and ts in this scope? i think it should be fine to just reuse the already allocated ones
mdata/cache/ccache_metric.go
Outdated
ts = itergen.Ts

// if nextTs() can't figure out the end date it returns ts
next := mc.nextTsCore(itergen, ts, prev, 0)
if the chunk at ts is already cached (the ok on line 165 is true), then there is no need to figure out what next should be. So the block 152-162 could be moved down into that if block.
regenerateKeys is more expensive because it allocates, iterates over the map and typically has a more random input for sort.Sort

Note: BenchmarkAddingManyChunksOneByOne is a bit slower because previously it was buggy (we didn't always add the ts)

taskset --cpu-list 5 go test -bench . -benchmem
goos: linux
goarch: amd64
pkg: github.com/grafana/metrictank/mdata/cache
BenchmarkAddingManyChunksOneByOne    1000000      1114 ns/op     145 B/op   3 allocs/op
BenchmarkAddingManyChunksAtOnce      5000000       427 ns/op     123 B/op   0 allocs/op
PASS
ok      github.com/grafana/metrictank/mdata/cache       12.856s
it's a better, more consistent name
(no perf change)

taskset --cpu-list 5 go test -bench . -benchmem
goos: linux
goarch: amd64
pkg: github.com/grafana/metrictank/mdata/cache
BenchmarkAddingManyChunksOneByOne    1000000      1114 ns/op     145 B/op   3 allocs/op
BenchmarkAddingManyChunksAtOnce      5000000       427 ns/op     123 B/op   0 allocs/op
PASS
ok      github.com/grafana/metrictank/mdata/cache       12.853s
taskset --cpu-list 5 go test -bench . -benchmem
goos: linux
goarch: amd64
pkg: github.com/grafana/metrictank/mdata/cache
BenchmarkAddingManyChunksOneByOne    1000000      1112 ns/op     145 B/op   3 allocs/op
BenchmarkAddingManyChunksAtOnce      5000000       407 ns/op     123 B/op   0 allocs/op
PASS
ok      github.com/grafana/metrictank/mdata/cache       12.700s
taskset --cpu-list 5 go test -bench . -benchmem
goos: linux
goarch: amd64
pkg: github.com/grafana/metrictank/mdata/cache
BenchmarkAddingManyChunksOneByOne    1000000      1063 ns/op     145 B/op   3 allocs/op
BenchmarkAddingManyChunksAtOnce      5000000       408 ns/op     123 B/op   0 allocs/op
PASS
ok      github.com/grafana/metrictank/mdata/cache       12.671s
i'm rewriting the tests, adding a bunch more, and adding more benchmarks to cover more scenarios (e.g. adding in desc order, or adding data in globally desc order but in batches of asc-ordered chunks). then i'm rewriting the history so that we can leverage these additional tests and benchmarks as the various (performance and functional) changes are introduced.
I'm going to open a new PR for this.
@Dieterbe Looks like quite a major difference, although I still need to verify that the result is really the same and I'm not creating any corruption:
Will also deduplicate the Add() and AddSlice() methods; currently there's a lot of duplication between these two.