Skip to content
This repository was archived by the owner on Aug 23, 2023. It is now read-only.

[WIP] Add method to cache many chunks at once #940

Closed
wants to merge 18 commits into from
Closed

Conversation

replay
Copy link
Contributor

@replay replay commented Jun 8, 2018

@Dieterbe Looks like quite a major difference, although I still need to verify that the result is really the same and I'm not creating any corruption:

mst@mst-nb1:~/documents/code/go/src/github.com/grafana/metrictank$ go test github.com/grafana/metrictank/mdata/cache  -bench BenchmarkAddingManyChunks
goos: linux
goarch: amd64
pkg: github.com/grafana/metrictank/mdata/cache
BenchmarkAddingManyChunksOneByOne-8   	   10000	    816636 ns/op
BenchmarkAddingManyChunksAtOnce-8     	 2000000	       671 ns/op
PASS
ok  	github.com/grafana/metrictank/mdata/cache	11.255s
mst@mst-nb1:~/documents/code/go/src/github.com/grafana/metrictank$ 

Will also deduplicate the Add() and AddSlice() methods, currently there's a lot of duplication between these two

@replay replay requested a review from Dieterbe June 8, 2018 19:33
@replay replay force-pushed the cache_many_chunks branch from 1668180 to a7c3453 Compare June 8, 2018 21:45
}

// if the previous chunk is cached, link in both directions
if _, ok := mc.chunks[prev]; ok {
Copy link
Contributor

@Dieterbe Dieterbe Jun 9, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in all but the first case, we know there was a prev
in all but the last case, we know there is a next.
we can do most of the inserts such that we immediately set the right Prev and Next fields

Copy link
Contributor

@Dieterbe Dieterbe Jun 9, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can also pre-allocate the &CCacheChunk in a slice of len = len(itergens) for two benefits:

  1. 1 allocation instead of N (faster Adds)
  2. at read time, we're more likely to have a sequential read (prefetched from RAM)

obviously, needs benchmarking to see if this hunch is correct. Especially number 1 is interesting. I wouldn't spend time on 2 at this point

@@ -69,6 +69,66 @@ func (mc *CCacheMetric) Del(ts uint32) int {
return len(mc.chunks)
}

func (mc *CCacheMetric) AddSlice(prev uint32, itergens []chunk.IterGen) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this function should document its assumptions/requirements, which afaik is two-fold (correct me if i'm wrong)

  • the sequence of chunks should be in ascending timestamp order
  • the sequence should be complete (no gaps)

also I don't like the name AddSlice, slice is just an implementation detail, it doesn't describe the concept well enough.
Perhaps AddSequence, AddWindow or AddRange is better.

@Dieterbe
Copy link
Contributor

Dieterbe commented Jun 9, 2018

great. looks like adding 50k at once is then about 30ms. acceptable.
and we have room to optimize further (see comment) and also we don't always have to regenerate the keys (both in the Range as in the one by one case, if the new data comes after the most recent value in the chunk cache)

mc.chunks[prev].Next = ts
mc.chunks[ts].Prev = prev
}
prev = ts
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is buggy, because higher up we continue if a chunk exists, but we won't update the prev value, and the next chunk we'll need to insert wil set up the links so that pre-existing chunks will be skipped.

Dieterbe added 6 commits June 12, 2018 14:26
BEFORE:

~/g/s/g/g/m/m/cache ❯❯❯ taskset --cpu-list 5 go test -bench  . -benchmem
goos: linux
goarch: amd64
pkg: github.com/grafana/metrictank/mdata/cache
BenchmarkAddingManyChunksOneByOne 	   10000	   1225701 ns/op	   21688 B/op	       5 allocs/op
BenchmarkAddingManyChunksAtOnce   	 1000000	      1146 ns/op	     118 B/op	       1 allocs/op
PASS
ok  	github.com/grafana/metrictank/mdata/cache	14.713s

AFTER:

~/g/s/g/g/m/m/cache ❯❯❯ taskset --cpu-list 5 go test -bench  . -benchmem
goos: linux
goarch: amd64
pkg: github.com/grafana/metrictank/mdata/cache
BenchmarkAddingManyChunksOneByOne 	   10000	   1226088 ns/op	   21688 B/op	       5 allocs/op
BenchmarkAddingManyChunksAtOnce   	 1000000	      1043 ns/op	     118 B/op	       1 allocs/op
PASS
ok  	github.com/grafana/metrictank/mdata/cache	14.620s
taskset --cpu-list 5 go test -bench  . -benchmem
goos: linux
goarch: amd64
pkg: github.com/grafana/metrictank/mdata/cache
BenchmarkAddingManyChunksOneByOne    1000000        1057 ns/op       145
B/op         3 allocs/op
BenchmarkAddingManyChunksAtOnce      5000000         448 ns/op       123
B/op         1 allocs/op
PASS
ok    github.com/grafana/metrictank/mdata/cache 15.433s
@Dieterbe Dieterbe force-pushed the cache_many_chunks branch from 5dc80e5 to 387c401 Compare June 12, 2018 12:54
Dieterbe added 4 commits June 12, 2018 15:03
taskset --cpu-list 5 go test -bench  . -benchmem
goos: linux
goarch: amd64
pkg: github.com/grafana/metrictank/mdata/cache
BenchmarkAddingManyChunksOneByOne 	 1000000	      1054 ns/op	     145 B/op	       3 allocs/op
BenchmarkAddingManyChunksAtOnce   	 5000000	       427 ns/op	     123 B/op	       0 allocs/op
PASS
ok  	github.com/grafana/metrictank/mdata/cache	12.755s
@Dieterbe Dieterbe force-pushed the cache_many_chunks branch from da3a59c to a675396 Compare June 12, 2018 15:20
it, err := itgen.Get()
if err != nil {
// TODO(replay) figure out what to do if one piece is corrupt
tracing.Failure(span)
tracing.Errorf(span, "itergen: error getting iter from store slice %+v", err)
if i > 0 {
// add all the iterators that are in good shape
s.Cache.AddRange(ctx.AMKey, prevts, storeIterGens[:i])
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that's a nice idea, to still add those that were not corrupted

}

for _, itergen := range itergens {
c.accnt.AddChunk(metric, itergen.Ts, itergen.Size())
Copy link
Contributor Author

@replay replay Jun 12, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Dieterbe when you looked at the cpu profiles, did this method ever show up? If so, we could batch-add those as well. The benefit is probably going to be smaller because this function just directly pushes the data into a channel. We could also push a whole batch into the channel instead of one-by-one and then optimize the accounting to handle batches, but i'm not sure if there is any benefit to that because in the request-handling thread this will only result in less function calls and less pushes into that channel

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i was wondering the same thing. it would definitely help with Failed to submit event to accounting, channel was blocked. this is a different issue, but we it shouldn't be possible to hit that situation in normal scenarios, so from that perspective we probably want to batch-add, and also batch other things that could overload the channel (like deletes perhaps?)

}

// add chunk if we don't have it yet (most likely)
if _, ok := mc.chunks[ts]; !ok {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in the exceptional case when this is ok == true then mc.chunks[ts].Next should still get set to itergens[1].Ts


// handle the 2nd until the last-but-one
for i := 1; i < len(itergens)-1; i++ {
itergen := itergens[i]
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is there a need to allocate a new variable for itergen and ts in this scope? i think it should be fine to just reuse the already allocated ones

ts = itergen.Ts

// if nextTs() can't figure out the end date it returns ts
next := mc.nextTsCore(itergen, ts, prev, 0)
Copy link
Contributor Author

@replay replay Jun 12, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if the chunk at ts is already cached (the ok on line 165 is true), then there is no need to figure out what next should be. So the block 152-162 could be moved down into that if block

Dieterbe added 5 commits June 13, 2018 10:23
regenerateKeys is more expensive because it allocates,
iterates over the map and typically has a more random input
for sort.Sort

Note: BenchmarkAddingManyChunksOneByOne is a bit slower because
previously it was buggy (we didn't always add the ts)

taskset --cpu-list 5 go test -bench  . -benchmem                                                                                                                     ⏎
goos: linux
goarch: amd64
pkg: github.com/grafana/metrictank/mdata/cache
BenchmarkAddingManyChunksOneByOne 	 1000000	      1114 ns/op	     145 B/op	       3 allocs/op
BenchmarkAddingManyChunksAtOnce   	 5000000	       427 ns/op	     123 B/op	       0 allocs/op
PASS
ok  	github.com/grafana/metrictank/mdata/cache	12.856s
it's a better, more consistent name
(no perf change)

taskset --cpu-list 5 go test -bench  . -benchmem
goos: linux
goarch: amd64
pkg: github.com/grafana/metrictank/mdata/cache
BenchmarkAddingManyChunksOneByOne 	 1000000	      1114 ns/op	     145 B/op	       3 allocs/op
BenchmarkAddingManyChunksAtOnce   	 5000000	       427 ns/op	     123 B/op	       0 allocs/op
PASS
ok  	github.com/grafana/metrictank/mdata/cache	12.853s
taskset --cpu-list 5 go test -bench  . -benchmem
goos: linux
goarch: amd64
pkg: github.com/grafana/metrictank/mdata/cache
BenchmarkAddingManyChunksOneByOne 	 1000000	      1112 ns/op	     145 B/op	       3 allocs/op
BenchmarkAddingManyChunksAtOnce   	 5000000	       407 ns/op	     123 B/op	       0 allocs/op
PASS
ok  	github.com/grafana/metrictank/mdata/cache	12.700s
taskset --cpu-list 5 go test -bench  . -benchmem
goos: linux
goarch: amd64
pkg: github.com/grafana/metrictank/mdata/cache
BenchmarkAddingManyChunksOneByOne 	 1000000	      1063 ns/op	     145 B/op	       3 allocs/op
BenchmarkAddingManyChunksAtOnce   	 5000000	       408 ns/op	     123 B/op	       0 allocs/op
PASS
ok  	github.com/grafana/metrictank/mdata/cache	12.671s
Copy link
Contributor

@Dieterbe Dieterbe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i'm rewriting the tests, adding a bunch more, and adding more benchmarks to cover more scenarios (e.g. adding in desc order, adding data globally desc order, but in batches of asc ordered), then i'm rewriting the history so that we can leverage these additional tests and benchmarks as various (performance and functional) changes are introduced.
I'm going to open a new PR for this

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants