-
Notifications
You must be signed in to change notification settings - Fork 107
chunk cache perf fixes: AddRange + batched accounting #943
Conversation
@replay in my testing so far it seems to works well, is there any testing you add to feel more confident about it? I think now that the code is simplified it should be easier to determine from a review whether it looks safe or not. |
Now also with the batched accounting, test results are in #931 (comment) |
c.accnt.HitChunk(metric, hit.Ts) | ||
} | ||
c.accnt.HitChunks(metric, res.Start) | ||
c.accnt.HitChunks(metric, res.End) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i think in the majority of cases len(res.End) == 0
, so we could save a few function calls by only doing this if len(res.End) > 0
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
seems cleaner to put that optimization in the HitChunks implementation. will do that
mdata/cache/ccache_metric.go
Outdated
|
||
return | ||
} | ||
|
||
func (mc *CCacheMetric) addKey(ts uint32) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this function has only one call-site, so it might as well be integrated there to save a function call
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can argue both ways on this. function is small, compiler would probably inline it, and overhead is minimal. it's kind of nice to keep it in separate function to keep cyclomatic complexity low.
OTOH it arguably simplifies the code a bit, so i'm moving it as you suggest.
// if chunk has a next chunk, then that's the ts we need | ||
return chunk.Next | ||
// if chunk has a next chunk, then that's the ts we need | ||
if next != 0 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the chunk has a .Next
ts and is also span aware, but itgen.Ts + itgen.Span != itgen.Next
(a corruption), shouldn't we rather prioritize the next
ts? But then again, if the chunk has a .Next
attribute, do we even need to call this method? I think if nextTs()
is called and chunk.Next > 0
then there is no need to call nextTsCore()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the chunk has a .Next ts and is also span aware, but itgen.Ts + itgen.Span != itgen.Next (a corruption), shouldn't we rather prioritize the next ts?
i don't know. i just kept the logic you wrote. it seems to be one of those should-never-happen-anyway-so-not-really-important cases. in such edge case, .Next should always be set based on a real chunk that we have, whereas itgen.Ts + itgen.Span
may not be a ts of a chunk that we have, so I think your argument makes sense, but I rather not touch that in this PR because this PR needs to go out and be deployed.
to be clear, nextTsCore() kept all the same logic that nextTs had, it just allows the logic to be called on the "core attributes" without needing to have a chunk already in place.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, fair enough
mdata/cache/ccache_metric.go
Outdated
// nextTsCore returns the ts of the next chunk, given a chunks key properties | ||
// (to the extent we know them). It guesses if necessary. | ||
// assumes we already have at least a read lock | ||
func (mc *CCacheMetric) nextTsCore(itgen chunk.IterGen, ts, prev, next uint32) uint32 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
afaict there is no case where itgen.Ts != ts
, so those two arguments are redundant
@replay PTAL |
taskset --cpu-list 5 go test -bench BenchmarkAdd . -benchmem goos: linux goarch: amd64 pkg: github.com/grafana/metrictank/mdata/cache BenchmarkAddAsc 10000 1223930 ns/op 21684 B/op 5 allocs/op BenchmarkAddDesc1 10000 1449968 ns/op 21720 B/op 7 allocs/op BenchmarkAddDesc4 10000 1450317 ns/op 21729 B/op 9 allocs/op BenchmarkAddDesc64 9984 1446361 ns/op 21701 B/op 9 allocs/op BenchmarkAddRangeAsc 1000000 1124 ns/op 118 B/op 1 allocs/op BenchmarkAddRangeDesc4 10000 362446 ns/op 5503 B/op 2 allocs/op BenchmarkAddRangeDesc64 99968 278462 ns/op 3276 B/op 1 allocs/op PASS ok github.com/grafana/metrictank/mdata/cache 90.241s
taskset --cpu-list 5 go test -bench BenchmarkAdd . -benchmem goos: linux goarch: amd64 pkg: github.com/grafana/metrictank/mdata/cache BenchmarkAddAsc 10000 1223221 ns/op 21684 B/op 5 allocs/op BenchmarkAddDesc1 10000 1451440 ns/op 21720 B/op 7 allocs/op BenchmarkAddDesc4 10000 1458588 ns/op 21729 B/op 9 allocs/op BenchmarkAddDesc64 9984 1454018 ns/op 21701 B/op 9 allocs/op BenchmarkAddRangeAsc 1000000 1066 ns/op 118 B/op 1 allocs/op BenchmarkAddRangeDesc4 10000 365530 ns/op 5503 B/op 2 allocs/op BenchmarkAddRangeDesc64 99968 279275 ns/op 3276 B/op 1 allocs/op PASS ok github.com/grafana/metrictank/mdata/cache 90.490s benchcmp pre.txt post.txt benchmark old ns/op new ns/op delta BenchmarkAddAsc 1223930 1223221 -0.06% BenchmarkAddDesc1 1449968 1451440 +0.10% BenchmarkAddDesc4 1450317 1458588 +0.57% BenchmarkAddDesc64 1446361 1454018 +0.53% BenchmarkAddRangeAsc 1124 1066 -5.16% BenchmarkAddRangeDesc4 362446 365530 +0.85% BenchmarkAddRangeDesc64 278462 279275 +0.29% benchmark old allocs new allocs delta BenchmarkAddAsc 5 5 +0.00% BenchmarkAddDesc1 7 7 +0.00% BenchmarkAddDesc4 9 9 +0.00% BenchmarkAddDesc64 9 9 +0.00% BenchmarkAddRangeAsc 1 1 +0.00% BenchmarkAddRangeDesc4 2 2 +0.00% BenchmarkAddRangeDesc64 1 1 +0.00% benchmark old bytes new bytes delta BenchmarkAddAsc 21684 21684 +0.00% BenchmarkAddDesc1 21720 21720 +0.00% BenchmarkAddDesc4 21729 21729 +0.00% BenchmarkAddDesc64 21701 21701 +0.00% BenchmarkAddRangeAsc 118 118 +0.00% BenchmarkAddRangeDesc4 5503 5503 +0.00% BenchmarkAddRangeDesc64 3276 3276 +0.00%
taskset --cpu-list 5 go test -bench BenchmarkAdd . -benchmem goos: linux goarch: amd64 pkg: github.com/grafana/metrictank/mdata/cache BenchmarkAddAsc 1000000 1229 ns/op 145 B/op 3 allocs/op BenchmarkAddDesc1 10000 1454771 ns/op 21720 B/op 7 allocs/op BenchmarkAddDesc4 10000 1459969 ns/op 21729 B/op 9 allocs/op BenchmarkAddDesc64 9984 1453576 ns/op 21700 B/op 9 allocs/op BenchmarkAddRangeAsc 2000000 945 ns/op 137 B/op 1 allocs/op BenchmarkAddRangeDesc4 10000 363506 ns/op 5502 B/op 2 allocs/op BenchmarkAddRangeDesc64 99968 278120 ns/op 3276 B/op 1 allocs/op PASS ok github.com/grafana/metrictank/mdata/cache 84.736s benchcmp post.txt post2.txt benchmark old ns/op new ns/op delta BenchmarkAddAsc 1223221 1229 -99.90% BenchmarkAddDesc1 1451440 1454771 +0.23% BenchmarkAddDesc4 1458588 1459969 +0.09% BenchmarkAddDesc64 1454018 1453576 -0.03% BenchmarkAddRangeAsc 1066 945 -11.35% BenchmarkAddRangeDesc4 365530 363506 -0.55% BenchmarkAddRangeDesc64 279275 278120 -0.41% benchmark old allocs new allocs delta BenchmarkAddAsc 5 3 -40.00% BenchmarkAddDesc1 7 7 +0.00% BenchmarkAddDesc4 9 9 +0.00% BenchmarkAddDesc64 9 9 +0.00% BenchmarkAddRangeAsc 1 1 +0.00% BenchmarkAddRangeDesc4 2 2 +0.00% BenchmarkAddRangeDesc64 1 1 +0.00% benchmark old bytes new bytes delta BenchmarkAddAsc 21684 145 -99.33% BenchmarkAddDesc1 21720 21720 +0.00% BenchmarkAddDesc4 21729 21729 +0.00% BenchmarkAddDesc64 21701 21700 -0.00% BenchmarkAddRangeAsc 118 137 +16.10% BenchmarkAddRangeDesc4 5503 5502 -0.02% BenchmarkAddRangeDesc64 3276 3276 +0.00%
taskset --cpu-list 5 go test -bench Benchmark . -benchmem goos: linux goarch: amd64 pkg: github.com/grafana/metrictank/mdata/cache BenchmarkAddAsc 1000000 1053 ns/op 145 B/op 3 allocs/op BenchmarkAddDesc1 10000 1456538 ns/op 21720 B/op 7 allocs/op BenchmarkAddDesc4 10000 1452870 ns/op 21729 B/op 9 allocs/op BenchmarkAddDesc64 9984 1449829 ns/op 21700 B/op 9 allocs/op BenchmarkAddRangeAsc 3000000 366 ns/op 117 B/op 0 allocs/op BenchmarkAddRangeDesc4 10000 363554 ns/op 5503 B/op 1 allocs/op BenchmarkAddRangeDesc64 99968 277954 ns/op 3276 B/op 0 allocs/op PASS ok github.com/grafana/metrictank/mdata/cache 84.321s benchcmp post2.txt post3.txt benchmark old ns/op new ns/op delta BenchmarkAddAsc 1229 1053 -14.32% BenchmarkAddDesc1 1454771 1456538 +0.12% BenchmarkAddDesc4 1459969 1452870 -0.49% BenchmarkAddDesc64 1453576 1449829 -0.26% BenchmarkAddRangeAsc 945 366 -61.27% BenchmarkAddRangeDesc4 363506 363554 +0.01% BenchmarkAddRangeDesc64 278120 277954 -0.06% benchmark old allocs new allocs delta BenchmarkAddAsc 3 3 +0.00% BenchmarkAddDesc1 7 7 +0.00% BenchmarkAddDesc4 9 9 +0.00% BenchmarkAddDesc64 9 9 +0.00% BenchmarkAddRangeAsc 1 0 -100.00% BenchmarkAddRangeDesc4 2 1 -50.00% BenchmarkAddRangeDesc64 1 0 -100.00% benchmark old bytes new bytes delta BenchmarkAddAsc 145 145 +0.00% BenchmarkAddDesc1 21720 21720 +0.00% BenchmarkAddDesc4 21729 21729 +0.00% BenchmarkAddDesc64 21700 21700 +0.00% BenchmarkAddRangeAsc 137 117 -14.60% BenchmarkAddRangeDesc4 5502 5503 +0.02% BenchmarkAddRangeDesc64 3276 3276 +0.00%
should have been part of #830
regenerateKeys is more expensive because it allocates, iterates over the map and typically has a more random input for sort.Sort Note: BenchmarkAddAsc is a bit slower because previously it was buggy (we didn't always add the ts) taskset --cpu-list 5 go test -bench Benchmark . -benchmem goos: linux goarch: amd64 pkg: github.com/grafana/metrictank/mdata/cache BenchmarkAddAsc 1000000 1119 ns/op 145 B/op 3 allocs/op BenchmarkAddDesc1 10000 706575 ns/op 191 B/op 6 allocs/op BenchmarkAddDesc4 10000 711135 ns/op 199 B/op 8 allocs/op BenchmarkAddDesc64 9984 705479 ns/op 202 B/op 8 allocs/op BenchmarkAddRangeAsc 5000000 427 ns/op 123 B/op 0 allocs/op BenchmarkAddRangeDesc4 10000 196109 ns/op 131 B/op 1 allocs/op BenchmarkAddRangeDesc64 99968 158026 ns/op 115 B/op 0 allocs/op PASS ok github.com/grafana/metrictank/mdata/cache 52.470s benchcmp post3.txt post4.txt benchmark old ns/op new ns/op delta BenchmarkAddAsc 1053 1119 +6.27% BenchmarkAddDesc1 1456538 706575 -51.49% BenchmarkAddDesc4 1452870 711135 -51.05% BenchmarkAddDesc64 1449829 705479 -51.34% BenchmarkAddRangeAsc 366 427 +16.67% BenchmarkAddRangeDesc4 363554 196109 -46.06% BenchmarkAddRangeDesc64 277954 158026 -43.15% benchmark old allocs new allocs delta BenchmarkAddAsc 3 3 +0.00% BenchmarkAddDesc1 7 6 -14.29% BenchmarkAddDesc4 9 8 -11.11% BenchmarkAddDesc64 9 8 -11.11% BenchmarkAddRangeAsc 0 0 +0.00% BenchmarkAddRangeDesc4 1 1 +0.00% BenchmarkAddRangeDesc64 0 0 +0.00% benchmark old bytes new bytes delta BenchmarkAddAsc 145 145 +0.00% BenchmarkAddDesc1 21720 191 -99.12% BenchmarkAddDesc4 21729 199 -99.08% BenchmarkAddDesc64 21700 202 -99.07% BenchmarkAddRangeAsc 117 123 +5.13% BenchmarkAddRangeDesc4 5503 131 -97.62% BenchmarkAddRangeDesc64 3276 115 -96.49%
it's a better, more consistent name
(no perf change) taskset --cpu-list 5 go test -bench Benchmark . -benchmem goos: linux goarch: amd64 pkg: github.com/grafana/metrictank/mdata/cache BenchmarkAddAsc 1000000 1117 ns/op 145 B/op 3 allocs/op BenchmarkAddDesc1 10000 709200 ns/op 191 B/op 6 allocs/op BenchmarkAddDesc4 10000 710500 ns/op 199 B/op 8 allocs/op BenchmarkAddDesc64 9984 706415 ns/op 202 B/op 8 allocs/op BenchmarkAddRangeAsc 5000000 427 ns/op 123 B/op 0 allocs/op BenchmarkAddRangeDesc4 10000 196601 ns/op 132 B/op 1 allocs/op BenchmarkAddRangeDesc64 99968 157863 ns/op 115 B/op 0 allocs/op PASS ok github.com/grafana/metrictank/mdata/cache 52.467s benchcmp post4.txt post5.txt benchmark old ns/op new ns/op delta BenchmarkAddAsc 1119 1117 -0.18% BenchmarkAddDesc1 706575 709200 +0.37% BenchmarkAddDesc4 711135 710500 -0.09% BenchmarkAddDesc64 705479 706415 +0.13% BenchmarkAddRangeAsc 427 427 +0.00% BenchmarkAddRangeDesc4 196109 196601 +0.25% BenchmarkAddRangeDesc64 158026 157863 -0.10% benchmark old allocs new allocs delta BenchmarkAddAsc 3 3 +0.00% BenchmarkAddDesc1 6 6 +0.00% BenchmarkAddDesc4 8 8 +0.00% BenchmarkAddDesc64 8 8 +0.00% BenchmarkAddRangeAsc 0 0 +0.00% BenchmarkAddRangeDesc4 1 1 +0.00% BenchmarkAddRangeDesc64 0 0 +0.00% benchmark old bytes new bytes delta BenchmarkAddAsc 145 145 +0.00% BenchmarkAddDesc1 191 191 +0.00% BenchmarkAddDesc4 199 199 +0.00% BenchmarkAddDesc64 202 202 +0.00% BenchmarkAddRangeAsc 123 123 +0.00% BenchmarkAddRangeDesc4 131 132 +0.76% BenchmarkAddRangeDesc64 115 115 +0.00%
taskset --cpu-list 5 go test -bench Benchmark . -benchmem goos: linux goarch: amd64 pkg: github.com/grafana/metrictank/mdata/cache BenchmarkAddAsc 1000000 1105 ns/op 145 B/op 3 allocs/op BenchmarkAddDesc1 10000 709352 ns/op 191 B/op 6 allocs/op BenchmarkAddDesc4 10000 711287 ns/op 200 B/op 8 allocs/op BenchmarkAddDesc64 9984 707332 ns/op 202 B/op 8 allocs/op BenchmarkAddRangeAsc 5000000 414 ns/op 123 B/op 0 allocs/op BenchmarkAddRangeDesc4 10000 196338 ns/op 131 B/op 1 allocs/op BenchmarkAddRangeDesc64 99968 157720 ns/op 115 B/op 0 allocs/op PASS ok github.com/grafana/metrictank/mdata/cache 52.406s benchcmp post5.txt post6.txt benchmark old ns/op new ns/op delta BenchmarkAddAsc 1117 1105 -1.07% BenchmarkAddDesc1 709200 709352 +0.02% BenchmarkAddDesc4 710500 711287 +0.11% BenchmarkAddDesc64 706415 707332 +0.13% BenchmarkAddRangeAsc 427 414 -3.04% BenchmarkAddRangeDesc4 196601 196338 -0.13% BenchmarkAddRangeDesc64 157863 157720 -0.09% benchmark old allocs new allocs delta BenchmarkAddAsc 3 3 +0.00% BenchmarkAddDesc1 6 6 +0.00% BenchmarkAddDesc4 8 8 +0.00% BenchmarkAddDesc64 8 8 +0.00% BenchmarkAddRangeAsc 0 0 +0.00% BenchmarkAddRangeDesc4 1 1 +0.00% BenchmarkAddRangeDesc64 0 0 +0.00% benchmark old bytes new bytes delta BenchmarkAddAsc 145 145 +0.00% BenchmarkAddDesc1 191 191 +0.00% BenchmarkAddDesc4 199 200 +0.50% BenchmarkAddDesc64 202 202 +0.00% BenchmarkAddRangeAsc 123 123 +0.00% BenchmarkAddRangeDesc4 132 131 -0.76% BenchmarkAddRangeDesc64 115 115 +0.00%
taskset --cpu-list 5 go test -bench Benchmark . -benchmem | tee post7.txt goos: linux goarch: amd64 pkg: github.com/grafana/metrictank/mdata/cache BenchmarkAddAsc 1000000 1111 ns/op 145 B/op 3 allocs/op BenchmarkAddDesc1 10000 707682 ns/op 191 B/op 6 allocs/op BenchmarkAddDesc4 10000 708973 ns/op 199 B/op 8 allocs/op BenchmarkAddDesc64 9984 705477 ns/op 202 B/op 8 allocs/op BenchmarkAddRangeAsc 5000000 406 ns/op 123 B/op 0 allocs/op BenchmarkAddRangeDesc4 10000 196523 ns/op 132 B/op 1 allocs/op BenchmarkAddRangeDesc64 99968 157809 ns/op 115 B/op 0 allocs/op PASS ok github.com/grafana/metrictank/mdata/cache 52.220s benchcmp post6.txt post7.txt benchmark old ns/op new ns/op delta BenchmarkAddAsc 1105 1111 +0.54% BenchmarkAddDesc1 709352 707682 -0.24% BenchmarkAddDesc4 711287 708973 -0.33% BenchmarkAddDesc64 707332 705477 -0.26% BenchmarkAddRangeAsc 414 406 -1.93% BenchmarkAddRangeDesc4 196338 196523 +0.09% BenchmarkAddRangeDesc64 157720 157809 +0.06% benchmark old allocs new allocs delta BenchmarkAddAsc 3 3 +0.00% BenchmarkAddDesc1 6 6 +0.00% BenchmarkAddDesc4 8 8 +0.00% BenchmarkAddDesc64 8 8 +0.00% BenchmarkAddRangeAsc 0 0 +0.00% BenchmarkAddRangeDesc4 1 1 +0.00% BenchmarkAddRangeDesc64 0 0 +0.00% benchmark old bytes new bytes delta BenchmarkAddAsc 145 145 +0.00% BenchmarkAddDesc1 191 191 +0.00% BenchmarkAddDesc4 200 199 -0.50% BenchmarkAddDesc64 202 202 +0.00% BenchmarkAddRangeAsc 123 123 +0.00% BenchmarkAddRangeDesc4 131 132 +0.76% BenchmarkAddRangeDesc64 115 115 +0.00%
1) less contention on channel 2) prevent channel overflowing 3) less CPU spent printing `Failed to submit event to accounting, channel was blocked` For the record, When I patch the code like so: diff --git a/mdata/cache/accnt/flat_accnt.go b/mdata/cache/accnt/flat_accnt.go index 4097e067..d202960c 100644 --- a/mdata/cache/accnt/flat_accnt.go +++ b/mdata/cache/accnt/flat_accnt.go @@ -1,7 +1,9 @@ package accnt import ( + "fmt" "sort" + "time" "github.com/grafana/metrictank/mdata/chunk" "github.com/raintank/worldping-api/pkg/log" @@ -178,6 +180,7 @@ func (a *FlatAccnt) eventLoop() { }, ) case evnt_add_chnks: + pre := time.Now() payload := event.pl.(*AddsPayload) a.addRange(payload.metric, payload.chunks) cacheChunkAdd.Add(len(payload.chunks)) @@ -189,6 +192,7 @@ func (a *FlatAccnt) eventLoop() { }, ) } + fmt.Println("CODE evnt_add_chnks took", time.Since(pre).Nanoseconds()) case evnt_hit_chnk: payload := event.pl.(*HitPayload) a.lru.touch( @@ -198,6 +202,7 @@ func (a *FlatAccnt) eventLoop() { }, ) case evnt_hit_chnks: + pre := time.Now() payload := event.pl.(*HitsPayload) for _, chunk := range payload.chunks { a.lru.touch( @@ -207,6 +212,7 @@ func (a *FlatAccnt) eventLoop() { }, ) } + fmt.Println("CODE evnt_hit_chnks took", time.Since(pre).Nanoseconds()) and run the workload described in #931 (queries for 2y of data across 500 series) I get these durations: awk '/evnt_add_chnks/ {print $4}' code2.txt |goplot hist 1187.00 -> 11160892.40: ▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇ 850 22320597.80: ▇▇ 39 (upto 22 ms) 33480303.20: ▇ 17 (upto 33 ms) 44640008.60: ▏ 5 (upto 45 ms) 55799714.00: ▏ 5 (upto 56 ms) 66959419.40: ▏ 1 (upto 67 ms) 78119124.80: ▏ 0 89278830.20: ▏ 0 100438535.60: ▏ 0 111598241.00: ▏ 1 (upto 111 ms) awk '/evnt_hit_chnks/ {print $4}' code2.txt |goplot hist 45.00 -> 22366478.50: ▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇ 7624 44732912.00: ▏ 65 (upto 45 ms) 67099345.50: ▏ 29 (upto 67 ms) 89465779.00: ▏ 8 (upto 89 ms) 111832212.50: ▏ 1 (upto 112 ms) 134198646.00: ▏ 1 (upto 134 ms) 156565079.50: ▏ 0 178931513.00: ▏ 1 (upto 179 ms) 201297946.50: ▏ 0 223664380.00: ▏ 1 (upto 224 ms)
6dbebae
to
7d8a70a
Compare
rebased on master to fix conflict on ccache.go |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me
replaces #940
going from the old Add implementation to the new AddRange implementation