This repository was archived by the owner on Aug 23, 2023. It is now read-only.

account for the cap() of byte slices in ccache #963

Closed
replay wants to merge 2 commits from the more_cache_metrics branch

Conversation

replay
Contributor

@replay replay commented Jul 19, 2018

We want to know how much memory MT allocates for the cache, so we also need to keep track of the cap() of the data in the cache, instead of only tracking len().

To get a really clear picture of how much memory is used for the chunk cache we'll need to add more metrics. Every time a chunk is added to the cache, there is some overhead for it in the data structure of the chunk cache itself, and also in the accounting for it. I think we should add two more metrics: one tracking the overhead of the chunk cache data structure, and another tracking the overhead of the accounting itself.
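
For illustration, a minimal sketch of the accounting change this PR is after (the cacheAccounting type below is hypothetical and much simpler than metrictank's actual ccache accounting; it only shows the len-vs-cap distinction):

package main

import "fmt"

// cacheAccounting tracks both the bytes in use (len) and the bytes actually
// allocated (cap) of the byte slices held by the cache.
type cacheAccounting struct {
    usedBytes  uint64 // sum of len() of cached byte slices
    allocBytes uint64 // sum of cap() of cached byte slices
}

func (a *cacheAccounting) add(chunk []byte) {
    a.usedBytes += uint64(len(chunk))
    a.allocBytes += uint64(cap(chunk))
}

func main() {
    var a cacheAccounting
    b := make([]byte, 100, 128) // backing array larger than the data it holds
    a.add(b)
    fmt.Println("used:", a.usedBytes, "allocated:", a.allocBytes) // used: 100 allocated: 128
}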

@replay replay changed the title [WIP] account for the cap() of byte slices in ccache account for the cap() of byte slices in ccache Jul 23, 2018
@replay replay requested review from Dieterbe and woodsaj July 23, 2018 19:54
@replay replay force-pushed the more_cache_metrics branch from e1d94cd to 5dc1a82 on August 2, 2018 16:36
@Dieterbe
Contributor

Dieterbe commented Aug 6, 2018

We want to know how much memory MT allocates for the cache, so we also need to keep track of the cap() of the data in the cache, instead of only tracking len().

we should either:

  1. measure the "logical" size (the amount of bytes of actual data)
  2. measure the "real" size (size of actual data + cap-len diff overhead + all metadata and tracking datastructures)

We currently do 1.
Somewhere in between 1 and 2 (which is what this PR does) is also not that useful IMHO.
"True 2" is interesting, but complicated to get right, and can be obtained from a memory profile anyway.

I think we can look into optimizing ccache overhead and see if we can deduce some kind of heuristic or formula to estimate "true" memory usage given the number of chunks and metrics cached.
But trying to answer this need via extra instrumentation code goes too far IMHO, especially because it's so trivial to get a memory profile.
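
One possible shape for such a heuristic, purely as an illustration (the overhead constants below are made up; real numbers would have to come from profiling or from the sizes of the ccache data structures):

package main

import "fmt"

// estimateCacheBytes is a hypothetical heuristic, not something metrictank
// implements: estimate total cache memory from entry counts and an average chunk size.
func estimateCacheBytes(numMetrics, numChunks, avgChunkBytes uint64) uint64 {
    const perMetricOverhead = 200 // placeholder: per-metric bookkeeping (map entry, key, LRU node)
    const perChunkOverhead = 100  // placeholder: per-chunk bookkeeping (slice header, timestamps, accounting)
    return numMetrics*perMetricOverhead + numChunks*(avgChunkBytes+perChunkOverhead)
}

func main() {
    fmt.Println(estimateCacheBytes(10000, 500000, 110), "bytes (rough estimate)")
}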

@Dieterbe
Contributor

@replay let me know what you think of the above. If you agree, is there anything in this PR I should still look at? I haven't looked at the code in detail yet.

@replay
Contributor Author

replay commented Aug 13, 2018

The idea of this PR was simply that we wanted to know if the allocated byte slices that are used to hold the iterators are sometimes bigger than necessary. By comparing the len() stats with the cap() stats we could see that easily, and if there's a difference between the two we could work to optimize it.
What you suggest in 2) would certainly be nice to have, but it would also be a lot more complicated than this; this is just a small step in that direction that would already serve a purpose.

@Dieterbe
Contributor

The idea of this PR was simply that we wanted to know if the allocated byte slices that are used to hold the iterators are sometimes bigger than necessary

Knowing as part of a debugging/development process, or needing to know constantly for running production services? We've made the mistake in the past of adding too much runtime instrumentation for things we only need when troubleshooting a performance problem; those cases are well served by temporary instrumentation that doesn't need to be merged (e.g. the index duration metrics were removed for this reason).

@replay
Contributor Author

replay commented Aug 14, 2018

That's a hard question to answer, because if there is an issue where the allocated byte slices are larger than necessary, we don't know exactly in which situations it would or wouldn't happen. E.g. a Sarama update could probably introduce an issue like that. But I guess for a start it would already be useful to deploy this anywhere MT is in production use, in order to then be able to compare the cap() and len() stats.
I don't remember that case of the index duration metrics that you mention; was I involved in that?

@Dieterbe
Contributor

I don't remember that case of the index duration metrics that you mention, was I involved in that?

I believe I added them originally, then deleted them later when I discovered they were slowing down ingestion. I couldn't find the PR right now, but it's out there somewhere.

FWIW, AFAIK the Go runtime doubles capacity to accommodate a len up to 1024 or so, then grows it in 25% steps from there, but of course that assumes no reuse of previously allocated slices.
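
(A small standalone program demonstrating that growth pattern; the exact thresholds and growth factors are an implementation detail of the Go runtime and vary between versions:)

package main

import "fmt"

// Append bytes one at a time and print len/cap whenever the runtime grows the
// backing array, making the doubling-then-smaller-steps behavior visible.
func main() {
    var b []byte
    prevCap := -1
    for i := 0; i < 5000; i++ {
        b = append(b, 0)
        if cap(b) != prevCap {
            fmt.Printf("len=%d cap=%d\n", len(b), cap(b))
            prevCap = cap(b)
        }
    }
}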

Are we observing any concrete problems that we think this patch will help us with, and if so can you explain the actual problem?

Otherwise I believe you can get the answers you need via 'http://localhost:6060/debug/pprof/heap' and then something like

(pprof) top30 CassandraStore
Active filters:
   focus=CassandraStore
Showing nodes accounting for 0.64GB, 25.26% of 2.54GB total
      flat  flat%   sum%        cum   cum%
    0.62GB 24.44% 24.44%     0.62GB 24.44%  github.com/grafana/metrictank/vendor/github.com/gocql/gocql.copyBytes (inline)
    0.02GB  0.81% 25.26%     0.02GB  0.81%  github.com/grafana/metrictank/store/cassandra.NewCassandraStore
         0     0% 25.26%     0.62GB 24.44%  github.com/grafana/metrictank/api.(*Server).getSeries
         0     0% 25.26%     0.62GB 24.44%  github.com/grafana/metrictank/api.(*Server).getSeriesCachedStore
         0     0% 25.26%     0.62GB 24.44%  github.com/grafana/metrictank/api.(*Server).getSeriesFixed
         0     0% 25.26%     0.62GB 24.44%  github.com/grafana/metrictank/api.(*Server).getTarget
         0     0% 25.26%     0.62GB 24.44%  github.com/grafana/metrictank/api.(*Server).getTargetsLocal.func1
         0     0% 25.26%     0.62GB 24.44%  github.com/grafana/metrictank/store/cassandra.(*CassandraStore).Search
         0     0% 25.26%     0.62GB 24.44%  github.com/grafana/metrictank/store/cassandra.(*CassandraStore).SearchTable
         0     0% 25.26%     0.62GB 24.44%  github.com/grafana/metrictank/vendor/github.com/gocql/gocql.(*Iter).Scan
         0     0% 25.26%     0.62GB 24.44%  github.com/grafana/metrictank/vendor/github.com/gocql/gocql.Unmarshal
         0     0% 25.26%     0.62GB 24.44%  github.com/grafana/metrictank/vendor/github.com/gocql/gocql.scanColumn
         0     0% 25.26%     0.62GB 24.44%  github.com/grafana/metrictank/vendor/github.com/gocql/gocql.unmarshalVarchar
         0     0% 25.26%     0.02GB  0.81%  main.main
         0     0% 25.26%     0.02GB  0.81%  runtime.main
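
(The /debug/pprof/heap path above is the one registered by Go's standard net/http/pprof package; as a minimal, generic sketch of how such an endpoint is exposed, independent of metrictank's actual HTTP wiring:)

package main

import (
    "log"
    "net/http"
    _ "net/http/pprof" // registers the /debug/pprof/ handlers on the default mux
)

func main() {
    log.Fatal(http.ListenAndServe("localhost:6060", nil))
}

The profile can then be inspected interactively with 'go tool pprof http://localhost:6060/debug/pprof/heap' and filtered as in the output above.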

@replay
Contributor Author

replay commented Aug 14, 2018

The problem we're observing is that MT appears to be consuming more memory than expected, so one theory we came up with while you were on vacation was that maybe the byte slices used to store the iterators have a capacity that's larger than necessary. This PR was supposed to make it easy to verify whether that is true.

@Dieterbe
Contributor

We should make the accounting accurate, as brought up in #942.
For diagnosing runaway RAM usage we can use a heap profile.
Adding more of our own instrumentation (which adds its own overhead) for something we rarely or never need is not the right solution.

@Dieterbe Dieterbe closed this Aug 14, 2018
@Dieterbe
Contributor

Dieterbe commented Aug 26, 2018

@replay @woodsaj FYI some followup:

I dug a bit into the code; it's not that hard to follow:
Iter.Scan -> scanColumn -> Unmarshal -> unmarshalVarchar -> copyBytes.
In copyBytes, a new slice is allocated sized exactly for the field we need, and that slice is returned directly to us. So it follows that any cap-len difference would be minor. (E.g. I thought that requesting a slice of len 100 might get you a cap of 128, but even that overhead doesn't show up below, or perhaps it's an implementation detail that the runtime doesn't expose.)

to doublecheck i applied the following patch:

diff --git a/store/cassandra/cassandra.go b/store/cassandra/cassandra.go
index cbe91a1a..f7ad2a19 100644
--- a/store/cassandra/cassandra.go
+++ b/store/cassandra/cassandra.go
@@ -529,6 +529,7 @@ func (c *CassandraStore) SearchTable(ctx context.Context, key schema.AMKey, tabl
                        tracing.Error(span, errChunkTooSmall)
                        return itgens, errChunkTooSmall
                }
+               fmt.Println("scanned chunk len", len(b), "cap", cap(b))
                itgen, err := chunk.NewGen(b, uint32(ts))
                if err != nil {
                        tracing.Failure(span)
diff --git a/vendor/github.com/gocql/gocql/helpers.go b/vendor/github.com/gocql/gocql/helpers.go
index 944517e2..78692d68 100644
--- a/vendor/github.com/gocql/gocql/helpers.go
+++ b/vendor/github.com/gocql/gocql/helpers.go
@@ -324,6 +324,7 @@ func (iter *Iter) MapScan(m map[string]interface{}) bool {
 
 func copyBytes(p []byte) []byte {
        b := make([]byte, len(p))
+       fmt.Println("copyBytes in", len(p), cap(p), "out", len(b), cap(b))
        copy(b, p)
        return b
 }

and when i request data, the output looks like so:

metrictank3_1   | copyBytes in 104 1736 out 104 104
metrictank3_1   | scanned chunk len 104 cap 104
metrictank3_1   | copyBytes in 106 1620 out 106 106
metrictank3_1   | scanned chunk len 106 cap 106
metrictank3_1   | copyBytes in 104 1502 out 104 104
metrictank3_1   | scanned chunk len 104 cap 104
metrictank3_1   | copyBytes in 104 1386 out 104 104
metrictank3_1   | scanned chunk len 104 cap 104
metrictank3_1   | copyBytes in 104 1270 out 104 104
metrictank3_1   | scanned chunk len 104 cap 104
metrictank3_1   | copyBytes in 107 1154 out 107 107
metrictank3_1   | scanned chunk len 107 cap 107
metrictank3_1   | copyBytes in 106 1035 out 106 106
metrictank3_1   | scanned chunk len 106 cap 106
metrictank3_1   | copyBytes in 102 917 out 102 102
metrictank3_1   | scanned chunk len 102 cap 102
metrictank3_1   | copyBytes in 102 803 out 102 102
metrictank3_1   | scanned chunk len 102 cap 102
metrictank3_1   | copyBytes in 104 689 out 104 104

You can see that the slice coming from the framer has a larger capacity, probably because it holds an entire frame, but the slice we get back for our chunk is sized exactly for the chunk.

So:

  1. the "cap-len difference" theory of causing excess memory usage is not correct.
  2. gocql doesn't hold a reference to the chunk slice after Scan returns.

What is interesting, however, as a way to reduce allocations (though not memory in use): perhaps we can recycle slices. I've brought this up on their mailing list: https://groups.google.com/forum/#!topic/gocql/ng0d9ZyGwSU
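
As a minimal sketch of what slice recycling could look like (a plain sync.Pool of byte slices; this is not gocql's API, just an illustration of the general idea, and the ownership/lifetime questions are exactly what would need to be worked out upstream):

package main

import "sync"

// bufPool holds reusable byte slices. Illustrative only: gocql's copyBytes
// currently allocates a fresh slice per value; a pooled variant would have to
// define who returns buffers to the pool and when.
var bufPool = sync.Pool{
    New: func() interface{} { return make([]byte, 0, 1024) },
}

// copyBytesPooled copies p into a pooled buffer. The caller takes ownership
// and must Put the slice back once it no longer needs the data.
func copyBytesPooled(p []byte) []byte {
    b := bufPool.Get().([]byte)
    if cap(b) < len(p) {
        b = make([]byte, len(p))
    }
    b = b[:len(p)]
    copy(b, p)
    return b
}

func main() {
    src := []byte("some decoded column value")
    dst := copyBytesPooled(src)
    _ = dst              // ... use dst ...
    bufPool.Put(dst[:0]) // hand the buffer back for reuse
}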

@Dieterbe Dieterbe deleted the more_cache_metrics branch September 18, 2018 08:58
robert-milan added a commit that referenced this pull request Oct 10, 2018
Prior to this change we were only counting the number of bytes
that the actual data occupied. However, there is a lot of overhead
in our cache system that also needs to be tracked. We were seeing
large differences in the total cache size used and live heap
allocations.

These changes attempt to track most of the accounting overhead.
The numbers used to track overhead are estimates based on
data type sizes.

 * Added stats for Flat Accounting, the LRU, and Chunks
 * Added benchmark for LRU allocations

Resolves: #1086
See also: #963
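
(For illustration, a rough sketch of the kind of data-type-size estimate the commit message refers to; the struct below is hypothetical, not the actual ccache type:)

package main

import (
    "fmt"
    "unsafe"
)

// chunkEntry is a stand-in for whatever the cache stores per chunk; the real
// ccache types differ, this only shows how per-entry overhead can be estimated
// from data type sizes.
type chunkEntry struct {
    ts   uint32
    prev uint32
    next uint32
    data []byte
}

func main() {
    // Fixed overhead per cached chunk, before counting the chunk bytes themselves
    // (on 64-bit platforms the slice header alone accounts for 24 bytes).
    fmt.Println("per-chunk struct overhead:", unsafe.Sizeof(chunkEntry{}), "bytes")
}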