This repository was archived by the owner on Aug 23, 2023. It is now read-only.

account for the cap() of byte slices in ccache #963

Closed
replay wants to merge 2 commits from the more_cache_metrics branch

Conversation

replay
Contributor

@replay replay commented Jul 19, 2018

We want to know how much memory MT allocates for the cache, so we also need to keep track of the cap() of the data in the cache, instead of only tracking len().

To get a really clear picture of how much memory is used for the chunk cache we'll need to add more metrics. Every time a chunk is added to the cache, there is some overhead for it in the data structure of the chunk cache itself, and also in the accounting for it. I think we should add two more metrics: one tracking the overhead of the chunk cache data structure, and another tracking the overhead of the accounting itself.
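
For illustration, a minimal sketch of the accounting change this PR is after (the cacheAccounting type below is hypothetical and much simpler than metrictank's actual ccache accounting; it only shows the len-vs-cap distinction):

package main

import "fmt"

// cacheAccounting tracks both the bytes in use (len) and the bytes actually
// allocated (cap) of the byte slices held by the cache.
type cacheAccounting struct {
    usedBytes  uint64 // sum of len() of cached byte slices
    allocBytes uint64 // sum of cap() of cached byte slices
}

func (a *cacheAccounting) add(chunk []byte) {
    a.usedBytes += uint64(len(chunk))
    a.allocBytes += uint64(cap(chunk))
}

func main() {
    var a cacheAccounting
    b := make([]byte, 100, 128) // backing array larger than the data it holds
    a.add(b)
    fmt.Println("used:", a.usedBytes, "allocated:", a.allocBytes) // used: 100 allocated: 128
}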

@replay replay changed the title [WIP] account for the cap() of byte slices in ccache account for the cap() of byte slices in ccache Jul 23, 2018
@replay replay requested review from Dieterbe and woodsaj July 23, 2018 19:54
@replay replay force-pushed the more_cache_metrics branch from e1d94cd to 5dc1a82 on August 2, 2018 16:36
@Dieterbe
Contributor

Dieterbe commented Aug 6, 2018

We want to know how much memory MT allocates for the cache, so we also need to keep track of the cap() of the data in the cache, instead of only tracking len().

we should either:

  1. measure the "logical" size (the amount of bytes of actual data)
  2. measure the "real" size (size of actual data + cap-len diff overhead + all metadata and tracking datastructures)

We currently do 1.
Somewhere in between 1 and 2 (which is what this PR does) is also not that useful IMHO.
"True 2" is interesting, but complicated to get right, and can be obtained from a memory profile anyway.

I think we can look into optimizing ccache overhead and see if we can deduce some kind of heuristic or formula to estimate "true" memory usage given the number of chunks and metrics cached.
But trying to answer this need via extra instrumentation code goes too far IMHO, especially because it's so trivial to get a memory profile.
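
One possible shape for such a heuristic, purely as an illustration (the overhead constants below are made up; real numbers would have to come from profiling or from the sizes of the ccache data structures):

package main

import "fmt"

// estimateCacheBytes is a hypothetical heuristic, not something metrictank
// implements: estimate total cache memory from entry counts and an average chunk size.
func estimateCacheBytes(numMetrics, numChunks, avgChunkBytes uint64) uint64 {
    const perMetricOverhead = 200 // placeholder: per-metric bookkeeping (map entry, key, LRU node)
    const perChunkOverhead = 100  // placeholder: per-chunk bookkeeping (slice header, timestamps, accounting)
    return numMetrics*perMetricOverhead + numChunks*(avgChunkBytes+perChunkOverhead)
}

func main() {
    fmt.Println(estimateCacheBytes(10000, 500000, 110), "bytes (rough estimate)")
}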

@Dieterbe
Contributor

@replay let me know what you think of the above. If you agree, is there anything in this PR I should still look at? I haven't looked at the code in detail yet.

@replay
Contributor Author

replay commented Aug 13, 2018

The idea of this PR was simply that we wanted to know if the allocated byte slices that are used to hold the iterators are sometimes bigger than necessary. By comparing the len() stats with the cap() stats we could see that easily, and if there's a difference between the two we could work to optimize it.
What you suggest in 2) would certainly be nice to have, but it would also be a lot more complicated than this; this is just a small step in that direction that would already serve a purpose.

@Dieterbe
Contributor

The idea of this PR was simply that we wanted to know if the allocated byte slices that are used to hold the iterators are sometimes bigger than necessary

Knowing as part of a debugging/development process, or needing to know constantly for running production services? We've made the mistake in the past of adding too much runtime instrumentation for things we only need when troubleshooting a performance problem; those cases are well served by temporary instrumentation that doesn't need to be merged (e.g. the index duration metrics were removed for this reason).

@replay
Contributor Author

replay commented Aug 14, 2018

That's a hard question to answer, because if there is an issue where the allocated byte slices are larger than necessary, we don't know exactly in which situations it would or wouldn't happen. E.g. a Sarama update could probably introduce an issue like that. But I guess for a start it would already be useful to deploy this anywhere MT is in production use, in order to then be able to compare the cap() and len() stats.
I don't remember that case of the index duration metrics that you mention; was I involved in that?

@Dieterbe
Contributor

I don't remember that case of the index duration metrics that you mention, was I involved in that?

I believe I added them originally, then deleted them later when I discovered they were slowing down ingestion. I couldn't find the PR right now, but it's out there somewhere.

FWIW, AFAIK the Go runtime doubles capacity to accommodate a len up to 1024 or so, then grows it in 25% steps from there, but of course that assumes no reuse of previously allocated slices.
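
(A small standalone program demonstrating that growth pattern; the exact thresholds and growth factors are an implementation detail of the Go runtime and vary between versions:)

package main

import "fmt"

// Append bytes one at a time and print len/cap whenever the runtime grows the
// backing array, making the doubling-then-smaller-steps behavior visible.
func main() {
    var b []byte
    prevCap := -1
    for i := 0; i < 5000; i++ {
        b = append(b, 0)
        if cap(b) != prevCap {
            fmt.Printf("len=%d cap=%d\n", len(b), cap(b))
            prevCap = cap(b)
        }
    }
}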

Are we observing any concrete problems that we think this patch will help us with, and if so can you explain the actual problem?

Otherwise I believe you can get the answers you need via 'http://localhost:6060/debug/pprof/heap' and then something like

(pprof) top30 CassandraStore
Active filters:
   focus=CassandraStore
Showing nodes accounting for 0.64GB, 25.26% of 2.54GB total
      flat  flat%   sum%        cum   cum%
    0.62GB 24.44% 24.44%     0.62GB 24.44%  github.com/grafana/metrictank/vendor/github.com/gocql/gocql.copyBytes (inline)
    0.02GB  0.81% 25.26%     0.02GB  0.81%  github.com/grafana/metrictank/store/cassandra.NewCassandraStore
         0     0% 25.26%     0.62GB 24.44%  github.com/grafana/metrictank/api.(*Server).getSeries
         0     0% 25.26%     0.62GB 24.44%  github.com/grafana/metrictank/api.(*Server).getSeriesCachedStore
         0     0% 25.26%     0.62GB 24.44%  github.com/grafana/metrictank/api.(*Server).getSeriesFixed
         0     0% 25.26%     0.62GB 24.44%  github.com/grafana/metrictank/api.(*Server).getTarget
         0     0% 25.26%     0.62GB 24.44%  github.com/grafana/metrictank/api.(*Server).getTargetsLocal.func1
         0     0% 25.26%     0.62GB 24.44%  github.com/grafana/metrictank/store/cassandra.(*CassandraStore).Search
         0     0% 25.26%     0.62GB 24.44%  github.com/grafana/metrictank/store/cassandra.(*CassandraStore).SearchTable
         0     0% 25.26%     0.62GB 24.44%  github.com/grafana/metrictank/vendor/github.com/gocql/gocql.(*Iter).Scan
         0     0% 25.26%     0.62GB 24.44%  github.com/grafana/metrictank/vendor/github.com/gocql/gocql.Unmarshal
         0     0% 25.26%     0.62GB 24.44%  github.com/grafana/metrictank/vendor/github.com/gocql/gocql.scanColumn
         0     0% 25.26%     0.62GB 24.44%  github.com/grafana/metrictank/vendor/github.com/gocql/gocql.unmarshalVarchar
         0     0% 25.26%     0.02GB  0.81%  main.main
         0     0% 25.26%     0.02GB  0.81%  runtime.main
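
(The /debug/pprof/heap path above is the one registered by Go's standard net/http/pprof package; as a minimal, generic sketch of how such an endpoint is exposed, independent of metrictank's actual HTTP wiring:)

package main

import (
    "log"
    "net/http"
    _ "net/http/pprof" // registers the /debug/pprof/ handlers on the default mux
)

func main() {
    log.Fatal(http.ListenAndServe("localhost:6060", nil))
}

The profile can then be inspected interactively with 'go tool pprof http://localhost:6060/debug/pprof/heap' and filtered as in the output above.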

@replay
Contributor Author

replay commented Aug 14, 2018

The problem we're observing is that MT appears to be consuming more memory than expected, so one theory we came up with while you were on vacation was that maybe the byte slices used to store the iterators have a capacity that's larger than necessary. This PR was supposed to make it easy to verify whether that is true.

@Dieterbe
Contributor

We should make the accounting accurate, as brought up in #942.
For diagnosing runaway RAM usage we can use a heap profile.
Adding more of our own instrumentation (which adds its own overhead) for something we rarely or never need is not the right solution.

@Dieterbe Dieterbe closed this Aug 14, 2018
@Dieterbe
Contributor

Dieterbe commented Aug 26, 2018

@replay @woodsaj FYI some followup:

I dug a bit into the code; it's not that hard to follow:
Iter.Scan -> scanColumn -> Unmarshal -> unmarshalVarchar -> copyBytes.
In copyBytes, a new slice is allocated sized exactly for the field we need, and that slice is returned directly to us. So it follows that any cap-len difference would be minor. (E.g. I thought that requesting a slice of len 100 might get you a cap of 128, but even that overhead doesn't show up below, or perhaps it's an implementation detail that the runtime doesn't expose.)

to doublecheck i applied the following patch:

diff --git a/store/cassandra/cassandra.go b/store/cassandra/cassandra.go
index cbe91a1a..f7ad2a19 100644
--- a/store/cassandra/cassandra.go
+++ b/store/cassandra/cassandra.go
@@ -529,6 +529,7 @@ func (c *CassandraStore) SearchTable(ctx context.Context, key schema.AMKey, tabl
                        tracing.Error(span, errChunkTooSmall)
                        return itgens, errChunkTooSmall
                }
+               fmt.Println("scanned chunk len", len(b), "cap", cap(b))
                itgen, err := chunk.NewGen(b, uint32(ts))
                if err != nil {
                        tracing.Failure(span)
diff --git a/vendor/github.com/gocql/gocql/helpers.go b/vendor/github.com/gocql/gocql/helpers.go
index 944517e2..78692d68 100644
--- a/vendor/github.com/gocql/gocql/helpers.go
+++ b/vendor/github.com/gocql/gocql/helpers.go
@@ -324,6 +324,7 @@ func (iter *Iter) MapScan(m map[string]interface{}) bool {
 
 func copyBytes(p []byte) []byte {
        b := make([]byte, len(p))
+       fmt.Println("copyBytes in", len(p), cap(p), "out", len(b), cap(b))
        copy(b, p)
        return b
 }

and when i request data, the output looks like so:

metrictank3_1   | copyBytes in 104 1736 out 104 104
metrictank3_1   | scanned chunk len 104 cap 104
metrictank3_1   | copyBytes in 106 1620 out 106 106
metrictank3_1   | scanned chunk len 106 cap 106
metrictank3_1   | copyBytes in 104 1502 out 104 104
metrictank3_1   | scanned chunk len 104 cap 104
metrictank3_1   | copyBytes in 104 1386 out 104 104
metrictank3_1   | scanned chunk len 104 cap 104
metrictank3_1   | copyBytes in 104 1270 out 104 104
metrictank3_1   | scanned chunk len 104 cap 104
metrictank3_1   | copyBytes in 107 1154 out 107 107
metrictank3_1   | scanned chunk len 107 cap 107
metrictank3_1   | copyBytes in 106 1035 out 106 106
metrictank3_1   | scanned chunk len 106 cap 106
metrictank3_1   | copyBytes in 102 917 out 102 102
metrictank3_1   | scanned chunk len 102 cap 102
metrictank3_1   | copyBytes in 102 803 out 102 102
metrictank3_1   | scanned chunk len 102 cap 102
metrictank3_1   | copyBytes in 104 689 out 104 104

You can see that the slice coming from the framer has a larger capacity, probably because it holds an entire frame, but the slice we get back for our chunk is sized exactly for the chunk.

So:

  1. the "cap-len difference" theory of causing excess memory usage is not correct.
  2. gocql doesn't hold a reference to the chunk slice after Scan returns.

What is interesting, however, as a way to reduce allocations (though not memory in use): perhaps we can recycle slices. I've brought this up on their mailing list: https://groups.google.com/forum/#!topic/gocql/ng0d9ZyGwSU
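
As a minimal sketch of what slice recycling could look like (a plain sync.Pool of byte slices; this is not gocql's API, just an illustration of the general idea, and the ownership/lifetime questions are exactly what would need to be worked out upstream):

package main

import "sync"

// bufPool holds reusable byte slices. Illustrative only: gocql's copyBytes
// currently allocates a fresh slice per value; a pooled variant would have to
// define who returns buffers to the pool and when.
var bufPool = sync.Pool{
    New: func() interface{} { return make([]byte, 0, 1024) },
}

// copyBytesPooled copies p into a pooled buffer. The caller takes ownership
// and must Put the slice back once it no longer needs the data.
func copyBytesPooled(p []byte) []byte {
    b := bufPool.Get().([]byte)
    if cap(b) < len(p) {
        b = make([]byte, len(p))
    }
    b = b[:len(p)]
    copy(b, p)
    return b
}

func main() {
    src := []byte("some decoded column value")
    dst := copyBytesPooled(src)
    _ = dst              // ... use dst ...
    bufPool.Put(dst[:0]) // hand the buffer back for reuse
}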

@Dieterbe Dieterbe deleted the more_cache_metrics branch September 18, 2018 08:58
robert-milan added a commit that referenced this pull request Oct 10, 2018
Prior to this change we were only counting the number of bytes
that the actual data occupied. However, there is a lot of overhead
in our cache system that also needs to be tracked. We were seeing
large differences in the total cache size used and live heap
allocations.

These changes attempt to track most of the accounting overhead.
The numbers used to track overhead are estimates based on
data type sizes.

 * Added stats for Flat Accounting, the LRU, and Chunks
 * Added benchmark for LRU allocations

Resolves: #1086
See also: #963
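
(For illustration, a rough sketch of the kind of data-type-size estimate the commit message refers to; the struct below is hypothetical, not the actual ccache type:)

package main

import (
    "fmt"
    "unsafe"
)

// chunkEntry is a stand-in for whatever the cache stores per chunk; the real
// ccache types differ, this only shows how per-entry overhead can be estimated
// from data type sizes.
type chunkEntry struct {
    ts   uint32
    prev uint32
    next uint32
    data []byte
}

func main() {
    // Fixed overhead per cached chunk, before counting the chunk bytes themselves
    // (on 64-bit platforms the slice header alone accounts for 24 bytes).
    fmt.Println("per-chunk struct overhead:", unsafe.Sizeof(chunkEntry{}), "bytes")
}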