This repository was archived by the owner on Aug 23, 2023. It is now read-only.

Point slice pool fixes #958

Merged: Dieterbe merged 2 commits into master from point-slice-pool-fixes on Jun 10, 2019

Conversation

Dieterbe (Contributor)

before https://snapshot.raintank.io/dashboard/snapshot/ZGojZ4ApEM77TWkM87LLbb7TcWcKNxsx?orgId=2
after https://snapshot.raintank.io/dashboard/snapshot/WwuhppsQsTwt3JnxmExRFLh52v1I0j85?orgId=2

note the significant difference in:

  • number of GC runs
  • allocation rate

note the minor (insignificant?) difference in RAM usage at the end of the test (though it started with more?)

the test run is basically:

  1. initial setup as described in refactor cassandra read queries #931 (launch stack, fill with 2y worth of data)
  2. do a few requests of 5min of the MT dashboard to get the RAM usage "warmed up", then run:
for i in {1..20}; do
	curl 'http://localhost:6060/render?target=some.id.of.a.metric.1*&format=json&from=-2y' >/dev/null
done
for i in {1..20}; do
	curl 'http://localhost:8081/render?target=some.id.of.a.metric.1*&format=json&from=-2y' >/dev/null
done

@@ -113,6 +126,7 @@ func Fix(in []schema.Point, from, to, interval uint32) []schema.Point {
 			o -= 1
 		}
 	}
+	pointSlicePool.Put(in[:0])
Copy link
Contributor

@DanCech DanCech Jun 29, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems backwards to me, I'd have thought that the caller would be responsible for returning in to the pool, since this function has no way of knowing whether the caller intends to continue using it.

Similarly, it seems like the caller should provide out so that it can be explicitly responsible for both getting it from the pool and cleaning it up afterwards.

Finally, is there a reason that we don't put the retry and allocation logic into pointSlicePool?
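
To make that ownership split concrete, here's a minimal sketch of the caller-owned pattern (the Point type, the pool setup, and the fixCallerOwned signature are illustrative stand-ins, not metrictank's actual code):

```go
package main

import "sync"

// Point stands in for schema.Point in this sketch.
type Point struct {
	Val float64
	Ts  uint32
}

// pointSlicePool hands out reusable []Point backing arrays.
var pointSlicePool = sync.Pool{
	New: func() interface{} { return make([]Point, 0, 1024) },
}

// fixCallerOwned is a hypothetical variant of Fix: it only reads in and fills
// out, and never touches the pool, leaving ownership of both slices to the caller.
func fixCallerOwned(in, out []Point, from, to, interval uint32) []Point {
	for ts := from; ts < to; ts += interval {
		out = append(out, Point{Ts: ts}) // placeholder for the real fixing logic
	}
	return out
}

func main() {
	in := pointSlicePool.Get().([]Point)[:0]
	out := pointSlicePool.Get().([]Point)[:0]

	out = fixCallerOwned(in, out, 0, 60, 10)

	// the caller, not Fix, decides when each slice goes back to the pool
	pointSlicePool.Put(in[:0])
	// ... render out, then:
	pointSlicePool.Put(out[:0])
}
```

The essential point is that Fix never touches the pool; whoever Gets a slice is the one that Puts it back.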

Dieterbe (Contributor, Author) replied:

> I'd have thought that the caller would be responsible for returning in to the pool.
> Similarly, it seems like the caller should provide out so that it can be explicitly responsible for both getting it from the pool and cleaning it up afterwards.

we can do that. I definitely think the pool interaction for in and out should be consistent (either both in the caller, or both within Fix), so this would make Fix a more pure utility function, which is good. but the downside is that pulling the neededCap := int((last-first)/interval + 1) calculation out of Fix is a bit weird.

> Finally, is there a reason that we don't put the retry and allocation logic into pointSlicePool?

this makes sense. sidenote: I've been thinking of having separate size classes; that sort of stuff would all make sense if we make pointSlicePool a more full-featured "object" rather than merely a sync.Pool instance.
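
As a rough illustration of that idea, a size-classed pool wrapper might look like the sketch below (the pointPool type, GetMin, and the smallCap cutoff are invented here, not metrictank code):

```go
package main

import "sync"

type Point struct {
	Val float64
	Ts  uint32
}

// pointPool wraps two sync.Pools, one per capacity class, so small and large
// slices don't get mixed and a Get can honor a minimum capacity.
type pointPool struct {
	small sync.Pool // slices with cap <= smallCap
	large sync.Pool // slices with cap > smallCap
}

const smallCap = 2048 // arbitrary cutoff for this sketch

// GetMin returns an empty slice with at least minCap capacity, reusing a
// pooled one when possible and allocating fresh otherwise.
func (p *pointPool) GetMin(minCap int) []Point {
	pool := &p.small
	if minCap > smallCap {
		pool = &p.large
	}
	if v := pool.Get(); v != nil {
		if s := v.([]Point); cap(s) >= minCap {
			return s[:0]
		}
		// pooled slice is too small for this request: drop it and allocate
		// fresh (keeps the sketch simple)
	}
	return make([]Point, 0, minCap)
}

// Put returns a slice to the class matching its capacity.
func (p *pointPool) Put(s []Point) {
	if cap(s) > smallCap {
		p.large.Put(s[:0])
		return
	}
	p.small.Put(s[:0])
}

func main() {
	var pp pointPool
	buf := pp.GetMin(4096)           // sized for the expected number of points
	buf = append(buf, Point{Ts: 10}) // fill as usual
	pp.Put(buf)
}
```

Callers could then request the capacity they actually need instead of hoping whatever comes out of the pool is big enough.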

A contributor replied:

yeah, that does make moving out outside a little tough

Dieterbe (Contributor, Author) commented on Jun 29, 2018

in-use space (pprof inuse_space)
before:

      flat  flat%   sum%        cum   cum%
   90.69MB 37.29% 37.29%    90.69MB 37.29%  github.com/grafana/metrictank/api.Fix
   29.50MB 12.13% 49.42%    29.50MB 12.13%  github.com/grafana/metrictank/vendor/github.com/gocql/gocql.copyBytes (inline)
   26.83MB 11.03% 60.45%    27.33MB 11.24%  github.com/grafana/metrictank/mdata/cache.(*CCacheMetric).AddRange
   21.48MB  8.83% 69.28%    21.48MB  8.83%  github.com/grafana/metrictank/api/models.SeriesByTarget.ForGraphite
   14.31MB  5.88% 75.17%    26.81MB 11.02%  github.com/grafana/metrictank/mdata/cache/accnt.(*LRU).touch

after:

      flat  flat%   sum%        cum   cum%
   45.73MB 20.01% 20.01%    45.73MB 20.01%  github.com/grafana/metrictank/api.(*Server).itersToPoints
   42.50MB 18.60% 38.61%    42.50MB 18.60%  github.com/grafana/metrictank/vendor/github.com/gocql/gocql.copyBytes (inline)
   31.86MB 13.94% 52.55%    31.86MB 13.94%  github.com/grafana/metrictank/api.Fix
   20.26MB  8.87% 61.42%    20.26MB  8.87%  github.com/grafana/metrictank/mdata/cache.(*CCacheMetric).AddRange
   18.50MB  8.10% 69.51%    18.50MB  8.10%  container/list.(*List).insertValue (inline)
   14.31MB  6.26% 75.77%    32.81MB 14.36%  github.com/grafana/metrictank/mdata/cache/accnt.(*LRU).touch

profiling this shows a 66% reduction in api.Fix's in-use space, but at the expense of itersToPoints allocating more. it's not trivial to compare since the total heap is also different, but based on the other stuff (e.g. gocql, cache AddRange) now being a larger % of memory allocation, it confirms we now allocate less space.

BTW itersToPoints allocates here:

   45.73MB    45.73MB    486:				points = append(points, schema.Point{Val: val, Ts: ts})

so it's probably due to it getting a slice from the pool that is now smaller than before (because in other places we store smaller slices in the pool?). this can be addressed with size classes, size hints, or something like that.
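
To make that concrete with a standalone example (plain Go, nothing metrictank-specific; the sizes are made up): when the pooled slice comes back smaller than the series being written into it, append has to grow it, and that growth is the allocation pprof pins on this line; a capacity hint up front avoids it.

```go
package main

import "fmt"

type Point struct {
	Val float64
	Ts  uint32
}

func main() {
	// a pooled slice that happens to come back with a small capacity
	buf := make([]Point, 0, 64)

	// appending a long series blows past that capacity, so append has to
	// reallocate and copy several times -- that growth is what shows up in
	// the heap profile on the append line
	for ts := uint32(0); ts < 100000; ts += 60 {
		buf = append(buf, Point{Ts: ts})
	}
	fmt.Println("grown cap:", cap(buf)) // far larger than the original 64

	// with a capacity hint up front the whole series fits in one allocation
	hinted := make([]Point, 0, 100000/60+1)
	for ts := uint32(0); ts < 100000; ts += 60 {
		hinted = append(hinted, Point{Ts: ts})
	}
	fmt.Println("hinted cap:", cap(hinted)) // unchanged: no re-growth
}
```

A pool Get that accepts a minimum-capacity hint (as in the size-class sketch above) gives the same effect while still reusing backing arrays.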

robert-milan (Contributor) left a comment:

LGTM. In the future we will need to revisit the other issues referenced in this PR.

Dieterbe force-pushed the point-slice-pool-fixes branch from ead0b5b to 8fde13c on June 10, 2019 at 19:21
Dieterbe (Contributor, Author) commented:

the main dangerous thing about this PR is that when we release a slice to the pool, we have to be sure there are no references to it anywhere else (e.g. in the cache).
i went through this a couple of times back when i wrote this, and now again, and it still looks safe.
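
For anyone reviewing the same thing later, here is a minimal standalone sketch of the hazard being checked for (the pool and Point type are stand-ins, not the actual cache or pool code): returning a slice to the pool while something like the cache still references the same backing array lets the next Get silently overwrite the cached data.

```go
package main

import (
	"fmt"
	"sync"
)

type Point struct {
	Val float64
	Ts  uint32
}

var pool = sync.Pool{New: func() interface{} { return make([]Point, 0, 4) }}

func main() {
	resp := pool.Get().([]Point)[:0]
	resp = append(resp, Point{Ts: 10, Val: 1})

	cached := resp // e.g. a cache keeps a reference to the same backing array

	pool.Put(resp[:0]) // bug: released while cached still points at the data

	next := pool.Get().([]Point)[:0]            // may hand back the same backing array
	next = append(next, Point{Ts: 99, Val: 42}) // ...and silently overwrite cached[0]

	fmt.Println(cached[0]) // may now print {42 99} instead of {1 10}
}
```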

Dieterbe merged commit b18aa05 into master on Jun 10, 2019
Dieterbe deleted the point-slice-pool-fixes branch on June 10, 2019 at 19:22