This repository was archived by the owner on Aug 23, 2023. It is now read-only.

Point slice pool fixes #958

Merged: Dieterbe merged 2 commits into master from point-slice-pool-fixes on Jun 10, 2019

Conversation

Dieterbe (Contributor)

before https://snapshot.raintank.io/dashboard/snapshot/ZGojZ4ApEM77TWkM87LLbb7TcWcKNxsx?orgId=2
after https://snapshot.raintank.io/dashboard/snapshot/WwuhppsQsTwt3JnxmExRFLh52v1I0j85?orgId=2

note the significant difference in:

  • number of GC runs
  • allocation rate

note the minor (insignificant?) difference in RAM usage at the end of the test (though it started with more?)

the test run is basically:

  1. initial setup as described in refactor cassandra read queries #931 (launch stack, fill with 2y worth of data)
  2. do a few requests of 5min of the MT dashboard to get the RAM usage "warmed up", then run:
for i in {1..20}; do
	curl 'http://localhost:6060/render?target=some.id.of.a.metric.1*&format=json&from=-2y' >/dev/null
done
for i in {1..20}; do
	curl 'http://localhost:8081/render?target=some.id.of.a.metric.1*&format=json&from=-2y' >/dev/null
done

@@ -113,6 +126,7 @@ func Fix(in []schema.Point, from, to, interval uint32) []schema.Point {
 			o -= 1
 		}
 	}
+	pointSlicePool.Put(in[:0])
Copy link
Contributor

@DanCech DanCech Jun 29, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems backwards to me, I'd have thought that the caller would be responsible for returning in to the pool, since this function has no way of knowing whether the caller intends to continue using it.

Similarly, it seems like the caller should provide out so that it can be explicitly responsible for both getting it from the pool and cleaning it up afterwards.

Finally, is there a reason that we don't put the retry and allocation logic into pointSlicePool?
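
To make that ownership split concrete, here's a minimal sketch of the caller-owned pattern (the Point type, the pool setup, and the fixCallerOwned signature are illustrative stand-ins, not metrictank's actual code):

```go
package main

import "sync"

// Point stands in for schema.Point in this sketch.
type Point struct {
	Val float64
	Ts  uint32
}

// pointSlicePool hands out reusable []Point backing arrays.
var pointSlicePool = sync.Pool{
	New: func() interface{} { return make([]Point, 0, 1024) },
}

// fixCallerOwned is a hypothetical variant of Fix: it only reads in and fills
// out, and never touches the pool, leaving ownership of both slices to the caller.
func fixCallerOwned(in, out []Point, from, to, interval uint32) []Point {
	for ts := from; ts < to; ts += interval {
		out = append(out, Point{Ts: ts}) // placeholder for the real fixing logic
	}
	return out
}

func main() {
	in := pointSlicePool.Get().([]Point)[:0]
	out := pointSlicePool.Get().([]Point)[:0]

	out = fixCallerOwned(in, out, 0, 60, 10)

	// the caller, not Fix, decides when each slice goes back to the pool
	pointSlicePool.Put(in[:0])
	// ... render out, then:
	pointSlicePool.Put(out[:0])
}
```

The essential point is that Fix never touches the pool; whoever Gets a slice is the one that Puts it back.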

Dieterbe (Contributor, Author) replied:

> I'd have thought that the caller would be responsible for returning in to the pool.
> Similarly, it seems like the caller should provide out so that it can be explicitly responsible for both getting it from the pool and cleaning it up afterwards.

we can do that. I definitely think the pool interaction for in and out should be consistent (either both in the caller, or both within Fix), so this would make Fix a more pure utility function, which is good. but the downside is that pulling the neededCap := int((last-first)/interval + 1) calculation out of Fix is a bit weird.

> Finally, is there a reason that we don't put the retry and allocation logic into pointSlicePool?

this makes sense. sidenote: I've been thinking of having separate size classes; that sort of stuff would all make sense if we make pointSlicePool a more full-featured "object" rather than merely a sync.Pool instance.
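
As a rough illustration of that idea, a size-classed pool wrapper might look like the sketch below (the pointPool type, GetMin, and the smallCap cutoff are invented here, not metrictank code):

```go
package main

import "sync"

type Point struct {
	Val float64
	Ts  uint32
}

// pointPool wraps two sync.Pools, one per capacity class, so small and large
// slices don't get mixed and a Get can honor a minimum capacity.
type pointPool struct {
	small sync.Pool // slices with cap <= smallCap
	large sync.Pool // slices with cap > smallCap
}

const smallCap = 2048 // arbitrary cutoff for this sketch

// GetMin returns an empty slice with at least minCap capacity, reusing a
// pooled one when possible and allocating fresh otherwise.
func (p *pointPool) GetMin(minCap int) []Point {
	pool := &p.small
	if minCap > smallCap {
		pool = &p.large
	}
	if v := pool.Get(); v != nil {
		if s := v.([]Point); cap(s) >= minCap {
			return s[:0]
		}
		// pooled slice is too small for this request: drop it and allocate
		// fresh (keeps the sketch simple)
	}
	return make([]Point, 0, minCap)
}

// Put returns a slice to the class matching its capacity.
func (p *pointPool) Put(s []Point) {
	if cap(s) > smallCap {
		p.large.Put(s[:0])
		return
	}
	p.small.Put(s[:0])
}

func main() {
	var pp pointPool
	buf := pp.GetMin(4096)           // sized for the expected number of points
	buf = append(buf, Point{Ts: 10}) // fill as usual
	pp.Put(buf)
}
```

Callers could then request the capacity they actually need instead of hoping whatever comes out of the pool is big enough.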

A contributor replied:

yeah, that does make moving out outside a little tough

Dieterbe (Contributor, Author) commented on Jun 29, 2018

in-use space (pprof inuse_space)
before:

      flat  flat%   sum%        cum   cum%
   90.69MB 37.29% 37.29%    90.69MB 37.29%  github.com/grafana/metrictank/api.Fix
   29.50MB 12.13% 49.42%    29.50MB 12.13%  github.com/grafana/metrictank/vendor/github.com/gocql/gocql.copyBytes (inline)
   26.83MB 11.03% 60.45%    27.33MB 11.24%  github.com/grafana/metrictank/mdata/cache.(*CCacheMetric).AddRange
   21.48MB  8.83% 69.28%    21.48MB  8.83%  github.com/grafana/metrictank/api/models.SeriesByTarget.ForGraphite
   14.31MB  5.88% 75.17%    26.81MB 11.02%  github.com/grafana/metrictank/mdata/cache/accnt.(*LRU).touch

after:

      flat  flat%   sum%        cum   cum%
   45.73MB 20.01% 20.01%    45.73MB 20.01%  github.com/grafana/metrictank/api.(*Server).itersToPoints
   42.50MB 18.60% 38.61%    42.50MB 18.60%  github.com/grafana/metrictank/vendor/github.com/gocql/gocql.copyBytes (inline)
   31.86MB 13.94% 52.55%    31.86MB 13.94%  github.com/grafana/metrictank/api.Fix
   20.26MB  8.87% 61.42%    20.26MB  8.87%  github.com/grafana/metrictank/mdata/cache.(*CCacheMetric).AddRange
   18.50MB  8.10% 69.51%    18.50MB  8.10%  container/list.(*List).insertValue (inline)
   14.31MB  6.26% 75.77%    32.81MB 14.36%  github.com/grafana/metrictank/mdata/cache/accnt.(*LRU).touch

profiling this shows a 66% reduction in api.Fix's in-use space, but at the expense of itersToPoints allocating more. it's not trivial to compare since the total heap is also different, but based on the other stuff (e.g. gocql, cache AddRange) now being a larger % of memory allocation, it confirms we now allocate less space.

BTW itersToPoints allocates here:

   45.73MB    45.73MB    486:				points = append(points, schema.Point{Val: val, Ts: ts})

so it's probably due to it getting a slice from the pool that is now smaller than before (because in other places we store smaller slices in the pool?). this can be addressed with size classes, size hints, or something like that.
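
To make that concrete with a standalone example (plain Go, nothing metrictank-specific; the sizes are made up): when the pooled slice comes back smaller than the series being written into it, append has to grow it, and that growth is the allocation pprof pins on this line; a capacity hint up front avoids it.

```go
package main

import "fmt"

type Point struct {
	Val float64
	Ts  uint32
}

func main() {
	// a pooled slice that happens to come back with a small capacity
	buf := make([]Point, 0, 64)

	// appending a long series blows past that capacity, so append has to
	// reallocate and copy several times -- that growth is what shows up in
	// the heap profile on the append line
	for ts := uint32(0); ts < 100000; ts += 60 {
		buf = append(buf, Point{Ts: ts})
	}
	fmt.Println("grown cap:", cap(buf)) // far larger than the original 64

	// with a capacity hint up front the whole series fits in one allocation
	hinted := make([]Point, 0, 100000/60+1)
	for ts := uint32(0); ts < 100000; ts += 60 {
		hinted = append(hinted, Point{Ts: ts})
	}
	fmt.Println("hinted cap:", cap(hinted)) // unchanged: no re-growth
}
```

A pool Get that accepts a minimum-capacity hint (as in the size-class sketch above) gives the same effect while still reusing backing arrays.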

robert-milan (Contributor) left a comment:

LGTM. In the future we will need to revisit the other issues referenced in this PR.

Dieterbe force-pushed the point-slice-pool-fixes branch from ead0b5b to 8fde13c on June 10, 2019 at 19:21
Dieterbe (Contributor, Author) commented:

the main dangerous thing about this PR is that when we release a slice to the pool, we have to be sure there are no references to it anywhere else (e.g. in the cache).
i went through this a couple of times back when i wrote this, and now again, and it still looks safe.
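
For anyone reviewing the same thing later, here is a minimal standalone sketch of the hazard being checked for (the pool and Point type are stand-ins, not the actual cache or pool code): returning a slice to the pool while something like the cache still references the same backing array lets the next Get silently overwrite the cached data.

```go
package main

import (
	"fmt"
	"sync"
)

type Point struct {
	Val float64
	Ts  uint32
}

var pool = sync.Pool{New: func() interface{} { return make([]Point, 0, 4) }}

func main() {
	resp := pool.Get().([]Point)[:0]
	resp = append(resp, Point{Ts: 10, Val: 1})

	cached := resp // e.g. a cache keeps a reference to the same backing array

	pool.Put(resp[:0]) // bug: released while cached still points at the data

	next := pool.Get().([]Point)[:0]            // may hand back the same backing array
	next = append(next, Point{Ts: 99, Val: 42}) // ...and silently overwrite cached[0]

	fmt.Println(cached[0]) // may now print {42 99} instead of {1 10}
}
```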

Dieterbe merged commit b18aa05 into master on Jun 10, 2019
Dieterbe deleted the point-slice-pool-fixes branch on June 10, 2019 at 19:22