-
Notifications
You must be signed in to change notification settings - Fork 3.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
expvar related resource leaks #5784
Comments
/cc @mark-rushakoff |
@nathanielc has done some research into removing expvar data and may be able to add some insight here. |
@mark-rushakoff @jonseymour I recently had the same issue with Kapacitor but much more pronounced. I solved it by creating a new expvar.Map type with a |
Thanks @nathanielc - I'll take a look. |
Is there an influxdata repo that code that, in principle, be shared across both influxdb and kapacitor, can live? (I am assuming in asking this that making influxdb dependent on code from kapacitor might be frowned upon). |
@jonseymour Go ahead an add it directly to InfluxDB. Kapacitor already pulls code from InfluxDB. Once your changes are in, Kapacitor might be able to just use it. If not then its not shared code :) Thanks for checking |
As of 2e4dedf, this is the subset of expvar methods that the influxdb tree uses these types:
and these methods:
in these files:
I might explore an alternative statistics interface that is optimized for the manner that influx uses it, currently backed by expvar &/or @nathanielc's variant of expvar but replaceable in future with another implementation should that ever be necessary. (Based on changing all imports to github.com/influxdata/influxdb/expvar/ and then creating declarations required to successfully compile the tree - see jonseymour@105dea1) |
Yes, good things can be done here. Currently the way we are updating stats via the Map.Add and Map.Set methods requires acquisition of read locks on the map. With suitable and reasonable constraints that a new type could place on how expvar is used, we can avoid these locks and only incur the cost of an atomic update. All will be revealed when I push a POC. |
I decided that the minimum required to address this issue was to introduce a new method |
…y refer to is closed As per influxdata#5784, there is a risk that a long running influxd process will accumulate a lot of statistics that describe objects that have already been closed. This is a concern primarily because of the amount of IO and CPU involved with continuing to publish the last will and testament of an object long since departed. This series address the issue by pinching some code from Kapacitor - thanks @nathanielc - and using it as a top level name space for influx statistics objects. The maps are reference counted. One reference is allocated to the described object, another to the monitor. When the described object is closed, it yells DeleteStatistics! at the influxdb package which takes note of the request, but otherwise ignores it. When the monitor is doing its thing, it quietly peeks at the statistics map's reference count and if it detects that its number is up (well, down really), it yells DeleteStatistics! at the influxdb package who takes notice of it this time. This deferral of the grim reaper is required so that the monitor has time to tidy up the paper work and confirm that the undertaker's cheque has not bounced. Signed-off-by: Jon Seymour <jon@wildducktheories.com>
Assuming this is no longer an issue with the Statistics rewrite done earlier this year. |
expvar.Map
are used internally by influx to track various kinds of statistics which are eventually logged into various measurements of the _internal database by the monitor component.expvar does not have an API that easily allows variables, once created to be removed. As a result once a statistic set has been created, the statistic set persists until the server is restarted. This is acceptable for statistic sets that are essentially singletons, but isn't ideal for statistics sets that grow in number as the size of the database grows; for example the statistic sets created by #5758 to track various cache related stats on TSM WAL segment caches.
The biggest problem is not actually memory - the size of statistics is insignificant to the time series data being processed, but rather the amount of CPU and IO that is being used to publish statistics that stop changing once a shard has been closed for writing or the database it was contained in has been deleted. In this case, the statistic for the shard continues to get published into a measurement every 10 seconds until the infuxd process is restarted even though none of the data in the statistic set ever changes again. In influxd processes that generate lots of shards and stay up for a very long period of time, this overhead may not be insignificant.
There are three problems to be solved:
One way to solve the second problem might be to periodically rewrite the expvar tree with new maps that replace maps that contain 'stale' sub maps with new 'sub maps' that don't.
Problem 3 can be solved without solving problem 2, by using a special key value to mark "quiet" maps that are to be ignored by the monitor.
A way to start making progress to an alternative solution without making large changes that touch the entire code base all at once (because of the pervasiveness of expvar references) would be to introduce into influxvar.go a new type and function:
A trivial implementation of
NewStatisticsNG
would simply be to return the result ofNewStatistics
. Any client ofNewStatisticsNG
would continue to operate unchanged except for the type they declare for their statistics map.A straightforward extension of StatisticsMap would be to add:
This would provide a way for the client of the statistics map hint to the monitor that the statistics map will no longer be updated. The hint could be implemented in the form a special key "$do-not-publish" being set to 1 in the underlying expvar map. If a Set or an Add ever occurs again, then the implementation could simply create a new expvar map and add that to the expvar tree in the same location.
If the memory consumed by stale expvar maps did ever prove to be an issue in practice, then a cleaner could be written that walks the expvar tree and rewrites the parents of maps containing "$do-not-publish" keys.
Final Design
DoStatistics
method to the monitorDeleteStatistics
Revisions
The text was updated successfully, but these errors were encountered: