[0.12] Retention policy cleanup does not remove series #6457
I +1 this. I have the same entries in the logs, but the number of series in the database continues to mount (it was ~6 million when I added the retention policy last Tuesday; now it's over 8 million). I haven't tried restarting because I am scared it will never start up properly.
You are having a different issue @sofixa. Creating a new retention policy does not apply retroactively to existing data; it only applies to new data that is expressly written to it. Otherwise you have to set the new retention policy as the default and then migrate the existing data into it. The structural order of InfluxDB is roughly: database → retention policy → measurement.
Measurements are contained within a retention policy, which is also obvious when you look at how queries are actually performed. The full syntax of a select query is: SELECT * FROM "my_db"."my_retention_policy"."my_measurement_name" WHERE ...
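The migration step mentioned above was not shown in the thread; here is a minimal sketch using InfluxQL's SELECT ... INTO, with placeholder database and retention policy names (on 0.12 the original default retention policy is typically named "default"):

```
-- Copy every existing measurement from the old retention policy into the new one.
-- GROUP BY * keeps tags as tags instead of collapsing them into fields.
SELECT * INTO "my_db"."my_new_rp".:MEASUREMENT FROM "my_db"."default"./.*/ GROUP BY *
```

If your version does not support the :MEASUREMENT back reference, run one SELECT ... INTO per measurement instead.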
When a shard is closed and removed due to retention policy enforcement, the series contained in the shard would still exist in the index, causing a memory leak. Restarting the server would cause them not to be loaded. Fixes #6457
I have a dataset that contains an unbounded number of endpoints that need to be queried, namely guest devices on a transient network (think hotel lobby or coffee shop). I am trying to store a small number of metrics over a very large and constantly changing number of devices. At any given time I have approximately 50,000 unique devices connected to the system, and add about 500 to 1,000 unique devices per minute. I need to be able to display metrics for a particular device, so each unique device becomes a unique series, i.e.:
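The concrete example was lost from this copy of the issue; a minimal line-protocol sketch of what one such point might look like, where the measurement, tag, and field names are assumptions for illustration:

```
guest_device,device_mac=aa:bb:cc:dd:ee:01 rssi=-52,bytes_rx=10240
guest_device,device_mac=aa:bb:cc:dd:ee:02 rssi=-60,bytes_rx=8192
```

Every new device_mac tag value creates a new series under the guest_device measurement, which is why the series count grows with the number of distinct devices seen.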
I have created a 1hr retention policy to store the per-device data so the number of series does not become unwieldy:
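The original statement is not shown here; a sketch of what such a policy could look like, with the policy and database names assumed:

```
CREATE RETENTION POLICY "one_hour" ON "guest_devices" DURATION 1h REPLICATION 1
```

With a duration this short the shard group duration is also small, and retention enforcement removes whole shard groups once they fall entirely outside the 1h window.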
I have set the retention check-interval to run every 5 minutes, so I should be evicting old data fairly regularly and thereby keeping the number of series to a minimum (50,000 initial + (1,000 per minute * 60 minutes) = ~110,000 series total for the retention policy).
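For reference, the check interval lives in influxdb.conf; a sketch assuming the stock layout of the [retention] section:

```
[retention]
  # Run the retention policy enforcement check every 5 minutes.
  enabled = true
  check-interval = "5m"
```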
The issue is that the number of series increases linearly until the system runs out of memory and eventually crashes. The retention policy runs as expected but the number of series that the system reports does not decrease along with the removal of the shards.
(Chart: number of series over time)
If I restart the system, it immediately drops tens of thousands of series even though the retention policy check just completed.
(Chart: number of series after restart)
Expected behavior:
The retention policy should drop shards older than 1 hour every 5 minutes, which should reduce the number of series accordingly and leave the series count relatively stable over a long duration.
Actual behavior:
The retention policy runs but the number of series remains unchanged, eventually overloading the system. Restarting the system reduces the number of series to the expected level.
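A hedged way to observe the discrepancy from the influx CLI, assuming a database named guest_devices and a version that supports SHOW SHARDS: the expired shards disappear on schedule, but the series list keeps growing until the process is restarted.

```
USE guest_devices
SHOW SHARDS
SHOW SERIES
```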
Here is a Chronograf chart showing the unexpected rise in the number of series:
(Retention policy check runs every 5 minutes)
This was generated using the following query:
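The query itself was not preserved in this copy; a plausible reconstruction against the _internal monitoring database, with the target database name assumed:

```
SELECT max("numSeries") FROM "_internal"."monitor"."database"
WHERE "database" = 'guest_devices' AND time > now() - 6h
GROUP BY time(1m)
```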