Ingester blocks retention period is ignored after upgrading to v2.6.0 #4521
Replies: 2 comments
Retention settings changed in Mimir 2.6.0. Previously, a block's retention was computed from the block's MaxTime. Starting in Mimir 2.6, retention is computed from the time the block was shipped to storage. Because the ingester has no shipping times for blocks that were already shipped by an earlier version of Mimir, it uses the current time as their shipping time, which means those older blocks are kept in ingesters for a bit longer. Here is the corresponding entry in the changelog:
Thanks for the explanation. It was not clear from the changelog how the retention period is calculated for existing/older blocks. Using the "current time" as the shipping time explains the behaviour. We will first decrease the retention period to 13h so that less disk space is used and there is enough capacity on the PVs for the extra blocks to be held for 12/13h after the upgrade.
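To make the explanation above concrete, here is a minimal Python sketch of the two retention rules as they are described in this thread. It is not Mimir's code: the function names, the `first_seen` parameter (standing in for the "current time" fallback), and the example timestamps are all hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def deletable_before_2_6(block_max_time: datetime, now: datetime,
                         retention: timedelta) -> bool:
    """Rule described for versions before 2.6: count retention from the block's MaxTime."""
    return now - block_max_time >= retention


def deletable_since_2_6(shipped_at: Optional[datetime], first_seen: datetime,
                        now: datetime, retention: timedelta) -> bool:
    """Rule described for 2.6: count retention from the shipping time.

    Assumption: when no shipping time is known (the block was shipped by an
    older version), the moment the upgraded ingester first handles the block
    (`first_seen`) stands in for the "current time" fallback.
    """
    reference = shipped_at if shipped_at is not None else first_seen
    return now - reference >= retention


# Hypothetical timeline: upgrade at 15:00 UTC with a 13h retention period.
upgrade = datetime(2023, 3, 1, 15, 0, tzinfo=timezone.utc)
retention = timedelta(hours=13)
old_block_max_time = upgrade - timedelta(hours=10)  # block is 10h old at upgrade
four_hours_later = upgrade + timedelta(hours=4)

# Old rule: the block is 14h past MaxTime, so it may be deleted.
old_rule = deletable_before_2_6(old_block_max_time, four_hours_later, retention)

# New rule: the clock restarted at the upgrade, so the block is kept for now.
new_rule = deletable_since_2_6(None, upgrade, four_hours_later, retention)
```

Under these assumptions, every block that was already on disk at upgrade time stays for one more full retention period, which is why the block count keeps growing and then drops all at once.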
Describe the bug
When upgrading Mimir to v2.6, the ingesters do not comply with the TSDB block retention period. I have reproduced this behaviour in prod and dev environments on 4 occasions.
Using default retention period:
The default retention period (24h) was used for the initial upgrade from Mimir v2.4 to v2.6. We were alerted that the ingester PVs were filling up about 12 hours after the upgrade. On investigation, we saw that the ingesters were not deleting blocks locally, so blocks older than 24h were kept on disk. As a result, the disks of many ingesters filled up and the ingester pods started crashing.
Reverting to Mimir v2.4 resolved it, but not automatically. The Persistent Volumes (PVs) of the failing ingester pods had to be deleted manually, because otherwise the new pods would hit the same disk-full error.
Since then we have successfully upgraded to v2.5, and we observed the same behaviour when trying to upgrade from v2.5 to v2.6, so this behaviour is specific to v2.6.
Using 13h retention period:
The 13h retention period was recommended by a member of the Mimir team via Slack and is soon to be the new default retention period. I tested this retention period overnight in a dev environment. The number of blocks on disk increased every 2 hours for about 12 hours (from 15:00 to 03:15). Then the number of blocks on disk dropped sharply, and it has remained steady since (over 8 hours so far).
I suspect there is a bug in this version of Mimir. Perhaps the retention period of blocks is reset?
To Reproduce
Steps to reproduce the behavior:
-blocks-storage.tsdb.block-ranges-period set to 2h
sum(cortex_ingester_tsdb_blocks_loaded{cluster="<cluster_name>", namespace="<ns>"}) by (pod)
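For tracking the per-pod block count over time, the query above can be sent to a Prometheus-compatible HTTP API. The sketch below only builds the query string and the request URL for the standard `/api/v1/query` instant-query endpoint. The base URL, cluster name, and namespace are placeholders, and the helper names are hypothetical.

```python
from urllib.parse import urlencode


def blocks_loaded_query(cluster: str, namespace: str) -> str:
    """Build the PromQL query from the report for one cluster and namespace."""
    return (
        'sum(cortex_ingester_tsdb_blocks_loaded{'
        f'cluster="{cluster}", namespace="{namespace}"'
        '}) by (pod)'
    )


def instant_query_url(base_url: str, query: str) -> str:
    """Build a URL for the Prometheus instant-query endpoint (/api/v1/query)."""
    return f"{base_url.rstrip('/')}/api/v1/query?{urlencode({'query': query})}"


# Placeholder values; substitute the address of the Prometheus-compatible
# server that scrapes the ingesters, and the real cluster and namespace.
query = blocks_loaded_query("dev", "mimir")
url = instant_query_url("http://prometheus.example:9090/", query)
```

Fetching `url` at regular intervals, for example with `urllib.request.urlopen`, and recording the per-pod values should show whether the block count falls back once a full retention period has passed since the upgrade.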
Expected behavior
Ingester pods adhere to the retention period after the upgrade to Mimir v2.6.
Environment
Additional Context
There are no errors in the logs relating to shipping blocks.