Ingester blocks retention period is ignored after upgrading to v2.6.0 #4521

kayleighmcginley · 2023-03-16T13:21:48Z

Describe the bug

After upgrading Mimir to v2.6, the ingesters do not comply with the TSDB block retention period. I have reproduced this behaviour in prod and dev environments on 4 occasions.

Using default retention period:
The default retention period (24h) was used for the initial upgrade from Mimir v2.4 to v2.6. About 12 hours after the upgrade, we were alerted that the ingester PVs were filling up. On investigation, we saw that the ingesters were not deleting blocks locally, so blocks older than 24h were kept on disk. As a result, the disks of many ingesters filled up and the ingester pods started crashing.
Reverting to Mimir v2.4 resolved it, but not automatically: the Persistent Volumes (PVs) of the failing ingester pods had to be deleted manually, because otherwise the new pods would hit the same disk-full error.
Since then we have successfully upgraded to v2.5, and we observed the same behaviour when trying to upgrade from v2.5 to v2.6, so it appears to be specific to v2.6.

Using 13h retention period:
The 13h retention period was recommended by a member of the Mimir team via Slack and is soon to be the new default retention period. I tested this retention period overnight in a dev environment. The number of blocks on disk increased every 2 hours for about 12 hours (from 15:00 to 03:15). Then there was a sharp drop in the number of blocks on disk, and the count has remained steady since (over 8 hours so far).

I suspect there is a bug in this version of Mimir. Perhaps the retention period of blocks is reset?

To Reproduce

Steps to reproduce the behavior:

1. Run Mimir v2.5.0 with:
   - -blocks-storage.tsdb.block-ranges-period set to 2h
   - Default retention period (24h)
2. After 24+ hours of Mimir running, 12 TSDB blocks are stored on each ingester.
3. Upgrade Mimir to v2.6.0.
4. Run the query sum(cortex_ingester_tsdb_blocks_loaded{cluster="<cluster_name>", namespace="<ns>"}) by (pod)
   - The number of blocks on ingester disk increases every 2h
   - After ~12h the number of blocks drops and the retention period is followed

Expected behavior

Ingester pods adhere to the retention period after the upgrade to Mimir v2.6.

Environment

Infrastructure: AWS EKS running K8s v1.21
Deployment tool: jsonnet

Additional Context

There are no errors in the logs relating to shipping blocks.

pstibrany · 2023-03-16T13:45:08Z

Maintainer

Retention settings have changed in Mimir 2.6.0. Previously, the retention of a block was computed from the block's MaxTime. Starting in Mimir 2.6, retention is computed from the time the block was shipped to the storage. As the ingester doesn't have shipping times for blocks that were already shipped by an earlier version of Mimir, it uses the "current time" as the shipping time, which means that those older blocks are kept in ingesters for a bit longer.

Here is the corresponding entry in the changelog:

* [CHANGE] Ingester: If shipping is enabled block retention will now be relative to the upload time to cloud storage. If shipping is disabled block retention will be relative to the creation time of the block instead of the mintime of the last block created. #3816
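
To make the difference concrete, here is a minimal sketch of the two rules as described in this comment. It is not Mimir's actual code: the function names, the `first_seen_at` parameter, the `>=` comparison and the example timestamps are all assumptions made for illustration.

```python
from datetime import datetime, timedelta
from typing import Optional


def deletable_before_2_6(block_max_time: datetime, now: datetime,
                         retention: timedelta) -> bool:
    """Older rule as described above: age is measured from the block's MaxTime."""
    return now - block_max_time >= retention


def deletable_from_2_6(shipped_at: Optional[datetime], first_seen_at: datetime,
                       now: datetime, retention: timedelta) -> bool:
    """2.6 rule as described above: age is measured from the shipping time.

    shipped_at is None for blocks shipped by an earlier Mimir version. In that
    case the time the upgraded ingester first saw the block stands in for it
    (assumption: this is what "current time" means in practice).
    """
    effective_shipped_at = shipped_at if shipped_at is not None else first_seen_at
    return now - effective_shipped_at >= retention


if __name__ == "__main__":
    retention = timedelta(hours=24)
    upgraded_at = datetime(2023, 3, 16, 12, 0)        # hypothetical upgrade time
    old_block_max_time = upgraded_at - timedelta(hours=20)
    now = upgraded_at + timedelta(hours=6)
    # The block's MaxTime is 26h old, but the upgraded ingester has only known it for 6h.
    print(deletable_before_2_6(old_block_max_time, now, retention))
    print(deletable_from_2_6(None, upgraded_at, now, retention))
```

In this sketch, a block whose MaxTime is older than the retention period would be deletable under the old rule, but after the upgrade it is kept until a full retention period has passed since the ingester first saw it.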

kayleighmcginley · 2023-03-16T14:08:25Z

Author

Thanks for the explanation. It was not clear from the changelog how the retention period was calculated for existing/older blocks. The "current time" being used as shipping time explains the behaviour. We will first decrease the retention period to 13h, so that less disk space is used and there is enough capacity on the PVs to hold the extra blocks for 12-13h after the upgrade.
Appreciate the quick response on this, thanks.
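
For capacity planning, the effect can be estimated with a rough simulation. This is only a sketch under simplifying assumptions that do not come from Mimir itself: a block is cut and shipped exactly every `block_range_h` hours, all blocks present at the upgrade are treated as shipped at the upgrade time, and deletion happens exactly when the retention period elapses. The function name `blocks_on_disk` is hypothetical.

```python
def blocks_on_disk(hours_since_upgrade: int, retention_h: int = 13,
                   block_range_h: int = 2) -> int:
    """Estimate the number of blocks on one ingester's disk after the upgrade."""
    # Blocks already on disk at upgrade time (steady state under the old rule).
    old_blocks = retention_h // block_range_h
    # Assumption: all old blocks get the upgrade time as their shipping time,
    # so they expire together once retention_h hours have passed.
    old_remaining = old_blocks if hours_since_upgrade < retention_h else 0
    # New blocks are shipped at block_range_h, 2 * block_range_h, ... hours
    # after the upgrade and are kept for retention_h hours each.
    new_remaining = sum(
        1
        for shipped_h in range(block_range_h, hours_since_upgrade + 1, block_range_h)
        if hours_since_upgrade - shipped_h < retention_h
    )
    return old_remaining + new_remaining


if __name__ == "__main__":
    for hours in range(0, 17, 2):
        print(hours, blocks_on_disk(hours))
```

Under these assumptions, the block count roughly doubles before the old blocks expire together (from 6 to 12 with a 13h retention period and 2h blocks), so the PVs need about twice the steady-state block capacity during the first retention period after the upgrade.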
