Extend Thanos bucket rewrite to support filtered archiving of existing blocks #7402

roth-wine · 2024-05-30T09:56:54Z

Is your proposal related to a problem?

We need a way to "archive" specific metrics within the Thanos/Prometheus ecosystem. Our default retention for all metrics is 90 days, but business decisions based on long-term analyses require keeping some metrics much longer. Since these metrics are already collected from various endpoints and written to S3 via thanos-sidecar, they would not need to be collected again; they could be extracted from the existing data.

Describe the solution you'd like

We suggest extending the thanos tools bucket rewrite command with additional features and filters.
Additional features are:

Modifying the global labels in the meta.json during rewrite to add, remove or change labels.
Adding an option to upload the rewritten block to a new object storage instead of the old one.

Additional context:
During the rewrite we want to add the global label archive=true to make archived blocks easy to identify, and also remove some global labels like prom_replica so that the compactor can merge those blocks (they then fall into the same compaction group).
We also want the rewritten blocks to be stored in a separate S3 bucket to have a clean separation between "active" data and archived data.
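
To make the label change concrete, here is a minimal sketch of the transformation we have in mind for a block's meta.json. It is an illustration only, not existing Thanos code: the helper name rewrite_labels and the sample values are hypothetical, and it assumes the labels live under thanos.labels in meta.json.

```python
import copy


def rewrite_labels(meta, add=None, remove=()):
    """Return a copy of a meta.json dict with its thanos.labels changed.

    Hypothetical helper for illustration; not part of Thanos.
    """
    new_meta = copy.deepcopy(meta)  # leave the original block metadata untouched
    labels = new_meta.setdefault("thanos", {}).setdefault("labels", {})
    for name in remove:
        labels.pop(name, None)  # ignore labels that are not present
    labels.update(add or {})
    return new_meta


# Sample metadata with made-up values, reduced to the fields used here.
meta = {
    "ulid": "01HZ0000000000000000000000",  # placeholder ULID
    "compaction": {"level": 3},
    "thanos": {
        "labels": {"cluster": "prod", "prom_replica": "a"},
        "downsample": {"resolution": 0},
    },
}

archived = rewrite_labels(meta, add={"archive": "true"}, remove=["prom_replica"])
print(archived["thanos"]["labels"])
```

With prom_replica dropped, blocks from both replicas would end up with identical label sets, which is what should let the compactor treat them as one group.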

Filters are:

Filtering blocks by compaction level
The tool should automatically determine whether blocks still need to be rewritten or whether they have already been archived.

Additional context:
Currently we have defined a maximum compaction level of 3 (--debug.max-compaction-level=3). Once a block with the resolution raw has reached compaction level 3, we would like to archive it (as stated in this issue, it is currently only possible to rewrite/archive raw blocks).
As we don't want to keep track of which blocks have already been rewritten, the tool should fetch the metadata of all blocks to check which ones still need to be rewritten.
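
The selection we have in mind could look roughly like the sketch below. It is only an illustration under assumptions: the function name blocks_to_archive is hypothetical, and it assumes a rewritten block keeps the compaction.sources list of its original block, so that the list can serve as an identity key between the two buckets. We have not verified that bucket rewrite preserves this field; if it does not, another marker in meta.json would be needed.

```python
def blocks_to_archive(source_metas, archive_metas, level=3):
    """Return ULIDs of raw blocks at the given compaction level that are not archived yet.

    Hypothetical helper; both arguments are lists of parsed meta.json dicts.
    """

    def key(meta):
        # Assumption: compaction.sources survives the rewrite unchanged.
        return tuple(sorted(meta["compaction"].get("sources", [])))

    done = {key(m) for m in archive_metas}
    selected = []
    for meta in source_metas:
        is_raw = meta["thanos"]["downsample"]["resolution"] == 0
        at_level = meta["compaction"]["level"] >= level
        if is_raw and at_level and key(meta) not in done:
            selected.append(meta["ulid"])
    return selected


def make_meta(ulid, level, resolution, sources):
    # Build a reduced meta.json dict with made-up values for the example.
    return {
        "ulid": ulid,
        "compaction": {"level": level, "sources": sources},
        "thanos": {"downsample": {"resolution": resolution}},
    }


active = [
    make_meta("BLOCK_A", 3, 0, ["S1", "S2"]),   # raw, level 3, already archived
    make_meta("BLOCK_B", 3, 0, ["S3", "S4"]),   # raw, level 3, not archived yet
    make_meta("BLOCK_C", 2, 0, ["S5"]),         # compaction level too low
    make_meta("BLOCK_D", 3, 300000, ["S6"]),    # downsampled, not raw
]
archive = [make_meta("BLOCK_X", 3, 0, ["S2", "S1"])]

print(blocks_to_archive(active, archive))
```

Because the check only reads metadata from both buckets, no separate bookkeeping of already rewritten blocks would be required.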

All this effort is necessary because this should be done for several departments with several tenants without changing the production data.

Describe alternatives you've considered

We have also considered querying the existing data again via the federation endpoint and then storing it in a dedicated object storage bucket per department/tenant via a new Prometheus/Thanos sidecar combination. This would also allow us to store only the metrics that are intended for archiving. However, the overhead from storing the data twice and running the additional Prometheus/Thanos sidecars might be higher than with the solution proposed above.

Additional context

We also read the issues about per-metric deletion, but there seems to be no progress so far.
#903
prometheus/prometheus#1381
The Prometheus issue has been open for 8 years without a solution.
