How large can block_retention reasonably be? #3728
-
We are deploying Tempo via the tempo-distributed Helm Chart. The default Helm value for `block_retention` is quite low, and for our use-case we are thinking we'd want to retain traces for much longer... like 2 years! Let's say we create 1000 traces each day, each with 10000 x 1KB spans. Assuming storage capacity is not a concern, is this amount of traffic sustainable with a 2-year retention? Appreciate any tips, thanks!
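For a rough sense of the volume those numbers imply (ignoring compression and block-format overhead, so this is only a ballpark):

$$
1000\ \tfrac{\text{traces}}{\text{day}} \times 10000\ \tfrac{\text{spans}}{\text{trace}} \times 1\ \text{KB} \approx 10\ \tfrac{\text{GB}}{\text{day}},
\qquad
10\ \tfrac{\text{GB}}{\text{day}} \times 730\ \text{days} \approx 7.3\ \text{TB}
$$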
Replies: 1 comment 1 reply
-
Hi, this is a good question. We have worked with a few places using longer-term storage to meet auditing requirements, and I would be interested to hear more about your use case.
Agree, I think the main reason for these low defaults is to prevent other, worse issues that are likely on a new install: filling disks, unexpected object storage costs, high latency from lack of appropriate scaling. As the operator becomes more experienced and tunes the cluster, retention can be bumped as well. 30d is more of what I consider the sweet spot. Traces are frequently used for troubleshooting live systems, and the value of a trace diminishes as it ages and departs from the current state of the system.
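For reference, bumping retention is a single compactor setting. A minimal sketch, assuming the tempo-distributed chart passes this block through to Tempo's compactor config (the exact values.yaml path varies by chart version, so check your chart before copying):

```yaml
# Tempo compactor configuration (tempo.yaml); the value shown is illustrative.
compactor:
  compaction:
    # How long to keep blocks after they are written. The Tempo default is
    # 336h (14d); 720h is roughly the 30d "sweet spot" mentioned above.
    block_retention: 720h
```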
I provide detail about scaling and tuning below, but then it occurred to me that the biggest risk is the block format. 2y retention means the block format must be stable in Tempo for 2 years. I'm not sure we can provide that guarantee at this stage, as we are making rapid changes in this area for new features. We just deleted the vParquet1 format, which was added in June 2022. That is just short of 2y :) Note: Tempo doesn't upgrade existing blocks to new formats because of the large overhead. We just let ingesters start writing the new format, and old formats naturally cycle out (blocks deleted).

But assuming the format is stable for 2y, then: I think the main thing to consider is the total amount of data. A cluster will require similar scale and resources at X TB of data regardless of whether it is spread out over 2 years or 14d. If configured ideally, there could be the same number and size of blocks in both scenarios, and Tempo will be fine. Because a 2-year retention will hold roughly 50x the data of a 14d install, in my mind the question is: can we 50x the number of blocks on this cluster?

The main pressure will be on the read path, as trace lookup must inspect all blocks. A higher number of queriers, and tuning around the frontend->querier path, will be needed (job parallelism, etc.). Caching will be required for bloom filters and page i/o (a rough config sketch is at the end of this reply). The scaling here is based on the number and size of blocks: the same amount of work is required to scan 100K blocks whether they are spread across 14d or 2y.

The next pressure will be on compactors. With short retention like 14d, under-scaled compactors can go unnoticed because blocks that weren't able to be well-compacted get deleted anyway. But at 2 years, it will be important that compactors are able to keep up. A higher number of compactors will be needed for sure. I would also increase compaction_window. The default is 1h, but at low volume this sets a high floor on the minimum number of blocks. I would try 4-24h, adjusting as needed to keep the block list around 100K, which is on the upper end of comfort in my experience (sketch below). A larger compaction_window only works for low-volume, long-retention. (High-volume clusters typically are 5m!)

The highest pressure on object storage is likely to be polling/listing blocks due to a large block list (sketch below). Ingesters probably don't need much work. They must scale for the write volume, which is not a problem for low-volume/high-retention.

One more thing to consider: the Grafana UI can be configured to provide a time range when looking up a trace. That means when you look up a trace by ID, instead of scanning the whole 2y, Tempo only looks in a subset like the last 30 days. This helps performance, but requires you to know roughly the timeframe for the trace. Therefore it is not commonly used, but it might be helpful in this scenario.
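A rough sketch of the read-path knobs I mean, with illustrative values only. Parameter names are from Tempo 2.x configuration docs as I recall them, so treat this as a starting point and verify against your version:

```yaml
# Query-path parallelism (values are illustrative, not recommendations).
query_frontend:
  trace_by_id:
    query_shards: 100        # shards per trace-by-ID query; more shards = more parallel block checks
  search:
    concurrent_jobs: 2000    # frontend -> querier job parallelism for search

querier:
  max_concurrent_queries: 20 # raise alongside the number of querier replicas

# Caching for bloom filters and parquet page/footer i/o. The cache config
# layout and role names changed across Tempo releases (storage.trace.cache
# vs. a top-level cache block), so double-check the docs for your version.
cache:
  caches:
    - roles: [bloom, parquet-footer]
      memcached:
        host: memcached      # assumed memcached service name
```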
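On the compactor side, same caveats apply; shown here with the 2y retention from the question:

```yaml
# Compactor tuning for a low-volume / long-retention cluster (illustrative).
compactor:
  compaction:
    block_retention: 17520h   # 2 years
    compaction_window: 6h     # default is 1h; try something in the 4-24h range
                              # and adjust to keep the block list around ~100K
```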
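And the blocklist polling settings I'd keep an eye on; again, names are from memory of the storage block, so verify against your version's config docs:

```yaml
# Blocklist polling against object storage. With ~100K blocks, polling and
# the tenant index matter more than usual (values are illustrative).
storage:
  trace:
    blocklist_poll: 5m               # how often the block list is refreshed; raising it
                                     # reduces list pressure at the cost of a staler view
    blocklist_poll_concurrency: 50   # parallel list/read operations during a poll
```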