Before you go code spelunking, these blog posts are a good source of information on the Prometheus TSDB format, which is what Mimir/Cortex/Thanos use:
-
Hello! I'm evaluating Mimir versus Amazon Managed Prometheus, so I've been doing some cost forecasting. I appreciate the insight you've provided in two places on that front:
When modeling costs, the biggest driver I can see with Mimir is not CPU, memory, disk, or storage capacity, but the sheer count of PUT/GET requests to S3, since those requests cost money:
For example, if we have 2000 active time series, a replication factor of 3, and ingesters write 5 files per hour for each time series, that's ~$100 in S3 PUTs per month:
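For reference, the arithmetic behind that estimate can be sketched as follows. The per-series file rate and the PUT price are my assumptions, not benchmarks (the $0.005 per 1,000 PUTs figure is the S3 Standard rate in us-east-1; check your region):

```python
# Back-of-the-envelope S3 PUT cost model (all inputs are assumptions, not benchmarks).
active_series = 2000
replication_factor = 3
files_per_series_per_hour = 5      # hypothetical per-series write rate from the question
hours_per_month = 730
put_price_per_1000 = 0.005         # assumed S3 Standard PUT price; varies by region

puts_per_month = (active_series * replication_factor
                  * files_per_series_per_hour * hours_per_month)
cost = puts_per_month / 1000 * put_price_per_1000
print(f"{puts_per_month:,} PUTs/month -> ${cost:.2f}")  # 21,900,000 PUTs/month -> $109.50
```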
GETs could also get expensive, but I understand that heavy caching is used to mitigate that.
Is the logic above correct? Or can blocks consolidate multiple time series into one, minimizing the total number of S3 PUTs? Is there any more information available (e.g., benchmarks) on typical S3 GET/PUT counts per active time series?
Basically, is there any benchmarking data or rule of thumb for roughly what the de-amplification factor discussed winds up being:
For more context, the Cortex documentation seems to suggest that samples from multiple time series are consolidated into the same block(s), which I believe may further reduce the number of GETs/PUTs. But I haven't seen what that de-amplification ratio winds up being.
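To illustrate why consolidation would matter so much, here is a hypothetical sketch of the consolidated model, under the assumption that each ingester cuts one TSDB block per 2-hour range covering all of its series, and that a block uploads as roughly four objects (meta.json, index, a chunks segment, tombstones). None of these numbers come from Mimir benchmarks:

```python
# Hypothetical consolidated-block model (assumed parameters, not measured values).
ingesters_holding_series = 3       # with replication factor 3, each series is on 3 ingesters
blocks_per_ingester_per_day = 12   # assuming the default 2-hour TSDB block range
objects_per_block = 4              # rough guess: meta.json, index, chunks segment, tombstones
days_per_month = 30
put_price_per_1000 = 0.005         # assumed S3 Standard PUT price

puts_per_month = (ingesters_holding_series * blocks_per_ingester_per_day
                  * objects_per_block * days_per_month)
cost = puts_per_month / 1000 * put_price_per_1000
print(f"{puts_per_month:,} PUTs/month -> ${cost:.4f}")  # 4,320 PUTs/month -> $0.0216
```

Under these assumptions the PUT count scales with the number of ingesters and blocks rather than with the number of active series, which is the de-amplification effect I'm asking about.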
PS: I'll go code spelunking here in a second, but am also opening this as a discussion to highlight that it may be worth addressing the above in the docs.