New blog about Thanos adoption at Aiven #6461

Merged: 29 commits merged into thanos-io:main on Jun 27, 2023

Conversation

jkowall
Contributor

@jkowall jkowall commented Jun 20, 2023

This is a new blog post that I wrote with two co-authors here at Aiven about our adoption of Thanos. I think it's a great contribution, and we hope to keep contributing upstream and pushing Thanos forward in the community. I would appreciate any reviews of the post.

Thanks!

jkowall added 23 commits June 8, 2023 17:34
Signed-off-by: Jonah Kowall <jkowall@kowall.net>
bwplotka
bwplotka previously approved these changes Jun 21, 2023
Member

@bwplotka bwplotka left a comment

Epic! Thanks for this. Some high level comments, but overall looks amazing 💪🏽

* Thanos with 6 months' retention: $19,703
* Thanos with 1 year's retention: $22,447
* Thanos with 2 years' retention: $27,955
* Thanos with 3 years' retention: $33,423
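
To make the retention figures above easier to compare, here is a minimal sketch that computes how much each added year of retention costs between adjacent tiers. It uses only the four dollar amounts quoted in the list; the dictionary name and helper function are hypothetical, and the figures are taken at face value without assuming what period they cover.

```python
# Dollar figures quoted in the list above, keyed by years of retention.
# The names below are illustrative and not part of the blog post.
thanos_cost = {0.5: 19_703, 1: 22_447, 2: 27_955, 3: 33_423}


def marginal_cost_per_year(costs):
    """Return the cost added per extra year of retention between adjacent tiers."""
    tiers = sorted(costs)
    return {
        (lo, hi): (costs[hi] - costs[lo]) / (hi - lo)
        for lo, hi in zip(tiers, tiers[1:])
    }


marginal = marginal_cost_per_year(thanos_cost)
for (lo, hi), usd in marginal.items():
    print(f"{lo} -> {hi} years: ${usd:,.0f} per added year of retention")
```

If the quoted figures are right, every tier adds roughly the same amount per year of retention, which suggests a large retention-independent base cost plus a near-linear storage component.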
Member

Hm, is your object storage cost alone ~18k USD for 3 years of data? That sounds a bit off and expensive 🤔

Contributor Author

We have a lot of metrics and manage a large fleet. We also use Thanos (M3) for our billing data, which is quite detailed at the service level. These really are the costs, but look at how they compare to the cost of M3.


Storage with Thanos also costs us roughly 25% of what it costs with M3DB. M3DB has a total of 54TB of storage provisioned today, at a cost of $4,320 per month. We could house 216TB of storage for the same cost with Thanos. We are currently generating about 750GB per day, which means we can keep almost a year of metrics for the same cost as M3DB. Additionally, we back up M3DB, which uses 33TB of object storage at a cost of $1,320 per month. Object storage also carries networking costs of around $1,800 per month. Here are our estimated costs:

* M3: $38,480
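
The storage arithmetic in the quoted paragraph can be checked with a short sketch. It uses only the numbers stated there (54TB for $4,320 per month, 216TB for the same cost, 750GB per day). The variable names are hypothetical, and it assumes 1TB = 1000GB and a constant ingest rate with no downsampling or compaction savings.

```python
# Figures from the quoted paragraph; names are illustrative only.
M3DB_PROVISIONED_TB = 54
M3DB_MONTHLY_USD = 4_320
THANOS_TB_SAME_COST = 216
DAILY_INGEST_GB = 750

# Implied price per TB per month for each system.
m3db_usd_per_tb = M3DB_MONTHLY_USD / M3DB_PROVISIONED_TB
thanos_usd_per_tb = M3DB_MONTHLY_USD / THANOS_TB_SAME_COST
ratio = thanos_usd_per_tb / m3db_usd_per_tb

# Days of retention that fit in 216TB (assumes 1TB = 1000GB, steady ingest).
retention_days = THANOS_TB_SAME_COST * 1000 / DAILY_INGEST_GB

# Monthly cost of holding one full year of ingest at the implied Thanos price.
year_of_ingest_tb = DAILY_INGEST_GB * 365 / 1000
year_of_ingest_monthly_usd = year_of_ingest_tb * thanos_usd_per_tb

print(f"M3DB:   ${m3db_usd_per_tb:.0f} per TB per month")
print(f"Thanos: ${thanos_usd_per_tb:.0f} per TB per month ({ratio:.0%} of M3DB)")
print(f"216TB holds about {retention_days:.0f} days of metrics")
print(f"One year of ingest: {year_of_ingest_tb:.2f}TB, ${year_of_ingest_monthly_usd:,.0f} per month")
```

Under these assumptions the implied prices match the 25% figure, and 216TB covers roughly nine and a half months of data. The monthly cost of holding one year of ingest also lands close to the gap between the Thanos retention tiers quoted earlier in this review, which hints that those figures are monthly.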
Member

For what retention?

Contributor Author

I will add details on that. Thanks.

docs/blog/2023-06-08-thanos-at-aiven.md (outdated)

## Performance Gains

Performance was also generally much better for our ongoing reporting and alerting needs. Today we use vmalert to drive our alerting pipeline because M3DB is limited and has no Alertmanager integration. This brings me to another issue we found: vmalert would sometimes execute rules twice within the same group evaluation period. By default, vmalert realigns the result timestamp with the group evaluation start time, so both executions produced samples with the same timestamp but different values, which led to failed and rejected writes. We fixed this by disabling query time alignment (datasource.queryTimeAlignment).
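
The failure mode in the paragraph above can be illustrated with a toy model. This is not vmalert's code: the functions, the 60-second interval, and the sample values are all hypothetical, and the "store" is a stand-in that rejects a second, different value for an already-written timestamp.

```python
def aligned_timestamp(eval_time: int, interval: int) -> int:
    """Round an evaluation time down to the start of its group interval."""
    return eval_time - eval_time % interval


def write_samples(samples):
    """Toy store: reject a sample whose timestamp already holds a different value."""
    stored, rejected = {}, []
    for ts, value in samples:
        if ts in stored and stored[ts] != value:
            rejected.append((ts, value))
        else:
            stored[ts] = value
    return stored, rejected


INTERVAL = 60  # seconds, hypothetical group evaluation interval

# The same rule is evaluated twice inside one interval (hypothetical times/values).
evaluations = [(125, 10.0), (155, 12.0)]

# With alignment, both results are stamped with the interval start.
aligned = [(aligned_timestamp(t, INTERVAL), v) for t, v in evaluations]
stored_aligned, rejected_aligned = write_samples(aligned)

# Without alignment, each result keeps its own evaluation time.
stored_raw, rejected_raw = write_samples(evaluations)

print("aligned:  ", stored_aligned, "rejected:", rejected_aligned)
print("unaligned:", stored_raw, "rejected:", rejected_raw)
```

In this sketch the aligned run collapses both evaluations onto one timestamp and the second write is rejected, while the unaligned run keeps two distinct samples.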
Member

Why vmalert? Any specific reason why not Thanos Ruler?

Contributor Author

M3DB doesn't have a similar capability, so we are using vmalert. It was easier than switching everything over. We are happy with vmalert right now even though it's pull vs push.

saswatamcode
saswatamcode previously approved these changes Jun 22, 2023
Member

@saswatamcode saswatamcode left a comment

Awesome! 🚀
LGTM mod small nit about the diagram and @bwplotka's comments.

docs/blog/2023-06-08-thanos-at-aiven.md (outdated)
Co-authored-by: Bartlomiej Plotka <bwplotka@gmail.com>
Signed-off-by: Jonah Kowall <jkowall@kowall.net>
@jkowall jkowall dismissed stale reviews from saswatamcode and bwplotka via c1311c2 June 23, 2023 12:42
jkowall added 2 commits June 23, 2023 08:51
Signed-off-by: Jonah Kowall <jkowall@kowall.net>
Signed-off-by: Jonah Kowall <jkowall@kowall.net>
@pull-request-size pull-request-size bot added size/M and removed size/L labels Jun 23, 2023
Signed-off-by: Jonah Kowall <jkowall@kowall.net>
@jkowall
Contributor Author

jkowall commented Jun 23, 2023

Not sure why the docs check is reporting an error about outstanding commits. Can someone help with it? (@saswatamcode)

@saswatamcode
Member

Yes, @jkowall! Could you run make docs and commit? This formats the markdown file (tables/spacing/whitespaces etc). 🙂

Signed-off-by: Jonah Kowall <jkowall@kowall.net>
@jkowall
Contributor Author

jkowall commented Jun 25, 2023

Looks good now, thanks @saswatamcode !

@jkowall
Contributor Author

jkowall commented Jun 26, 2023

Still can't merge it due to a CI failure, any advice would be helpful.

Contributor

@fpetkovski fpetkovski left a comment

Thanks for the write up 👍 I kicked the CI a couple of times since failures were flakes in e2e tests.

Member

@saswatamcode saswatamcode left a comment

Thanks for the epic blog!

@fpetkovski fpetkovski merged commit 6b7354c into thanos-io:main Jun 27, 2023
4 participants