
Getting duplicate alerts if we remove the Thanos Ruler #6256

Open
SanthoshiMekala opened this issue Apr 3, 2023 · 1 comment

Comments

@SanthoshiMekala

Hi All,

We recently upgraded our Thanos and Prometheus versions as follows:
Thanos: from 0.23.0 to 0.29.0
Prometheus: from 2.34.0 to 2.40.0

In production we have around 25,000 targets, and because of the heavy load we opted for Prometheus shards instead of a single Prometheus server, with Thanos components for aggregation. Our Thanos architecture comprises three components:
Prometheus shards - scrape the targets and store the metrics in local storage.
Thanos Query - provides a global view of the TSDB data from all the Prometheus shards.
Thanos Ruler - evaluates the recording and alerting rules and stores their output; the alerting system gets its data from it and fires alerts based on the rule conditions.

As per the Thanos documentation (https://thanos.io/tip/components/rule.md/#rule-aka-ruler), using Thanos Ruler is not recommended except in some specific cases. So we migrated all the recording and alerting rules to the Prometheus shards and removed Thanos Ruler. We have 7 replicas for the Prometheus shards, and because of this change we are now getting duplicate alerts (7 alerts) for each rule: multiple shards scrape the targets of a single namespace, and rule evaluation is done locally in each Prometheus shard, so we get duplicate results.
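For context, in setups like ours each Prometheus replica is typically distinguished by its own external label (Thanos needs unique external labels per instance for query-side deduplication), and that label also ends up on every alert the replica fires, so the 7 copies do not look identical to Alertmanager. A minimal sketch of that kind of configuration; the label names and values here are illustrative assumptions, not our exact config:

```yaml
# prometheus.yml (sketch, illustrative values only)
global:
  external_labels:
    cluster: prod                  # shared by all shards/replicas
    prometheus_shard: shard-3      # identifies the shard
    prometheus_replica: replica-5  # unique per replica; this label is also
                                   # attached to every alert this replica sends
```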

Is this the expected behaviour? Is the recommended approach that, when we have multiple replicas of the Prometheus shards, we need to use Thanos Ruler? Or is Thanos Ruler not needed for our scenario?

Thank you!

@douglascamata
Contributor

@SanthoshiMekala this is more of a Prometheus and Alertmanager question. Luckily, PromLabs has great docs on this: https://training.promlabs.com/training/relabeling/writing-relabeling-rules/keeping-and-dropping-labels
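For example, if the 7 copies of an alert differ only by a per-replica external label, dropping that label from outgoing alerts lets Alertmanager see identical label sets from every replica and deduplicate them. A minimal sketch, assuming the replicas are distinguished by a label named prometheus_replica and that a shared Alertmanager is reachable at alertmanager:9093 (both are assumptions, adjust to your setup):

```yaml
# prometheus.yml (sketch)
alerting:
  alert_relabel_configs:
    # Drop the per-replica label from alerts before they are sent, so all
    # replicas emit alerts with identical label sets and Alertmanager
    # collapses them into one.
    - action: labeldrop
      regex: prometheus_replica
  alertmanagers:
    - static_configs:
        - targets: ["alertmanager:9093"]  # assumed shared Alertmanager address
```

Note that all replicas need to send to the same Alertmanager (or Alertmanager cluster) for this deduplication to take effect.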
