Add option to customize alias field for opsgenie alerts #1598
This is how grouping works in the Alertmanager, and what you suggest would not work as the alerts would still be in the same group and thus the same notification. group_by is the right way to handle this.
@brian-brazil I might not have explained myself properly. My problem is with the same group_by across different Alertmanager instances managing alerts on different Kubernetes clusters, and the way the OpsGenie alias for each alert is calculated. Essentially I get the same alias for two similar alerts on two different clusters because they share the same group_by rules. Example scenario:
Example rule
Example Alertmanager config (in my case this is matched by the main route)
The alias generated for both is identical. This means that on OpsGenie those two alerts, which are about the overcommit of two different clusters and so must certainly be distinct, are treated as the same alert and deduplicated. I think that arbitrarily deciding that the alias is always the sha256 of the GroupLabels for OpsGenie is wrong and does not cover a scenario like the one I am describing, where multiple clusters run the exact same rules, the same routes, and the same group_by in Alertmanager. Adding an option for a static seed in the OpsGenie configuration, so that the sha256 of the alert differs between clusters, would fix this without workarounds. Adding a "CLUSTER=ID" label to every single group_by (a label that is the same for every alert Alertmanager sees, since it is pushed as an external_label by Prometheus) is a workaround.
Using external_labels?
@simonpasquier I am using external_labels (that's exactly what the quoted line says), from Prometheus to Alertmanager. How does that change the alias generated by Alertmanager?
I just meant to say that, in any case, you have to define somewhere the key that differentiates your clusters, and what you describe as a workaround is the normal way to go, as long as your "group by" parameter includes this external label (e.g. cluster).
Yeah, that is what I am doing, but it seems quite a workaround... Also, and maybe this is something I am doing wrong, I tend to have quite a few sub-routes, around 30 of them, to do different grouping for different rules. In each of them I had to add the cluster id as well, just to change the alias produced for my OpsGenie alert. I am just wondering whether it is the correct behavior to hardcode the way the alias is generated, with no other way to customize it than touching every group_by in the config.
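As an illustration of the repetition being described (the matchers, label names, and receiver are invented for this sketch), every sub-route has to carry the cluster label in its group_by just to make the resulting alias cluster-specific:

```yaml
route:
  receiver: opsgenie
  routes:
    - match:
        team: infra
      group_by: [cluster, alertname, namespace]
    - match:
        team: db
      group_by: [cluster, alertname, instance]
    # ... repeated for every one of the ~30 sub-routes
```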
+1. I am using OpsGenie and control over the alias field would be appreciated. I understand that Alertmanager is supposed to deduplicate alerts for me, but I would like to see this implemented as it is a critical part of the OpsGenie API.
+1 This is currently blocking some key functionality with OpsGenie.
I have created this simplistic proxy script in Python 3.5 to enable customization of this (on a per-rule level) as a workaround: https://github.com/pawadski/alertmanager-opsgenie-proxy. This way we can, for the time being, customize alias fields (albeit manually): you define "opsgenie_alias: this_is_my_alias" in the alert rules and get that alias in the alert. While helpful, I believe it does not answer the seed question the OP had, but hopefully it helps someone as much as it helped me.
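For reference, a hypothetical rule snippet along the lines the comment describes; the alert name, expression, and alias value are invented, and the exact label key the proxy consumes should be checked against its README:

```yaml
groups:
  - name: example
    rules:
      - alert: HighErrorRate
        expr: rate(http_requests_total{code=~"5.."}[5m]) > 0.1
        labels:
          severity: critical
          # forwarded by the proxy as the OpsGenie alias (hypothetical value)
          opsgenie_alias: high-error-rate-cluster-a
```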
After checking this thread and a little further I managed to fix this with external labels. These helped a lot in my case:

I added this to the Prometheus config (I'm using custom prometheus-operator in k8s):

```yaml
externalLabels:
  cluster: ${CLUSTER_NAME}
  env: ${CLUSTER_ENV}
```

And in the Alertmanager config added this:

```yaml
route:
  group_by:
    - job
    - alertname
    - service
    - env
```

And now the alerts have the cluster and env in the alias, making them unique by cluster. Hope it helps.
I did open this case and hit it again... I believe though that the solution provided by #1598 (comment) is good enough. I propose to close this issue, but maybe add the info to the documentation for the opsgenie receiver, with the example from the comment above, so people don't keep hitting the same issue?
@primeroz closing for now then. I don't think that the reference documentation is the proper place for such details.
@GMartinez-Sisti You mentioned that alerts have the cluster and env in the alias, but you only added env into the group_by statement. Did you forget to put cluster there?
If you have multiple clusters per env you should add it to the group_by as well.
@GMartinez-Sisti How did you verify that the alias did in fact include the cluster and env?
Sorry @marwaneldib, but I don't have access to that infra anymore and can't remember the context to reply to this 😅.
OpsGenie deduplicates messages based on the alert alias.
This alias is currently calculated by hashing the GroupLabels with sha256 in the OpsGenie notifier.
In my scenario I have multiple Alertmanager sources (for multiple Kubernetes clusters) using a single integration on the OpsGenie side, with a common set of Prometheus rules and Alertmanager routing configuration.
Two alerts on two different clusters sharing the same set of GroupLabels produce the same alias sha256, and as such they are treated as the same alert on OpsGenie and deduplicated, even though they come from different clusters.
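To make the collision concrete, a minimal sketch of the kind of hashing described above (not the actual Alertmanager code): the alias depends only on the group labels, so two clusters firing an alert with identical group labels end up with identical aliases.

```go
package main

import (
	"crypto/sha256"
	"fmt"
	"sort"
	"strings"
)

// aliasFor is a simplified illustration of the behaviour described above,
// NOT the actual Alertmanager implementation: the alias is derived solely
// from the group labels, so it carries no information about the cluster.
func aliasFor(groupLabels map[string]string) string {
	keys := make([]string, 0, len(groupLabels))
	for k := range groupLabels {
		keys = append(keys, k)
	}
	sort.Strings(keys)

	var b strings.Builder
	for _, k := range keys {
		fmt.Fprintf(&b, "%s=%q,", k, groupLabels[k])
	}
	sum := sha256.Sum256([]byte(b.String()))
	return fmt.Sprintf("%x", sum)
}

func main() {
	// The same alert group fired from two different clusters: identical
	// group labels, therefore identical aliases, therefore deduplicated
	// into a single OpsGenie alert.
	labels := map[string]string{"alertname": "CPUOvercommit", "job": "kube-state-metrics"}
	fmt.Println(aliasFor(labels)) // cluster A
	fmt.Println(aliasFor(labels)) // cluster B: same hash
}
```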
My current workaround is to add a static label that includes the Kubernetes cluster id to every group_by, but that is not ideal since it is something I have to do for every route I ship to OpsGenie.
Proposal
Would it make sense to add a single configuration field to the opsgenie receiver configuration to supply a static extra seed for the hashing function that generates this alias?
This way, by leveraging templates when creating the Alertmanager configuration, it would be possible to differentiate the same alert between different Alertmanager sources.
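A sketch of what such a field could look like; alias_seed is a hypothetical name and does not exist in the current opsgenie_config:

```yaml
receivers:
  - name: opsgenie
    opsgenie_configs:
      - api_key: <secret>
        # hypothetical field: an extra static seed mixed into the sha256
        # that produces the alias, so identical groups on different
        # Alertmanager instances get different aliases
        alias_seed: cluster-a
```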