Provide migration path for monitoring alerting from Watcher to Kibana alerting #50032
A new alerting system in Kibana is currently under development: https://www.elastic.co/blog/alerting-in-the-elastic-stack
As part of this effort we want to give users a migration path from the default Watches that ship with Monitoring to the new system. This requires some way for Kibana to tell Elasticsearch that it is now handling the default alerts. Once Kibana is handling the default alerts, Elasticsearch should a) remove the existing default monitoring Watches and b) prevent new default monitoring Watches from being created.
Note: this only includes the six default Watches that ship with Monitoring and does not include custom watches.

Comments
Pinging @elastic/es-core-features (:Core/Features/Monitoring)
Any update on when this might be available? Does 7.8 look realistic?
To provide a little color on this issue, I wanted to explain the need for this for all interested parties. Elasticsearch exporters will attempt to create the six monitoring cluster alert watches every time the exporter is executed (on a master node). Before creating the watches, this code checks a few notable things, among them whether a watch has been blacklisted.
The blacklisting feature is configured per exporter, along these lines:
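As a rough sketch (the exporter name my_remote and the host are placeholders; the watch IDs are drawn from the documented set of six cluster alerts), the blacklist lives under the exporter's settings in elasticsearch.yml:

```yaml
# elasticsearch.yml -- hypothetical exporter named "my_remote"
xpack.monitoring.exporters.my_remote:
  type: http
  host: ["http://monitoring-cluster.example.com:9200"]
  # Watches listed here are never created by this exporter
  cluster_alerts.management.blacklist:
    - elasticsearch_cluster_status
    - kibana_version_mismatch
```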
This feature is nice, but it's configured at the exporter level (instead of at the Elasticsearch cluster level), meaning the blacklist only applies to existing exporters. If a user adds a new exporter in the future, the blacklist won't carry over. So we aren't left with many options for disabling these alerts. We can disable Watcher on the cluster, but that will also disable any custom watches the user has set up. We need a way to delete any existing watcher-based cluster alerts, as well as a way to prevent new ones from being created. FYI, there is some background in this thread. Please correct me if I'm mistaken on any of these points, as I'm not an expert on how Elasticsearch does this.
Had a quick sync-up on this issue between @jakelandis, @chrisronline, @igoristic, and @ravikesarwani.
@jakelandis Any update on the progress on the API? Is this getting completed for 7.10?
Also, is this the right issue to track, or will there be another implementation issue?
This is the right issue. @jbaiera will be leading the work.
@jbaiera do you already have an idea of when this might land?
@sgrodzicki I'm getting a plan of action together for how this migration is going to work in Elasticsearch. For the moment, my hope is to get this in before the 7.10 feature freeze, but I'm still getting up to speed on the inner workings of the Monitoring plugin. I'll have a more confident answer soon.
Getting this in just before feature freeze wouldn't give us enough time to implement this in Stack Monitoring. Can we prioritize this work, or document those APIs so we can work against mocks?
Agreed on the timing for sure. I'm starting up the implementation work for this, but I had a couple of questions regarding the migration strategy. Each exporter type (local and http) will remove configured alerts if they are excluded via the blacklist. The easiest way we could approach this is to add a setting that treats all monitoring alerts as blacklisted, and let each exporter remove them on its own. The downside is that the alerts will only be removed if the exporter configurations are still in place.
For us to clean up the watches, the exporter configurations need to be present (the watches may exist in a remote cluster that we can't access if we don't have its address/auth). The bigger issue is that the exporters set these alerts up lazily, when they receive data to send to a monitoring index, and that does not happen unless the old collectors are enabled and sending data. This leads to a decision we have to make, and I'd like some feedback on it: do we expect situations where we need to remove these watches but legacy stats collection is already disabled? If so, we'll need to move the logic that creates these resources so that they can be cleaned up outside of the regular exporting loop, on top of adding the setting that prohibits their creation going forward.
The scenario I am thinking about is one where legacy collection is already disabled or the exporter configuration has been removed.
What is the workaround for the user in this case? Can these watches be easily deleted manually?
Looks like a user cannot deactivate or delete a system watch, so it feels like there may not be a simple workaround. Am I missing something?
Well, in the case that a user has removed any configurations that the monitoring plugin would use to contact a monitoring cluster, there's not much we can do from the plugin to delete any watches left behind. Even if we extract the necessary removal logic from the exporters into a REST action, we're still dependent on the remote monitoring cluster settings being present.
I don't think you can remove the watch from the Kibana UI (the options are greyed out), but I was able to remove the watch by hitting the DELETE endpoint on Watcher. Though, that still assumes that whatever process is deleting the watch knows the address of the cluster it's hosted on. Are we thinking that if monitoring is disabled/removed, the user just performs those deletes on their own? There's little risk of the watches reappearing if their exporter configuration is already removed or decommissioned.
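For anyone trying this manually, a sketch of that DELETE call (the cluster UUID below is made up for illustration; the monitoring watch IDs take the form <cluster_uuid>_<alert_name>, so substitute your own cluster's UUID):

```
# Delete one of the six cluster alert watches directly via the Watcher API.
# "PkAhhgnnSVuS6EXVipViLA" is a hypothetical cluster UUID.
DELETE _watcher/watch/PkAhhgnnSVuS6EXVipViLA_elasticsearch_cluster_status
```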
@jbaiera Do you see any other possible options? If we delete in the cases where we can, and also loosen up the Kibana UI so that users can delete a "system watch" easily (assuming that these six monitoring watches are the only system watches we have), then I think we have a workable solution. @chrisronline Is it easy to relax the Kibana UI to allow deletion of a "system watch" when an automated path is not possible? Is that an acceptable solution, or do you see any side effects?
This is fine to do.
Yes, this will be the case, and it is also possible for these watches to exist on clusters where there isn't a dedicated Kibana, so those users won't have the luxury of a UI. I don't think we can solve this for every user, but we can certainly help the majority by cleaning up these unnecessary watches.
I see this migration going one of a few ways; the approach I'm leaning toward is a setting that treats the cluster alerts as decommissioned, paired with an API call that Kibana can use to remove any existing watches and ensure no new ones are created.
Does this seem like a reasonable way to handle the migration of cluster alerts?
@jbaiera Yes, that sounds perfect. The API call is exactly what we're looking for, and it ensures that all existing (or at least all known) watches are removed and guarantees no new ones are created.
Yes, this is great and covers most cases automatically.
First part of this work is up as a PR (#62668). I'll begin working on the migration API next. I have some questions about finer points of the migration API's implementation. One concern I have after finishing the settings PR is that watch removal can fail for any number of reasons: a remote monitoring cluster can experience network failures or no longer be available, and while the local case seems least likely to fail, the watcher index could be unavailable. These failures are usually transient to the plugin because, in the case of a setup failure, it just blocks until the next collection cycle. APIs are a one-shot attempt. We discussed that the API would make its best attempt at cleaning out the watches on any configured exporters. How far should that best attempt go? Should we block and retry until we're reasonably confident we can or can't delete all the watches, or just make one attempt and shrug if some fail? Additionally, what kind of information should we return? I was thinking of a list of all configured exporters and whether they were successfully cleaned, but I'm not sure that's even helpful for the Stack Monitoring solution. The less we return, the faster this work can get done.
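For concreteness, a sketch of what the settings half might look like once #62668 lands; the setting name below reflects my reading of that PR and should be treated as provisional:

```
# Dynamically mark the legacy cluster alerts as decommissioned so each
# exporter deletes the watches it can reach and stops recreating them.
PUT _cluster/settings
{
  "persistent": {
    "xpack.monitoring.migration.decommission_alerts": true
  }
}
```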
@ravikesarwani I'm wrapping up the work required for accepting the REST request. I'll check back here once the PR is up, and again once it's merged.
PR for the migrate action is up now! #64373
This work is now complete with #64373.
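For anyone landing here later, the resulting endpoint is a one-shot call that attempts the cleanup on every configured exporter; the request path is from the merged PR, while the response shape below is a sketch of the per-exporter report discussed above rather than a verbatim copy:

```
POST _monitoring/migrate/alerts

# A response along these lines (field names assumed):
# {
#   "exporters": [
#     { "name": "default_local", "migration_complete": true, "reason": null }
#   ]
# }
```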
Thanks for the work here, folks! In terms of #50032 (comment), are there plans to add this in 7.12?
@jbaiera @jakelandis Ping on #50032 (comment) |
@chrisronline This should be available starting in 7.11
@jbaiera Are you referring to #64373? I'm not seeing the functionality we discussed in #50032 (comment), but maybe I'm missing it?
Ah, sorry @chrisronline, I misunderstood. I'll reopen this until after that is complete. I'm not sure if it's slotted in for 7.12 at the moment.
Circling back here to close this out: we won't be updating the migrate API to hand off emails. We consider those to be credentials, and we're not keen on exposing them through an API call for the time being.