
Provide migration path for monitoring alerting from Watcher to Kibana alerting #50032

Closed
jakelandis opened this issue Dec 10, 2019 · 40 comments

Comments

@jakelandis
Contributor

jakelandis commented Dec 10, 2019

A new alerting system in Kibana is currently under development: https://www.elastic.co/blog/alerting-in-the-elastic-stack

As part of this effort we want to give users a migration path from the default Watches that ship with Monitoring to this new system. This requires some way for Kibana to tell Elasticsearch that it is now handling the default alerts. Once Kibana is handling the default alerts, Elasticsearch should a) remove the existing default monitoring Watches and b) prevent new default monitoring Watches from being created.

Note - this only includes the 6 default Watches that ship with monitoring and does not include custom watches.

@elasticmachine
Collaborator

Pinging @elastic/es-core-features (:Core/Features/Monitoring)

@chrisronline
Contributor

Any update on when this might be available? Does 7.8 look realistic?

@rjernst added the Team:Data Management label (Meta label for data/management team) on May 4, 2020
@chrisronline
Contributor

To provide a little color on this issue, I wanted to explain the need for this change for all interested parties.

Elasticsearch exporters will attempt to create the six monitoring cluster alert watches every time the exporter is executed (on a master node). This code will check a few notable things before creating the watches:

  1. Does the user have the right license for Watcher? (source)
  2. Is the watch in the blacklist? If it is, skip adding it. (source)

The blacklisting feature is configured like this:

cluster_alerts.management.blacklist
Prevents the creation of specific cluster alerts. It also removes any applicable watches that already exist in the current cluster.

You can add any of the following watch identifiers to the blacklist:

  • elasticsearch_cluster_status
  • elasticsearch_version_mismatch
  • elasticsearch_nodes
  • kibana_version_mismatch
  • logstash_version_mismatch
  • xpack_license_expiration

For example: ["elasticsearch_version_mismatch","xpack_license_expiration"].

This feature is nice, but it's configured at the exporter level (instead of at the Elasticsearch cluster level), meaning that the blacklist only applies to existing exporters. If a user were to add a new exporter in the future, this blacklist wouldn't carry over.
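
To make the exporter-level scoping concrete, here is a minimal sketch of applying that blacklist to a single exporter, assuming a hypothetical exporter named my_remote and that the exporter settings on your version can be updated dynamically (otherwise the same key belongs in elasticsearch.yml):

```
PUT _cluster/settings
{
  "persistent": {
    "xpack.monitoring.exporters.my_remote.cluster_alerts.management.blacklist": [
      "elasticsearch_version_mismatch",
      "xpack_license_expiration"
    ]
  }
}
```

Because the key lives under a specific exporter's namespace, a second exporter (or one added later) would need its own copy of the blacklist, which is exactly the limitation described above.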

So we aren't left with many options for disabling these alerts. We can disable Watcher on the cluster, but that will also disable any custom watches the user has set up.

We need a way to delete any existing watcher-based cluster alerts, as well as a way to disable new ones from being created.

FYI, there is some background in this thread.

Please correct me if I'm mistaken on any of these points, as I'm not an expert with how Elasticsearch does this.

@ravikesarwani

Had a quick sync-up between @jakelandis, @chrisronline, @igoristic, and @ravikesarwani to discuss this issue.
We will try to get this delivered for 7.10, with changes on both the ES and Stack Monitoring sides.

  • The ES team will provide an API/cluster setting that, when set, will remove the 6 watches, and will also modify the code so that the 6 watches aren't created again.
  • The Stack Monitoring team will create the 6 replacement Kibana alerts when the user visits the Stack Monitoring UI, and will call the API/cluster setting that the ES team provides.

@ravikesarwani

ravikesarwani commented Aug 31, 2020

@jakelandis Any update on the progress on the API? Is this getting completed for 7.10?

@sgrodzicki

@jakelandis Any update on the progress on the API? Is this getting completed for 7.10?

Also, is this the right issue to track or will there be another implementation issue?

@jakelandis
Contributor Author

Also, is this the right issue to track or will there be another implementation issue?

This is the right issue. @jbaiera will be leading the work.

@sgrodzicki

@jbaiera do you already have an idea of when this might land in master?

@jbaiera
Member

jbaiera commented Sep 4, 2020

@sgrodzicki I'm getting a plan of action together for how this migration is going to work in Elasticsearch. For the moment, my hope is to get this in before the 7.10 feature freeze, but I'm still getting up to speed on the inner workings of the Monitoring plugin. I'll have a more confident answer soon.

@sgrodzicki

Getting this in just before feature freeze wouldn't give us enough time to implement this in Stack Monitoring. Can we prioritize this work, or document those APIs so that we can work with mocks?

@jbaiera
Member

jbaiera commented Sep 9, 2020

Agreed on the timing for sure. I'm starting the implementation work for this, but I had a couple of questions regarding the migration strategy:

Each exporter type (local and HTTP) will remove configured alerts if they are excluded via the blacklist. The easiest way we could approach this is to add a setting that treats all monitoring alerts as blacklisted, and let each exporter remove them on its own. The downside is that these will only be removed if

  1. The legacy monitoring exporters are still enabled
  2. Either the legacy Elasticsearch collectors are running or monitoring data arrives at the monitoring bulk action

For us to clean up the watches, the exporter configurations need to be present (the watches may exist in a remote cluster that we can't access if we don't have its address/auth). The bigger issue is that the exporters are configured to set these alerts up lazily, when they receive data to send to a monitoring index, which does not happen unless the old collectors are enabled and sending data. This leads to a decision we have to make, and one I'd like some feedback on:

Do we expect there to be a situation where we need to remove these watches, but legacy stats collection is already disabled? If so, we'll need to move the logic that creates these resources so that they can be cleaned up outside of the regular exporting loop on top of adding the setting to prohibit their creation going forward.
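
For reference, "legacy stats collection is already disabled" corresponds to a cluster where the collection setting has been switched off, for example:

```
PUT _cluster/settings
{
  "persistent": {
    "xpack.monitoring.collection.enabled": false
  }
}
```

In that state, the exporters never receive data, so the lazy setup path that creates or removes the watches never runs.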

@ravikesarwani

ravikesarwani commented Sep 9, 2020

The scenario I am thinking about is:

  • The user was using internal collection, and hence the watches were created.
  • The user has now disabled internal collection and migrated to Metricbeat (which is what we are promoting).
    In the new version (7.10+) we will create the replacement Kibana alerts, but ES cannot delete the watches since the HTTP exporter is disabled.

What is the workaround for the user in this case? Can these be easily deleted manually by the user?

@ravikesarwani

Looks like a user cannot deactivate or delete a system watch, so it feels like there may not be a simple workaround. Am I missing something?

@jbaiera
Member

jbaiera commented Sep 9, 2020

Well, in the case that a user has removed any configuration that the monitoring plugin would use to contact a monitoring cluster, there's not much we can do from the plugin to delete any watches left behind. Even if we extract the necessary removal logic from the exporters into a REST action, we're still dependent on the remote monitoring cluster settings being present.

Looks like a user cannot deactivate or delete a system watch, so it feels like there may not be a simple workaround. Am I missing something?

I don't think you can remove the watch from the Kibana UI (the options are greyed out), but I was able to remove the watch by hitting the DELETE endpoint on Watcher.

Though, that still assumes that whatever process is deleting the watch knows the address of the cluster it's hosted on. Are we thinking that if monitoring is disabled/removed, the user just performs those deletes on their own? There's little risk of the watches reappearing if their exporter configuration has already been removed or decommissioned.
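
As a concrete sketch of that manual route: the default cluster alert watches can be removed directly with the Watcher DELETE API on the cluster that hosts them. The ID below is illustrative only; the assumption here is that the default watch IDs are prefixed with the monitored cluster's UUID.

```
# The root endpoint returns the cluster_uuid of the monitored cluster
GET /

# Delete one of the six default cluster alert watches
# (the <cluster_uuid>_ prefix is an assumed ID format, shown for illustration)
DELETE _watcher/watch/<cluster_uuid>_elasticsearch_cluster_status
```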

@ravikesarwani

Are we thinking that if monitoring is disabled/removed, the user just performs those deletes on their own? There's little risk of the watches reappearing if their exporter configuration has already been removed or decommissioned.

@jbaiera Do you see any other possible options?

If we delete the watches in the cases where we can, and also loosen up the Kibana UI so that users can easily delete "system watches" (assuming these 6 monitoring watches are the only system watches we have), then I think we have a workable solution.

@chrisronline Is it easy to relax the Kibana UI to allow deletion of "system watches" when the automated way is not possible? Is that an acceptable solution, or do you see any side effects?

@chrisronline
Contributor

Is it easy to relax the Kibana UI to allow deletion of "system watches" when the automated way is not possible? Is that an acceptable solution, or do you see any side effects?

This is fine to do.

Do we expect there to be a situation where we need to remove these watches, but legacy stats collection is already disabled?

Yes, this will be the case, and it is also possible for these watches to exist on clusters where there isn't a dedicated Kibana, so those users will not have the luxury of a UI.

I don't think we can solve this for every user, but we can certainly help the majority by cleaning up these unnecessary watches.

@jbaiera
Member

jbaiera commented Sep 10, 2020

I see this migration going one of a few ways:

  1. The monitoring plugin is enabled on a cluster and is actively sending data to a monitoring index. This configuration will continuously make sure that the cluster alerts are created whenever it is able to. We can add a setting here that tells the plugin to remove any existing cluster alerts and stop creating them. We also add a migrate API that toggles this setting on. There are some benefits to this being an endpoint, as detailed in the next scenario.

  2. The exporting of stats has been disabled for any reason. While disabled, the plugin will not attempt to recreate any cluster alerts. In this case, the plugin still has all of the information about its exporters. The migrate API would be called, which would toggle the above-mentioned setting to keep the plugin from creating alerts if it is ever re-enabled. The API would then iterate over each of the currently configured exporters, removing their alerts directly.

  3. Settings that define exporters in monitoring have been removed, and thus the plugin can no longer reach one or more remote clusters. The alerts on those remote clusters would need to be removed manually by an administrator. In the migrate API we attempt to clean up all that we can (which may include a local exporter), but any remote clusters not in the configuration cannot be cleaned up automatically.

Does this seem like a reasonable way to handle the migration of cluster alerts?
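
To make the proposal concrete, here is a rough sketch of what scenarios 1 and 2 might look like from the caller's side. The setting and endpoint names are placeholders that roughly track what the later PRs in this thread (#62668, #64373) introduced, not a finalized contract:

```
# Tell the monitoring plugin to stop creating the default cluster alert watches
# (setting name is a placeholder)
PUT _cluster/settings
{
  "persistent": {
    "xpack.monitoring.migration.decommission_alerts": true
  }
}

# One-shot migrate call that also attempts to remove existing watches
# through the currently configured exporters (path is a placeholder)
POST _monitoring/migrate/alerts
```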

@chrisronline
Contributor

@jbaiera Yes, that sounds perfect. The API call is exactly what we're looking for: it ensures that all existing (or at least all known) watches are removed and guarantees that no new ones are created.

@ravikesarwani

Yes, this is great and covers most cases automatically.

@jbaiera
Member

jbaiera commented Sep 18, 2020

First part of this work is up as a PR (#62668). I'll begin working on the migration API next.

I have some questions regarding the finer points of the migration API's implementation. One concern I have after finishing up the setting PR is that watch removal can fail for any number of reasons. A remote monitoring cluster can experience network failures or no longer be available. The local exporter seems least likely to fail when deleting these alerts, but the Watcher index could be unavailable. These failures are usually transient for the plugin because, in the case of a setup failure, it just blocks until the next collection cycle. APIs are a one-shot attempt.

We discussed that the API would make its best attempt at cleaning out the watches on any configured exporters. How far should that best attempt go? Should we block and retry until we're reasonably confident we can or can't delete all the watches, or just make one attempt and shrug if some attempts fail? Additionally, what kind of information should we return? I was thinking of a list of all configured exporters and whether they were successfully cleaned, but I'm not sure whether that's even helpful for the Stack Monitoring solution. The less we return, the faster this work can get done.
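
For discussion purposes, here is a hypothetical response shape along the lines described above, listing each configured exporter and whether its watches were cleaned up; the field names are made up for illustration and are not the final API:

```
{
  "exporters": [
    { "name": "default_local", "migration_complete": true },
    { "name": "remote_monitoring", "migration_complete": false, "reason": "remote cluster unreachable" }
  ]
}
```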

@jbaiera
Member

jbaiera commented Oct 28, 2020

@ravikesarwani I'm wrapping up the work required for accepting the REST request. I'll check back here once the PR is up, and again once it's merged.

@jbaiera
Member

jbaiera commented Oct 29, 2020

PR for the migrate action is up now! #64373

@sgrodzicki

sgrodzicki commented Nov 19, 2020

PR for the migrate action is up now! #64373

@jbaiera what would be the ETA for this? We want to assess whether we will be able to make changes in Kibana for 7.11.

@jbaiera
Member

jbaiera commented Dec 17, 2020

This work is completed now with #64373.

@chrisronline
Contributor

Thanks for the work here folks!

In terms of #50032 (comment), are there plans to add this in 7.12?

@chrisronline
Contributor

@jbaiera @jakelandis Ping on #50032 (comment)

@jbaiera
Member

jbaiera commented Jan 11, 2021

@chrisronline This should be available starting in 7.11

@chrisronline
Contributor

@jbaiera Are you referring to #64373? I'm not seeing the functionality we discussed in #50032 (comment) but maybe I'm missing it?

@jbaiera
Member

jbaiera commented Jan 13, 2021

Ah, sorry @chrisronline, I misunderstood. I'll reopen this until that work is complete. I'm not sure if it's slotted in for 7.12 at the moment.

@jbaiera
Member

jbaiera commented Jul 29, 2021

Circling back here to close this out: we won't be updating the migrate API to hand off emails. We consider those credentials, and we're not keen on exposing them through an API call for the time being.
