Add ability to override cluster_uuid to be used in monitoring data #13182

ycombinator · 2019-08-06T19:13:03Z

Background and Problem

Starting with 7.2.0, the xpack.monitoring.* settings in Beats are deprecated in favor of the monitoring.* settings. The former were used to define the production cluster to which the Beat should send its monitoring data. The latter are used to define the monitoring cluster to which the Beat should directly send its monitoring data.

When the monitoring.* settings are used with an output other than elasticsearch, there's no way (currently) to know if the Beat's "regular" (i.e. non-monitoring) data is going to end up in an Elasticsearch cluster. As such, we cannot associate the Beat with an Elasticsearch cluster in the Stack Monitoring UI. Instead we show it under a "Standalone Cluster".

Prior to 7.2.0, if users were using an output other than elasticsearch, they would still send monitoring data via the production cluster (by using the xpack.monitoring.* settings). As such, the production cluster could enrich the monitoring data such that the Stack Monitoring UI would show the Beat associated with that production Elasticsearch cluster.

Starting with 7.2.0, however, the same Beat has "moved" to a Standalone Cluster in the UI, and users are not happy about it, specifically users who know that their Beats are eventually sending their regular data into Elasticsearch. Such users would like to see the Beat associated with their production Elasticsearch cluster in the UI.

Solution

This PR aims to solve the above problem by providing said class of users with a new setting in their Beat's configuration:

monitoring.override_cluster_uuid:

By default, this setting is not set; that is, its value is empty. In this case the value of the cluster_uuid field in the Beat's monitoring documents will be determined as follows:

If the Beat is using the elasticsearch output, the cluster_uuid will be that of the Elasticsearch cluster referenced by the elasticsearch output.
Else, the cluster_uuid will be blank, causing the Beat to show up under Standalone Cluster in the Stack Monitoring UI.

If the monitoring.override_cluster_uuid setting is given a value (i.e. the Cluster UUID of an Elasticsearch cluster), this value will be used as the value of the cluster_uuid field in the Beat's monitoring documents, regardless of the output being used by the Beat.
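
The precedence described above can be sketched as a small function. This is only an illustration in Python, not the Go code in this PR; the function and parameter names (`resolve_cluster_uuid`, `override_cluster_uuid`, `output_type`, `output_cluster_uuid`) are hypothetical.

```python
from typing import Optional


def resolve_cluster_uuid(
    override_cluster_uuid: str,
    output_type: str,
    output_cluster_uuid: Optional[str],
) -> str:
    """Return the cluster_uuid to place in the Beat's monitoring documents."""
    # An explicitly configured value wins, regardless of the output in use.
    if override_cluster_uuid:
        return override_cluster_uuid
    # Without an override, only the elasticsearch output can supply a UUID.
    if output_type == "elasticsearch" and output_cluster_uuid:
        return output_cluster_uuid
    # Blank value: the Beat shows up under Standalone Cluster in the UI.
    return ""
```

For example, `resolve_cluster_uuid("", "console", None)` yields a blank value, which corresponds to the Standalone Cluster case.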

Testing this PR

For all test cases below, the same Elasticsearch query is to be run against your Monitoring Elasticsearch cluster. This is the query:

POST .monitoring-beats-*/_search
{
  "size": 2,
  "_source": [
    "type",
    "cluster_uuid"
  ],
  "collapse": {
    "field": "type"
  },
  "sort": [
    {
      "timestamp": {
        "order": "desc"
      }
    }
  ]
}

When running this query, note that it may take up to 30 seconds for type:beats_state documents to show up.

When `monitoring.override_cluster_uuid` is set

Verify that the value of the cluster_uuid field in .monitoring-beats-* documents of type:beats_stats as well as type:beats_state is the same as that specified for the monitoring.override_cluster_uuid setting.

When `monitoring.override_cluster_uuid` is not set

When `output.elasticsearch` is enabled

Verify that the value of the cluster_uuid field in .monitoring-beats-* documents of type:beats_stats as well as type:beats_state is the same as the cluster UUID of the Elasticsearch cluster referenced by the output.elasticsearch setting. To determine this value, call the GET / API against the output.elasticsearch Elasticsearch cluster.
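
A minimal sketch of reading the cluster UUID from the GET / response, assuming an unauthenticated cluster reachable over plain HTTP. The helper names (`extract_cluster_uuid`, `fetch_cluster_uuid`) are hypothetical; the only thing taken from Elasticsearch itself is the top-level `cluster_uuid` field in the root response.

```python
import json
import urllib.request


def extract_cluster_uuid(root_response_body: str) -> str:
    """Pull cluster_uuid out of the JSON body returned by GET /."""
    info = json.loads(root_response_body)
    return info["cluster_uuid"]


def fetch_cluster_uuid(base_url: str) -> str:
    """Call GET / on the given cluster (assumes no authentication or TLS setup)."""
    with urllib.request.urlopen(base_url.rstrip("/") + "/") as response:
        return extract_cluster_uuid(response.read().decode("utf-8"))
```

A secured cluster would additionally need credentials and TLS options, which this sketch leaves out.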

When an output other than `output.elasticsearch` is enabled

Verify that the value of the cluster_uuid field in .monitoring-beats-* documents of type:beats_stats as well as type:beats_state is null.

For testing, you can enable output.console. Make sure to disable output.elasticsearch and to point monitoring.elasticsearch.hosts to your Monitoring Elasticsearch cluster.
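
The checks in the test cases above could be automated roughly as follows. This is a sketch under assumptions: it takes the already-parsed JSON response of the search query shown earlier, the function names are hypothetical, and a missing cluster_uuid field is treated the same as null (an empty string would not match `None`).

```python
from typing import Dict, Optional


def cluster_uuids_by_type(search_response: dict) -> Dict[str, Optional[str]]:
    """Map each document type in the response to its cluster_uuid value."""
    result: Dict[str, Optional[str]] = {}
    for hit in search_response["hits"]["hits"]:
        source = hit["_source"]
        # A missing or null cluster_uuid is reported as None.
        result[source["type"]] = source.get("cluster_uuid")
    return result


def check_cluster_uuid(search_response: dict, expected: Optional[str]) -> bool:
    """True if both beats_stats and beats_state carry the expected cluster_uuid."""
    found = cluster_uuids_by_type(search_response)
    return all(
        doc_type in found and found[doc_type] == expected
        for doc_type in ("beats_stats", "beats_state")
    )
```

Pass the configured UUID as `expected` for the first two test cases, and `None` for the case where an output other than output.elasticsearch is enabled.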

elasticmachine · 2019-08-06T19:13:06Z

Pinging @elastic/stack-monitoring

ycombinator · 2019-08-06T19:16:27Z

libbeat/_meta/config.reference.yml.tmpl

+# the Stack Monitoring UI. However, if a different output is enabled, the monitoring data will
+# show up under a Standalone Cluster in the UI. If you want the monitoring data to be associated
+# with a specific Elasticsearch cluster in the UI, specify that cluster's UUID here.
+#monitoring.override_cluster_uuid:


@dedemorton WDYT of this language?

I wouldn't start by saying, "If output.elasticsearch is enabled" because skim readers might think the section doesn't apply to them. Instead, I would start with a description of what the setting contains, then I'd describe the reasons for setting it. How about something like this:

Sets the UUID of the Elasticsearch cluster under which monitoring data for this {{.BeatName | title}} instance will appear in the Stack Monitoring UI. If output.elasticsearch is enabled, the UUID is set by default. If a different output is enabled, you must specify this setting, or monitoring data about this {{.BeatName | title}} instance will appear under a standalone cluster.

Maybe a bit wordy, tho. WDYT?

I can't recall a place in the public documentation right now where we discuss what a cluster UUID is and what its function is. I'm a bit on the fence over whether this case is common enough that we should document it more thoroughly, but generally I lean toward the more documentation the better. :)

Either way, I think that we should also provide clear instructions to a user on retrieving the cluster UUID for the cluster that they wish to associate a monitored component with.

@cachedout In addition to @dedemorton's suggested changes, I can add a sentence about how to determine the cluster UUID.

Given that this is an inline comment in a configuration file, I'm not sure there's space to talk about what a cluster UUID is and what its function is; that seems like something that belongs in our online ES docs. If you agree, then perhaps file a docs issue (or PR) in the ES repo for adding the information you'd like to see?

We can add more detail to the Beats docs where we talk about monitoring. IMO it's important to keep the description in the config file as brief as possible, or the file will become unreadable over time.

I wanted to add the override_ prefix to indicate more strongly that this will be the cluster UUID that is used when it is set, regardless of any other configuration. But I'm good with your suggestion as well, @urso. I think perhaps the name is not as critical as the inline comment and web site documentation about it.

:) Reason I ask for cluster_uuid is, cause I'd prefer it to be configured most of the time. It's the one to use for users not using the ES output always, meaning there is nothing to override. That is: we only 'derive' it from the ES output if available and not configured (ES output route as a fallback). In the end I don't mind as much about the exact naming, but agree the doc should be clear.
Checking the code how we get the UUID from the ES output I wouldn't be surprised if we publish a few events without a cluster uuid at all.

Checking the code how we get the UUID from the ES output I wouldn't be surprised if we publish a few events without a cluster uuid at all.

Good point. I will put up a separate PR to prevent this when output.elasticsearch is being used. I already did something like this for the new beat Metricbeat module: #13020.

Based on all the above feedback, I've updated the setting name and inline comment in 78b25d4d069f47681667b112f5d6547d09628f60.

I tried to keep the comment short and convey that we expect users to set this setting unless they are using output.elasticsearch, in which case we automatically derive the cluster UUID.

Please let me know what you think.

I will put up a separate PR to prevent this when output.elasticsearch is being used.

PR is up: #13251

ycombinator · 2019-08-09T13:12:07Z

Hi @elastic/beats developers, anyone available to review this PR?

abraxxa · 2019-08-13T13:18:43Z

Why not set the UUID automatically like now when the cluster hosts are the same or autodiscover it?

cachedout

LGTM pending the additional sentence for the docs discussed in a review comment. Should we also file a docs issue on enhancing the documentation to describe the role of a UUID, per the suggestion of @dedemorton?

ycombinator · 2019-08-13T15:28:09Z

@abraxxa said:

Why not set the UUID automatically like now when the cluster hosts are the same or autodiscover it?

Indeed, if output.elasticsearch is being used there will be no need to set the monitoring.override_cluster_uuid setting. The Beat will discover the cluster UUID from the Elasticsearch cluster referenced by output.elasticsearch.

We expect users to use monitoring.override_cluster_uuid only if they are using an output other than output.elasticsearch but still want to associate the Beat in the Stack Monitoring UI with a specific Elasticsearch cluster.

ycombinator · 2019-08-13T15:45:46Z

@cachedout I'm waiting for someone from @elastic/beats to review this PR before taking any further action. I've noted @dedemorton's feedback and I'll take the necessary next steps once this PR has been reviewed by someone from @elastic/beats.

abraxxa · 2019-08-13T16:07:55Z

@ycombinator: we send filebeat logs to logstash for processing and have xpack.monitoring.elasticsearch configured.
Why can't monitoring.elasticsearch behave exactly like xpack. now?

ycombinator · 2019-08-13T16:18:28Z

Why can't monitoring.elasticsearch behave exactly like xpack. now?

xpack.monitoring.* would send monitoring data to an Elasticsearch "production" cluster, which would then export it to an Elasticsearch "monitoring" cluster. This monitoring cluster could be the same as the production cluster (so the exporter used would be a local one, which is the default) or the monitoring cluster could be a separate, dedicated one (so the exporter used on the production cluster would be an http one).

Routing monitoring data through the production cluster has disadvantages: it's unnecessarily complicated (the extra hop), it adds an unnecessary burden on the production cluster, and it requires maintenance of code in the production cluster just for supporting this routing. Instead, if we allow Beats to ship data directly to the monitoring cluster, those disadvantages go away. So this is the direction we are heading in.

However, when monitoring data is shipped through the production cluster (by using the xpack.monitoring.elasticsearch.* settings), the production cluster would inject its own cluster_uuid into the monitoring data. That would cause the Beat to be associated with that Elasticsearch production cluster in the Stack Monitoring UI in Kibana.

In the new way, when monitoring data is shipped directly to the monitoring cluster (by using the monitoring.elasticsearch.* settings), the Beat has no way of knowing the production cluster's ID or if there is even a production cluster in the picture! To allow for this possibility, this PR proposes adding a monitoring.override_cluster_uuid setting — users can optionally choose to set it to their production Elasticsearch cluster's UUID if they want the Beat's monitoring data to be associated with this cluster in the Stack Monitoring UI in Kibana.

If a user doesn't set this new setting, the Beat will use the cluster UUID of the Elasticsearch cluster pointed to by the Beat's output.elasticsearch.* settings (assuming, of course, that the user is using this output). If the user is not using the Elasticsearch output, then no Cluster UUID will be injected into the monitoring data for the Beat and it will be associated with a "Standalone Cluster" in the Stack Monitoring UI in Kibana.

we send filebeat logs to logstash for processing and have xpack.monitoring.elasticsearch configured.

So in your case you will most likely want to set monitoring.override_cluster_uuid to the cluster UUID of the Elasticsearch cluster currently referenced by xpack.monitoring.elasticsearch. This will ensure that you see the same association in the Stack Monitoring UI in Kibana.

libbeat/monitoring/monitoring.go

libbeat/monitoring/report/elasticsearch/elasticsearch.go

abraxxa · 2019-08-14T18:53:20Z

@ycombinator thanks for the detailed explanation, very much appreciated!

I wasn't aware that there is any exporter functionality in Elasticsearch towards another cluster.

Maybe there is a way for the Elasticsearch module of Filebeat to detect the UUID, by API call or by including it in the JSON logfiles.

ycombinator · 2019-08-15T20:53:43Z

@urso I've addressed all your feedback on this PR. Please take a look when you get a chance. Thanks!

dedemorton · 2019-08-20T22:46:21Z

auditbeat/auditbeat.reference.yml

@@ -1222,6 +1222,17 @@ logging.files:
 # Set to true to enable the monitoring reporter.
 #monitoring.enabled: false

+# If output.elasticsearch is enabled, monitoring data for this Auditbeat instance


Did you mean to include both statements? I think there is too much redundancy, and I'm still not happy with the wording (even the bits that I've contributed). I know you're eager to get this merged. I'd suggest removing the redundant wording (maybe even go with your original statement), and I can improve the wording when I add more info to the docs.

I did not, this might be the result of a bad rebase. Fixing...

@dedemorton I've removed the duplicate comment in 1e2585f. Thanks for catching this!

dedemorton

Doc changes LGTM

…13182) (#13480) * Add ability to override cluster_uuid to be used in monitoring data * Removing changes committed by accident * Updating setting name and inline comments in reference config file * Updating setting name in code * Use struct tags pattern instead of repeating magic string * Fixing logic after variable rename * Remove unnecessary check and override * Fixing duplicate comment * Fixing comment * Updating comment

ruflin · 2019-09-16T09:43:52Z

@ycombinator I can see this in the 7.3 but not 7.4 branch. Is this expected?

ycombinator · 2019-09-16T12:48:47Z

@ruflin Nope, not expected. It should be in 7.4 as well. I was expecting that to happen when changes from master get mass-backported to 7.x, but I guess 7.x was already pointing to 7.5 by the time this PR was merged? Anyway, backport PR to 7.4 is here now: #13694

…13182) (#13694) * Add ability to override cluster_uuid to be used in monitoring data * Removing changes committed by accident * Updating setting name and inline comments in reference config file * Updating setting name in code * Use struct tags pattern instead of repeating magic string * Fixing logic after variable rename * Remove unnecessary check and override * Fixing duplicate comment * Fixing comment * Updating comment

tanvp112 · 2020-06-24T15:57:27Z

May I know what the definition of "derived" is here? With 7.8.0, Filebeat did not set this value according to the cluster specified in either output.elasticsearch.hosts or monitoring.elasticsearch.hosts.

"Sets the UUID of the Elasticsearch cluster under which monitoring data for this
Filebeat instance will appear in the Stack Monitoring UI. If output.elasticsearch
is enabled, the UUID is derived from the Elasticsearch cluster referenced by output.elasticsearch."

monitoring.cluster_uuid:

If monitoring.cluster_uuid can only be set MANUALLY by querying a LIVE Elasticsearch cluster for its UUID, this behavior complicates continuous deployment pipelines. Is it possible for Filebeat to look up the cluster UUID automatically from the cluster specified in monitoring.elasticsearch.hosts?

Also noticed that this monitoring.* change has broken a process that forwards logs to Logstash for grok'ing before sending data to Elasticsearch... For the time being we are holding on to xpack.monitoring.enabled until further clarification.

ycombinator · 2020-06-24T17:05:19Z

In the Stack Monitoring UI, data is grouped by Elasticsearch cluster. In more robust deployments users typically have a "production" Elasticsearch cluster and a separate, dedicated "monitoring" Elasticsearch cluster. The production cluster holds their regular business data whereas the monitoring cluster holds their Elastic Stack monitoring data. The idea behind having a dedicated monitoring cluster is resiliency: if the production cluster is having problems, you (and the Stack Monitoring UI) can still look at your monitoring data in the separate monitoring cluster.

When it comes to Beats, there are two possibilities to consider:

1. The Beat is using the Elasticsearch output. In this case, there is no need to set monitoring.cluster_uuid. The Beat will connect to the Elasticsearch cluster defined by elasticsearch.hosts, get the cluster UUID of that cluster and use it in that Beat's monitoring data.
2. The Beat is using some output other than Elasticsearch. In this case there's no way for the Beat to know the "production" Elasticsearch cluster's UUID. So it must be manually set with the monitoring.cluster_uuid setting.
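
For the second case, a deployment script could generate the relevant settings once the production cluster's UUID is known. The following is a minimal sketch, not part of Beats: the function name `render_monitoring_settings` is hypothetical, and only the setting names (monitoring.enabled, monitoring.cluster_uuid, monitoring.elasticsearch.hosts) come from this thread.

```python
import json
from typing import List


def render_monitoring_settings(cluster_uuid: str, monitoring_hosts: List[str]) -> str:
    """Render the monitoring.* lines for a Beat that does not use output.elasticsearch."""
    lines = [
        "monitoring.enabled: true",
        # json.dumps produces quoted strings and flow lists that are also valid YAML.
        f"monitoring.cluster_uuid: {json.dumps(cluster_uuid)}",
        f"monitoring.elasticsearch.hosts: {json.dumps(monitoring_hosts)}",
    ]
    return "\n".join(lines) + "\n"
```

The returned text can be appended to the Beat's configuration file; the UUID passed in is that of the "production" cluster, while the hosts point at the "monitoring" cluster.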

With 7.8.0, Filebeat did not set this value according to the cluster specified in either output.elasticsearch.hosts or monitoring.elasticsearch.hosts.

This sounds like you are in case 1 above. Can you post a question about this on https://discuss.elastic.co/ including your complete Filebeat configuration and logs from the first minute or so after Filebeat start up? We prefer to keep discussions in GitHub for verified bugs or enhancement requests.

If monitoring.cluster_uuid can only be set MANUALLY by querying a LIVE Elasticsearch cluster for its UUID, this behavior complicates continuous deployment pipelines. Is it possible for Filebeat to look up the cluster UUID automatically from the cluster specified in monitoring.elasticsearch.hosts?

As explained above, the intent is for the cluster UUID to match that of the "production" cluster so the Beats monitoring data shows up correctly in the Stack Monitoring UI. The intent of the monitoring.elasticsearch.hosts setting is to point to the "monitoring" cluster. It's possible that both clusters are one and the same, but there's no way of knowing that for sure: a user could alternatively be using a dedicated monitoring cluster alongside a production cluster. So we ask the user to explicitly set the "production" Elasticsearch cluster's UUID via the monitoring.cluster_uuid setting.

abraxxa · 2020-06-24T18:49:04Z

Filebeat should know the cluster uuid of the cluster it is monitoring, for example by doing an API call on startup.

tanvp112 · 2020-06-25T04:58:33Z

@ycombinator, yes, this is case 1, and I was expecting "The Beat will connect to the Elasticsearch cluster defined by elasticsearch.hosts, get the cluster UUID of that cluster and use it in that Beat's monitoring data." to work, but it is not happening. The idea of having a dedicated, permanent cluster for monitoring is fine, but bear in mind that every production cluster is often surrounded by many development clusters. These are clusters used by various teams and spun up and down automatically by a CD pipeline, including clusters specifically for SIT purposes. Logs for these systems are not needed after the event, and we prefer that they stay on the same cluster. Things get more complicated if logs are processed by Logstash first: monitoring.* is not helpful there because the cluster_uuid must be known in advance, but this infrastructure is brought up and down on demand.

Quick spin to test:

Start ES/KB/FB on the same machine, uses BASIC license
Enabled FB module for ES and KB logs
Use stock filebeat.reference.yml
output.elasticsearch.enabled: true
output.elasticsearch.hosts: ["localhost"]
monitoring.enabled: true
Leave all other settings as-is

Note that not only was an orphan cluster created, the BASIC license was void as well.

Click into the actual cluster, all are working as expected:

Looking into ES log - no error reported
Looking into KB log - no error reported
Looking into FB log - no error reported. Didn't see any cluster_uuid reported either. Is it possible to have this info in the log file so we can cross-check it?

Terminate all instances and restart with xpack.monitoring.enabled true and monitoring.enabled false (the ONLY change). The cluster is up and running normally with NO orphan cluster created. However, an unusual Kibana error started to appear periodically:

I curled the active cluster and confirmed that the UUID in question is NOT the active cluster's UUID. Probably some kind of race condition among the applications during start-up regarding the cluster state...

ycombinator added enhancement review Feature:Stack Monitoring v8.0.0 v7.4.0 v7.3.1 labels Aug 6, 2019

ycombinator requested review from a team as code owners August 6, 2019 19:13

ycombinator commented Aug 6, 2019

ycombinator requested review from a team and removed request for a team August 6, 2019 19:24

cachedout approved these changes Aug 13, 2019

urso reviewed Aug 13, 2019

libbeat/monitoring/monitoring.go (resolved)

urso reviewed Aug 13, 2019

libbeat/monitoring/report/elasticsearch/elasticsearch.go (outdated, resolved)

ycombinator mentioned this pull request Aug 15, 2019

Expose monitoring.cluster_uuid in State API #13254

Merged

urso self-assigned this Aug 20, 2019

urso approved these changes Aug 20, 2019

ycombinator force-pushed the libbeat-monitoring-override-cluster-uuid branch from 740059e to 80a610c Compare August 20, 2019 19:21

ycombinator requested a review from a team as a code owner August 20, 2019 19:21

dedemorton reviewed Aug 20, 2019

dedemorton approved these changes Aug 21, 2019

ycombinator added 6 commits September 3, 2019 13:06

Use struct tags pattern instead of repeating magic string

70a0c95

Fixing logic after variable rename

d720bfc

Remove unnecessary check and override

1a50b40

Fixing duplicate comment

d02787f

Fixing comment

83d5637

Updating comment

6376f42

ycombinator force-pushed the libbeat-monitoring-override-cluster-uuid branch from 9ab945a to 6376f42 Compare September 3, 2019 20:06

ycombinator merged commit 0bafb43 into elastic:master Sep 3, 2019

ycombinator deleted the libbeat-monitoring-override-cluster-uuid branch September 3, 2019 21:40

ycombinator added v7.3.2 and removed v7.3.1 labels Sep 3, 2019

ycombinator mentioned this pull request Sep 3, 2019

[7.3] Add ability to override cluster_uuid to be used in monitoring data (#13182) #13480

Merged

ycombinator added the libbeat label Sep 3, 2019

This was referenced Sep 4, 2019

Fix typos + add CHANGELOG #13481

Merged

Fix documentation errors #13485

Closed

ycombinator mentioned this pull request Sep 16, 2019

[7.4] Add ability to override cluster_uuid to be used in monitoring data (#13182) #13694

Merged

ycombinator mentioned this pull request Oct 10, 2019

Document monitoring.cluster_uuid setting #13999

Merged

urso added the v7.5.0 label Oct 22, 2019

cachedout mentioned this pull request Dec 11, 2019

Logstash pipelines with Elasticsearch output can vanish from Monitoring app elastic/kibana#52245

Closed

anyasabo mentioned this pull request Jul 20, 2020

Enable beats monitoring elastic/cloud-on-k8s#3493

Closed
