Add ability to override cluster_uuid to be used in monitoring data #13182

Contributor

@ycombinator ycombinator commented Aug 6, 2019

Background and Problem

Starting with 7.2.0, the xpack.monitoring.* settings in Beats are deprecated in favor of monitoring.* settings. The former were used to define the production cluster to which the Beat should send its monitoring data. The latter are used to define the monitoring cluster to which the Beat should directly send its monitoring data.

When the monitoring.* settings are used with an output other than elasticsearch, there's no way (currently) to know if the Beat's "regular" (i.e. non-monitoring) data is going to end up in an Elasticsearch cluster. As such, we cannot associate the Beat with an Elasticsearch cluster in the Stack Monitoring UI. Instead we show it under a "Standalone Cluster".

Prior to 7.2.0, if users were using an output other than elasticsearch, they would still send monitoring data via the production cluster (by using the xpack.monitoring.* settings). As such, the production cluster could enrich the monitoring data such that the Stack Monitoring UI would show the Beat associated with that production Elasticsearch cluster.

Starting with 7.2.0, however, the same Beat has "moved" to a Standalone Cluster in the UI and users are not happy about it, specifically users who know that their Beats are eventually sending their regular data into Elasticsearch. Such users would like to see the Beat associated with their production Elasticsearch cluster in the UI.

Solution

This PR aims to solve the above problem by providing said class of users with a new setting in their Beat's configuration:

monitoring.override_cluster_uuid:

By default, this setting is not set; that is, its value is empty. In this case the value of the cluster_uuid field in the Beat's monitoring documents will be determined as follows:

  • If the Beat is using the elasticsearch output, the cluster_uuid will be that of the Elasticsearch cluster referenced by the elasticsearch output.
  • Else, the cluster_uuid will be blank, causing the Beat to show up under Standalone Cluster in the Stack Monitoring UI.

If the monitoring.override_cluster_uuid setting is given a value (i.e. the Cluster UUID of an Elasticsearch cluster), this value will be used as the value of the cluster_uuid field in the Beat's monitoring documents, regardless of the output being used by the Beat.
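
The precedence described above can be sketched as follows. This is only a minimal illustration in Python, not the Go code in this PR; the function name `resolve_cluster_uuid` and its parameters are hypothetical and exist only to show the order in which the values are considered.

```python
from typing import Optional


def resolve_cluster_uuid(
    override: Optional[str],
    output_type: str,
    output_cluster_uuid: Optional[str] = None,
) -> str:
    """Return the cluster_uuid to place in a Beat's monitoring documents.

    All names here are illustrative assumptions, not the PR's identifiers.
    """
    # An explicitly configured value wins, regardless of the output in use.
    if override:
        return override
    # Otherwise use the UUID of the cluster referenced by the elasticsearch output.
    if output_type == "elasticsearch" and output_cluster_uuid:
        return output_cluster_uuid
    # No override and no Elasticsearch output: blank, so the Beat shows up
    # under Standalone Cluster in the Stack Monitoring UI.
    return ""
```

For example, `resolve_cluster_uuid("abc", "logstash")` yields the configured value, while `resolve_cluster_uuid(None, "console")` yields an empty string.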

Testing this PR

For all test cases below, the same Elasticsearch query is to be run against your Monitoring Elasticsearch cluster. This is the query:

POST .monitoring-beats-*/_search
{
  "size": 2,
  "_source": [
    "type",
    "cluster_uuid"
  ],
  "collapse": {
    "field": "type"
  },
  "sort": [
    {
      "timestamp": {
        "order": "desc"
      }
    }
  ]
}

When running this query, note that it may take up to 30 seconds for type:beats_state documents to show up.

When monitoring.override_cluster_uuid is set

Verify that the value of the cluster_uuid field in .monitoring-beats-* documents of type:beats_stats as well as type:beats_state is the same as that specified for the monitoring.override_cluster_uuid setting.

When monitoring.override_cluster_uuid is not set

When output.elasticsearch is enabled

Verify that the value of the cluster_uuid field in .monitoring-beats-* documents of type:beats_stats as well as type:beats_state is the same as the cluster UUID of the Elasticsearch cluster referenced by the output.elasticsearch setting. To determine this value, call the GET / API against the output.elasticsearch Elasticsearch cluster.
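
One way to read the cluster UUID from the GET / response is sketched below. It assumes an unsecured cluster reachable over plain HTTP (a secured cluster would need credentials and TLS settings); the helper names and the `http://localhost:9200` URL in the usage note are illustrative assumptions.

```python
import json
import urllib.request


def extract_cluster_uuid(body: str) -> str:
    """Pull the top-level cluster_uuid field out of a GET / response body."""
    return json.loads(body)["cluster_uuid"]


def fetch_cluster_uuid(base_url: str, timeout: float = 5.0) -> str:
    """Call GET / on an Elasticsearch node and return its cluster UUID.

    Assumes no authentication is required (an assumption for this sketch).
    """
    url = base_url.rstrip("/") + "/"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return extract_cluster_uuid(resp.read().decode("utf-8"))
```

Usage would look like `fetch_cluster_uuid("http://localhost:9200")`, with the URL replaced by the host configured under output.elasticsearch.hosts.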

When an output other than output.elasticsearch is enabled

Verify that the value of the cluster_uuid field in .monitoring-beats-* documents of type:beats_stats as well as type:beats_state is null.

For testing, you can enable output.console. Make sure to disable output.elasticsearch and to point monitoring.elasticsearch.hosts to your Monitoring Elasticsearch cluster.
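
The checks in the test cases above can also be scripted against the parsed response of the search query. The sketch below is an illustration only; the helper names are hypothetical, and it treats a missing, null, or empty cluster_uuid as the same "not set" case, which is an assumption.

```python
from typing import Any, Dict, Optional


def cluster_uuids_by_type(response: Dict[str, Any]) -> Dict[str, Optional[str]]:
    """Map each document type in the search hits to its cluster_uuid.

    A missing, null, or empty cluster_uuid is normalized to None.
    """
    return {
        hit["_source"]["type"]: hit["_source"].get("cluster_uuid") or None
        for hit in response["hits"]["hits"]
    }


def all_match(response: Dict[str, Any], expected: Optional[str]) -> bool:
    """True if both beats_stats and beats_state hits carry the expected UUID."""
    found = cluster_uuids_by_type(response)
    has_both = {"beats_stats", "beats_state"} <= found.keys()
    return has_both and all(value == expected for value in found.values())
```

Pass the configured or discovered UUID as `expected` for the first two test cases, and `None` for the case where an output other than output.elasticsearch is enabled.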

@elasticmachine
Collaborator

Pinging @elastic/stack-monitoring

# the Stack Monitoring UI. However, if a different output is enabled, the monitoring data will
# show up under a Standalone Cluster in the UI. If you want the monitoring data to be associated
# with a specific Elasticsearch cluster in the UI, specify that cluster's UUID here.
#monitoring.override_cluster_uuid:
Contributor Author

@dedemorton WDYT of this language?

Contributor

I wouldn't start by saying, "If output.elasticsearch is enabled" because skim readers might think the section doesn't apply to them. Instead, I would start with a description of what the setting contains, then I'd describe the reasons for setting it. How about something like this:

Sets the UUID of the Elasticsearch cluster under which monitoring data for this
{{.BeatName | title}} instance will appear in the Stack Monitoring UI. If output.elasticsearch
is enabled, the UUID is set by default. If a different output is enabled, you must specify this
setting, or monitoring data about this {{.BeatName | title}} instance will appear under a
standalone cluster.

Maybe a bit wordy, tho. WDYT?

Contributor

I can't recall a place in the public documentation right now where we discuss what a cluster UUID is and what its function is. I'm a bit on the fence over whether this case is common enough that we should document it more thoroughly but generally I lean toward the more documentation the better. :)

Either way, I think that we should also provide clear instructions to a user on retrieving the cluster UUID for the cluster that they wish to associate a monitored component with.

Contributor Author

@cachedout In addition to @dedemorton's suggested changes, I can add a sentence about how to determine the cluster UUID.

Given that this is an inline comment in a configuration file, I'm not sure there's space to talk about what a cluster UUID is and what its function is; that seems like something that belongs in our online ES docs. If you agree, then perhaps file a docs issue (or PR) in the ES repo for adding the information you'd like to see?

Contributor

We can add more detail to the Beats docs where we talk about monitoring. IMO it's important to keep the description in the config file as brief as possible, or the file will become unreadable over time.

Contributor Author

I wanted to add the override_ prefix to indicate more strongly that this will be the cluster UUID that is used when it is set, regardless of any other configuration. But I'm good with your suggestion as well, @urso. I think perhaps the name is not as critical as the inline comment and web site documentation about it.

@urso urso Aug 13, 2019

:) The reason I ask for cluster_uuid is that I'd prefer it to be configured most of the time. For users who are not using the ES output, it is always the one to use, meaning there is nothing to override. That is, we only 'derive' it from the ES output if that output is available and the setting is not configured (the ES output route as a fallback). In the end I don't mind much about the exact naming, but I agree the doc should be clear.
Checking the code how we get the UUID from the ES output I wouldn't be surprised if we publish a few events without a cluster uuid at all.

Contributor Author

Checking the code how we get the UUID from the ES output I wouldn't be surprised if we publish a few events without a cluster uuid at all.

Good point. I will put up a separate PR to prevent this when output.elasticsearch is being used. I already did something like this for the new beat Metricbeat module: #13020.

Contributor Author

Based on all the above feedback, I've updated the setting name and inline comment in 78b25d4d069f47681667b112f5d6547d09628f60.

I tried to keep the comment short and convey that we expect users to set this setting unless they are using output.elasticsearch, in which case we automatically derive the cluster UUID.

Please let me know what you think.

Contributor Author

I will put up a separate PR to prevent this when output.elasticsearch is being used.

PR is up: #13251

@ycombinator ycombinator requested review from a team and removed request for a team August 6, 2019 19:24
@ycombinator
Contributor Author

Hi @elastic/beats developers, anyone available to review this PR?

@abraxxa

abraxxa commented Aug 13, 2019

Why not set the UUID automatically like now when the cluster hosts are the same or autodiscover it?

Contributor

@cachedout cachedout left a comment

LGTM pending the additional sentence for the docs discussed in a review comment. Should we also file a docs issue on enhancing the documentation to describe the role of a UUID, per the suggestion of @dedemorton ?

@ycombinator
Contributor Author

@abraxxa said:

Why not set the UUID automatically like now when the cluster hosts are the same or autodiscover it?

Indeed, if output.elasticsearch is being used there will be no need to set the monitoring.override_cluster_uuid setting. The Beat will discover the cluster UUID from the Elasticsearch cluster referenced by output.elasticsearch.

We expect users to use monitoring.override_cluster_uuid only if they are using an output other than output.elasticsearch but still want to associate the Beat in the Stack Monitoring UI with a specific Elasticsearch cluster.

@ycombinator
Contributor Author

@cachedout I'm waiting for someone from @elastic/beats to review this PR before taking any further action. I've noted @dedemorton's feedback and I'll take the necessary next steps once this PR has been reviewed by someone from @elastic/beats.

@abraxxa

abraxxa commented Aug 13, 2019

@ycombinator: we send filebeat logs to logstash for processing and have xpack.monitoring.elasticsearch configured.
Why can't monitoring.elasticsearch behave exactly like xpack. does now?

@ycombinator
Contributor Author

ycombinator commented Aug 13, 2019

Why can't monitoring.elasticsearch behave exactly like xpack. does now?

xpack.monitoring.* would send monitoring data to an Elasticsearch "production" cluster, which would then export it to an Elasticsearch "monitoring" cluster. This monitoring cluster could be the same as the production cluster (so the exporter used would be a local one, which is the default) or the monitoring cluster could be a separate, dedicated one (so the exporter used on the production cluster would be an http one).

Routing monitoring data through the production cluster has disadvantages: it's unnecessarily complicated (the extra hop), it adds an unnecessary burden on the production cluster, and it requires maintenance of code in the production cluster just for supporting this routing. Instead, if we allow Beats to ship data directly to the monitoring cluster, those disadvantages go away. So this is the direction we are heading in.

However, when monitoring data is shipped through the production cluster (by using the xpack.monitoring.elasticsearch.* settings), the production cluster would inject its own cluster_uuid into the monitoring data. That would cause the Beat to be associated with that Elasticsearch production cluster in the Stack Monitoring UI in Kibana.

In the new way, when monitoring data is shipped directly to the monitoring cluster (by using the monitoring.elasticsearch.* settings), the Beat has no way of knowing the production cluster's ID or if there is even a production cluster in the picture! To allow for this possibility, this PR proposes adding a monitoring.override_cluster_uuid setting — users can optionally choose to set it to their production Elasticsearch cluster's UUID if they want the Beat's monitoring data to be associated with this cluster in the Stack Monitoring UI in Kibana.

If a user doesn't set this new setting, the Beat will use the cluster UUID of the Elasticsearch cluster pointed to by the Beat's output.elasticsearch.* settings (assuming, of course, that the user is using this output). If the user is not using the Elasticsearch output, then no Cluster UUID will be injected into the monitoring data for the Beat and it will be associated with a "Standalone Cluster" in the Stack Monitoring UI in Kibana.
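
To illustrate why a blank cluster UUID matters, here is a rough sketch of grouping monitoring documents by cluster_uuid, with documents lacking one falling into a "Standalone Cluster" bucket. This is not Kibana's actual code; the function name and the `beat` field used to label each document are assumptions made for the example.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, List, Mapping

STANDALONE = "Standalone Cluster"


def group_beats_by_cluster(docs: Iterable[Mapping[str, Any]]) -> Dict[str, List[str]]:
    """Group Beat names by cluster_uuid; blank or missing UUIDs go to STANDALONE."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for doc in docs:
        # A missing, null, or empty cluster_uuid cannot be tied to any cluster.
        key = doc.get("cluster_uuid") or STANDALONE
        groups[key].append(doc["beat"])  # "beat" is a hypothetical label field
    return dict(groups)
```

With this grouping, a Beat that ships through Logstash without the setting ends up under the standalone bucket, while the same Beat with the production cluster's UUID configured is listed alongside that cluster.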

we send filebeat logs to logstash for processing and have xpack.monitoring.elasticsearch configured.

So in your case you will most likely want to set monitoring.override_cluster_uuid to the cluster UUID of the Elasticsearch cluster currently referenced by xpack.monitoring.elasticsearch. This will ensure that you see the same association in the Stack Monitoring UI in Kibana.

@abraxxa

abraxxa commented Aug 14, 2019

@ycombinator thanks for the detailed explanation, very much appreciated!

I wasn't aware that there is any exporter functionality in Elasticsearch towards another cluster.

Maybe there is a way for the Elasticsearch module of Filebeat to detect the UUID, by API call or by including it in the JSON logfiles.

@ycombinator
Contributor Author

@urso I've addressed all your feedback on this PR. Please take a look when you get a chance. Thanks!

@urso urso self-assigned this Aug 20, 2019
@ycombinator ycombinator force-pushed the libbeat-monitoring-override-cluster-uuid branch from 740059e to 80a610c Compare August 20, 2019 19:21
@ycombinator ycombinator requested a review from a team as a code owner August 20, 2019 19:21
@@ -1222,6 +1222,17 @@ logging.files:
# Set to true to enable the monitoring reporter.
#monitoring.enabled: false

# If output.elasticsearch is enabled, monitoring data for this Auditbeat instance
Contributor

Did you mean to include both statements? I think there is too much redundancy, and I'm still not happy with the wording (even the bits that I've contributed). I know you're eager to get this merged. I'd suggest removing the redundant wording (maybe even go with your original statement), and I can improve the wording when I add more info to the docs.

Contributor Author

I did not, this might be the result of a bad rebase. Fixing...

Contributor Author

@dedemorton I've removed the duplicate comment in 1e2585f. Thanks for catching this!

Contributor

@dedemorton dedemorton left a comment

Doc changes LGTM

@ycombinator ycombinator force-pushed the libbeat-monitoring-override-cluster-uuid branch from 9ab945a to 6376f42 Compare September 3, 2019 20:06
@ycombinator ycombinator merged commit 0bafb43 into elastic:master Sep 3, 2019
@ycombinator ycombinator deleted the libbeat-monitoring-override-cluster-uuid branch September 3, 2019 21:40
@ycombinator ycombinator added v7.3.2 and removed v7.3.1 labels Sep 3, 2019
This was referenced Sep 4, 2019
ycombinator added a commit that referenced this pull request Sep 4, 2019
…13182) (#13480)

* Add ability to override cluster_uuid to be used in monitoring data

* Removing changes committed by accident

* Updating setting name and inline comments in reference config file

* Updating setting name in code

* Use struct tags pattern instead of repeating magic string

* Fixing logic after variable rename

* Remove unnecessary check and override

* Fixing duplicate comment

* Fixing comment

* Updating comment
@ruflin
Member

ruflin commented Sep 16, 2019

@ycombinator I can see this in the 7.3 but not 7.4 branch. Is this expected?

@ycombinator
Contributor Author

@ruflin Nope, not expected. It should be in 7.4 as well. I was expecting that to happen when changes from master get mass-backported to 7.x, but I guess 7.x was already pointing to 7.5 by the time this PR was merged? Anyway, backport PR to 7.4 is here now: #13694

ycombinator added a commit that referenced this pull request Sep 17, 2019
…13182) (#13694)

@urso urso added the v7.5.0 label Oct 22, 2019
@tanvp112

tanvp112 commented Jun 24, 2020

May I know what is the definition of "derived" here? With 7.8.0 Filebeat did not set this value according to the cluster specified in output.elasticsearch.hosts nor the monitoring.elasticsearch.hosts

"Sets the UUID of the Elasticsearch cluster under which monitoring data for this
Filebeat instance will appear in the Stack Monitoring UI. If output.elasticsearch
is enabled, the UUID is derived from the Elasticsearch cluster referenced by
output.elasticsearch."

monitoring.cluster_uuid:

If monitoring.cluster_uuid can only be set MANUALLY by querying a LIVE Elasticsearch cluster for its UUID, this behavior complicates continuous deployment pipelines. Is it possible for Filebeat to look up the cluster UUID automatically from the cluster specified in monitoring.elasticsearch.hosts?

I also noticed that this monitoring.* change has broken the process that forwards logs to Logstash for grok'ing before sending data to Elasticsearch. For the time being, I'm holding on to xpack.monitoring.enabled until further clarification.

@ycombinator
Contributor Author

In the Stack Monitoring UI, data is grouped by Elasticsearch cluster. In more robust deployments users typically have a "production" Elasticsearch cluster and a separate, dedicated "monitoring" Elasticsearch cluster. The production cluster holds their regular business data whereas the monitoring cluster holds their Elastic Stack monitoring data. The idea behind having a dedicated monitoring cluster is resiliency: if the production cluster is having problems, you (and the Stack Monitoring UI) can still look at your monitoring data in the separate monitoring cluster.

When it comes to Beats, there are two possibilities to consider:

  1. The Beat is using the Elasticsearch output. In this case, there is no need to set monitoring.cluster_uuid. The Beat will connect to the Elasticsearch cluster defined by elasticsearch.hosts, get the cluster UUID of that cluster and use it in that Beat's monitoring data.
  2. The Beat is using some output other than Elasticsearch. In this case there's no way for the Beat to know the "production" Elasticsearch cluster's UUID. So it must be manually set with the monitoring.cluster_uuid setting.

With 7.8.0 Filebeat did not set this value according to the cluster specified in output.elasticsearch.hosts nor the monitoring.elasticsearch.hosts

This sounds like you are in case 1 above. Can you post a question about this on https://discuss.elastic.co/ including your complete Filebeat configuration and logs from the first minute or so after Filebeat start up? We prefer to keep discussions in GitHub for verified bugs or enhancement requests.

If monitoring.cluster_uuid can only be set MANUALLY by querying a LIVE Elasticsearch cluster for its UUID, this behavior complicates continuous deployment pipelines. Is it possible for Filebeat to look up the cluster UUID automatically from the cluster specified in monitoring.elasticsearch.hosts?

As explained above, the intent is for the cluster UUID to match that of the "production" cluster so the Beats monitoring data shows up correctly in the Stack Monitoring UI. The intent of the monitoring.elasticsearch.hosts setting is to point to the "monitoring" cluster. It's possible that both clusters are one and the same, but there's no way of knowing that for sure: a user could alternatively be using a dedicated monitoring cluster alongside a production cluster. So we ask the user to explicitly set the "production" Elasticsearch cluster's UUID via the monitoring.cluster_uuid setting.

@abraxxa

abraxxa commented Jun 24, 2020

Filebeat should know the cluster uuid of the cluster it is monitoring, for example by doing an API call on startup.

@tanvp112

tanvp112 commented Jun 25, 2020

@ycombinator, yes, this is case 1, and I was expecting "The Beat will connect to the Elasticsearch cluster defined by elasticsearch.hosts, get the cluster UUID of that cluster and use it in that Beat's monitoring data." to work, but it is not happening. The idea of having a dedicated, permanent cluster for monitoring is fine, BUT keep in mind that every production cluster is often surrounded by many development clusters. These are clusters used by various teams and spun up and down automatically by a CD pipeline, including clusters specifically for SIT purposes. Logs for these systems are not needed after the event, and we prefer them to stay on the same cluster. Things get more complicated if logs are processed by Logstash first; monitoring.* is not helpful there because the cluster_uuid must be known first, and this infrastructure is brought up and down on demand.

Quick spin to test:

  • Start ES/KB/FB on the same machine, uses BASIC license
  • Enabled FB module for ES and KB logs
  • Use stock filebeat.reference.yml
    output.elasticsearch.enabled: true
    output.elasticsearch.hosts: ["localhost"]
    monitoring.enabled: true
    **Leave all other settings as-is

Note that not only was an orphan cluster created, the BASIC license was void as well.

image

Click into the actual cluster, all are working as expected:

image

Looking into ES log - no error reported
Looking into KB log - no error reported
Looking into FB log - no error reported. I didn't see any cluster_uuid reported either. Is it possible to have this info in the log file so we can cross-check it?

I terminated all instances and restarted with xpack.monitoring.enabled true and monitoring.enabled false (the ONLY change). The cluster is up and running normally with NO orphan cluster created. However, an unusual Kibana error started to appear periodically:

image

I ran curl against the active cluster and confirmed that the UUID in question is NOT the active cluster's UUID. This is probably some kind of race condition among the applications during start-up regarding the cluster state...
