Add ability to override cluster_uuid to be used in monitoring data #13182
Pinging @elastic/stack-monitoring
# the Stack Monitoring UI. However, if a different output is enabled, the monitoring data will
# show up under a Standalone Cluster in the UI. If you want the monitoring data to be associated
# with a specific Elasticsearch cluster in the UI, specify that cluster's UUID here.
#monitoring.override_cluster_uuid:
@dedemorton WDYT of this language?
I wouldn't start by saying, "If output.elasticsearch is enabled" because skim readers might think the section doesn't apply to them. Instead, I would start with a description of what the setting contains, then I'd describe the reasons for setting it. How about something like this:
Sets the UUID of the Elasticsearch cluster under which monitoring data for this
{{.BeatName | title}} instance will appear in the Stack Monitoring UI. If output.elasticsearch
is enabled, the UUID is set by default. If a different output is enabled, you must specify this
setting, or monitoring data about this {{.BeatName | title}} instance will appear under a
standalone cluster.
Maybe a bit wordy, tho. WDYT?
I can't recall a place in the public documentation right now where we discuss what a cluster UUID is and what its function is. I'm a bit on the fence over whether this case is common enough that we should document it more thoroughly, but generally I lean toward the more documentation the better. :)
Either way, I think that we should also provide clear instructions to a user on retrieving the cluster UUID for the cluster that they wish to associate a monitored component with.
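One way such retrieval instructions could look is sketched below in Python. It relies on the Elasticsearch root endpoint (GET /) returning a JSON body that includes a cluster_uuid field. The function names and the example URL are hypothetical, and TLS and credentials are left out, so treat this as a sketch rather than a recommended procedure.

```python
import json
import urllib.request


def parse_cluster_uuid(root_response_body: str) -> str:
    """Extract cluster_uuid from the JSON body returned by GET / on Elasticsearch."""
    info = json.loads(root_response_body)
    return info["cluster_uuid"]


def fetch_cluster_uuid(base_url: str) -> str:
    """Call GET / on the cluster at base_url and return its cluster UUID.

    base_url is a placeholder such as "http://localhost:9200"; security
    settings (TLS, credentials) are deliberately omitted from this sketch.
    """
    with urllib.request.urlopen(base_url.rstrip("/") + "/") as response:
        return parse_cluster_uuid(response.read().decode("utf-8"))
```

The value returned by fetch_cluster_uuid is what a user would paste into the cluster UUID setting of the Beat.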
@cachedout In addition to @dedemorton's suggested changes, I can add a sentence about how to determine the cluster UUID.
Given that this is an inline comment in a configuration file, I'm not sure there's space to talk about what a cluster UUID is and what its function is; that seems like something that belongs in our online ES docs. If you agree then perhaps file a docs issue (or PR) in the ES repo for adding the information you'd like to see?
We can add more detail to the Beats docs where we talk about monitoring. IMO it's important to keep the description in the config file as brief as possible, or the file will become unreadable over time.
I wanted to add the override_ prefix to indicate more strongly that this will be the cluster UUID that is used when it is set, regardless of any other configuration. But I'm good with your suggestion as well, @urso. I think perhaps the name is not as critical as the inline comment and web site documentation about it.
:) The reason I ask for cluster_uuid is that I'd prefer it to be configured most of the time. For users not using the ES output, it's always the one to use, meaning there is nothing to override. That is: we only 'derive' it from the ES output if available and not configured (ES output route as a fallback). In the end I don't mind as much about the exact naming, but agree the doc should be clear.
Checking the code how we get the UUID from the ES output I wouldn't be surprised if we publish a few events without a cluster uuid at all.
Checking the code how we get the UUID from the ES output I wouldn't be surprised if we publish a few events without a cluster uuid at all.
Good point. I will put up a separate PR to prevent this when output.elasticsearch is being used. I already did something like this for the new beat Metricbeat module: #13020.
Based on all the above feedback, I've updated the setting name and inline comment in 78b25d4d069f47681667b112f5d6547d09628f60.
I tried to keep the comment short and convey that we expect users to set this setting unless they are using output.elasticsearch, in which case we automatically derive the cluster UUID.
Please let me know what you think.
I will put up a separate PR to prevent this when output.elasticsearch is being used.
PR is up: #13251
Hi @elastic/beats developers, anyone available to review this PR?
Why not set the UUID automatically like now when the cluster hosts are the same or autodiscover it?
LGTM pending the additional sentence for the docs discussed in a review comment. Should we also file a docs issue on enhancing the documentation to describe the role of a UUID, per the suggestion of @dedemorton ?
@abraxxa said:
Indeed, if We expect users to use
@cachedout I'm waiting for someone from @elastic/beats to review this PR before taking any further action. I've noted @dedemorton's feedback and I'll take the necessary next steps once this PR has been reviewed by someone from @elastic/beats.
@ycombinator: we send filebeat logs to logstash for processing and have xpack.monitoring.elasticsearch configured.
Routing monitoring data through the production cluster has disadvantages: it's unnecessarily complicated (the extra hop), it adds an unnecessary burden on the production cluster, and it requires maintenance of code in the production cluster just for supporting this routing. Instead, if we allow Beats to ship data directly to the monitoring cluster, those disadvantages go away. So this is the direction we are heading in. However, when monitoring data is shipped through the production cluster (by using the In the new way, when monitoring data is shipped directly to the monitoring cluster (by using the If a user doesn't set this new setting, the Beat will use the cluster UUID of the Elasticsearch cluster pointed to by the Beat's
So in your case you will most likely want to set
@ycombinator thanks for the detailed explanation, very much appreciated! I wasn't aware that there is any exporter functionality in Elasticsearch towards another cluster. Maybe there is a way for the Elasticsearch module of Filebeat to detect the UUID, by API call or by including it in the JSON logfiles.
@urso I've addressed all your feedback on this PR. Please take a look when you get a chance. Thanks!
auditbeat/auditbeat.reference.yml
Outdated
@@ -1222,6 +1222,17 @@ logging.files:
# Set to true to enable the monitoring reporter.
#monitoring.enabled: false

# If output.elasticsearch is enabled, monitoring data for this Auditbeat instance
Did you mean to include both statements? I think there is too much redundancy, and I'm still not happy with the wording (even the bits that I've contributed). I know you're eager to get this merged. I'd suggest removing the redundant wording (maybe even go with your original statement), and I can improve the wording when I add more info to the docs.
I did not, this might be the result of a bad rebase. Fixing...
@dedemorton I've removed the duplicate comment in 1e2585f. Thanks for catching this!
Doc changes LGTM
…13182) (#13480) * Add ability to override cluster_uuid to be used in monitoring data * Removing changes committed by accident * Updating setting name and inline comments in reference config file * Updating setting name in code * Use struct tags pattern instead of repeating magic string * Fixing logic after variable rename * Remove unnecessary check and override * Fixing duplicate comment * Fixing comment * Updating comment
@ycombinator I can see this in the 7.3 but not 7.4 branch. Is this expected?
…13182) (#13694) * Add ability to override cluster_uuid to be used in monitoring data * Removing changes committed by accident * Updating setting name and inline comments in reference config file * Updating setting name in code * Use struct tags pattern instead of repeating magic string * Fixing logic after variable rename * Remove unnecessary check and override * Fixing duplicate comment * Fixing comment * Updating comment
May I know what the definition of "derived" is here? With 7.8.0, Filebeat did not set this value according to the cluster specified in output.elasticsearch.hosts or in monitoring.elasticsearch.hosts.
"Sets the UUID of the Elasticsearch cluster under which monitoring data for this
monitoring.cluster_uuid:
If monitoring.cluster_uuid can only be set MANUALLY by querying a LIVE Elasticsearch cluster for its UUID, this behavior complicates continuous deployment pipelines. Is it possible for Filebeat to look up the cluster UUID automatically from the cluster specified in monitoring.elasticsearch.hosts?
Also, I noticed that the monitoring.* settings have broken the process that forwards logs to Logstash for grok'ing before sending data to Elasticsearch. For the time being I am holding on to xpack.monitoring.enabled until further clarification.
In the Stack Monitoring UI, data is grouped by Elasticsearch cluster. In more robust deployments users typically have a "production" Elasticsearch cluster and a separate, dedicated "monitoring" Elasticsearch cluster. The production cluster holds their regular business data whereas the monitoring cluster holds their Elastic Stack monitoring data. The idea behind having a dedicated monitoring cluster is resiliency: if the production cluster is having problems, you (and the Stack Monitoring UI) can still look at your monitoring data in the separate monitoring cluster. When it comes to Beats, there are two possibilities to consider:
This sounds like you are in case 1 above. Can you post a question about this on https://discuss.elastic.co/ including your complete Filebeat configuration and logs from the first minute or so after Filebeat start up? We prefer to keep discussions in GitHub for verified bugs or enhancement requests.
As explained above the intent is for the cluster UUID to match that of the "production" cluster so the Beats monitoring data shows up correctly in the Stack Monitoring UI. The intent of the
Filebeat should know the cluster uuid of the cluster it is monitoring, for example by doing an API call on startup.
@ycombinator, yes, this is case 1, and I was expecting "The Beat will connect to the Elasticsearch cluster defined by elasticsearch.hosts, get the cluster UUID of that cluster and use it in that Beat's monitoring data." to work, but it is not happening. The idea of having a dedicated, permanent cluster for monitoring is fine, BUT keep in mind that every production cluster is often surrounded by many development clusters. These are clusters used by various teams and spun up and down automatically by a CD pipeline, including clusters specifically for SIT purposes. Logs for these systems are not needed after the event, and we prefer them to stay on the same cluster. Things get more complicated if logs are processed by Logstash first: the monitoring.* settings are not helpful there because the cluster_uuid must be known first, and this infrastructure goes up and down on demand. Quick spin to test:
Note that not only was an orphan cluster created, the BASIC license was voided as well. Clicking into the actual cluster, everything works as expected:
Looking into the ES log, no error is reported.
After terminating all instances and restarting with xpack.monitoring.enabled true and monitoring.enabled false (the ONLY change), the cluster is up and running normally with NO orphan cluster created. However, an unusual Kibana error starts to appear periodically:
I curled the active cluster and confirmed that the UUID in question is NOT the active cluster's UUID. Probably some kind of race condition among the applications during start-up about the cluster state...
Background and Problem
Starting 7.2.0, the xpack.monitoring.* settings in Beats are deprecated in favor of monitoring.* settings. The former were used to define the production cluster to which the Beat should send its monitoring data. The latter are used to define the monitoring cluster to which the Beat should directly send its monitoring data.
When the monitoring.* settings are used with an output other than elasticsearch, there's no way (currently) to know if the Beat's "regular" (i.e. non-monitoring) data is going to end up in an Elasticsearch cluster. As such, we cannot associate the Beat with an Elasticsearch cluster in the Stack Monitoring UI. Instead we show it under a "Standalone Cluster".
Prior to 7.2.0, if users were using an output other than elasticsearch, they would still send monitoring data via the production cluster (by using the xpack.monitoring.* settings). As such, the production cluster could enrich the monitoring data such that the Stack Monitoring UI would show the Beat associated with that production Elasticsearch cluster.
Starting 7.2.0 instead, the same Beat has now "moved" to a Standalone Cluster in the UI and users are not happy about it, specifically users who know that their Beats are eventually sending their regular data into Elasticsearch. Such users would like to see the Beat associated with their production Elasticsearch cluster in the UI.
Solution
This PR aims to solve the above problem by providing said class of users with a new setting in their Beat's configuration: monitoring.override_cluster_uuid
By default, this setting is not set, that is, its value is empty. In this case the value of the cluster_uuid field in the Beat's monitoring documents will be determined as follows:
* If the Beat uses the elasticsearch output, the cluster_uuid will be that of the Elasticsearch cluster referenced by the elasticsearch output.
* If the Beat uses any other output, the cluster_uuid will be blank, causing the Beat to show up under Standalone Cluster in the Stack Monitoring UI.
If the monitoring.override_cluster_uuid setting is given a value (i.e. the Cluster UUID of an Elasticsearch cluster), this value will be used as the value of the cluster_uuid field in the Beat's monitoring documents, regardless of the output being used by the Beat.
Testing this PR
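Before running the test cases, it may help to write down the expected cluster_uuid as a function of the configuration. The sketch below is in Python for brevity and is not the Beats implementation (which is written in Go). The function and parameter names are hypothetical, and it assumes that an empty string counts as "not set".

```python
from typing import Optional


def expected_cluster_uuid(
    override_cluster_uuid: Optional[str],
    output_name: str,
    output_es_cluster_uuid: Optional[str] = None,
) -> Optional[str]:
    """Return the cluster_uuid expected in the Beat's monitoring documents.

    override_cluster_uuid:  value of the override setting, or None/"" if unset
    output_name:            name of the enabled output, e.g. "elasticsearch"
    output_es_cluster_uuid: UUID of the cluster behind the elasticsearch
                            output (only meaningful for that output)
    """
    # An explicitly configured value wins, regardless of the output in use.
    if override_cluster_uuid:
        return override_cluster_uuid
    # Otherwise fall back to the cluster referenced by the elasticsearch output.
    if output_name == "elasticsearch":
        return output_es_cluster_uuid
    # Any other output: no UUID, so the Beat shows up under Standalone Cluster.
    return None
```

For example, expected_cluster_uuid(None, "console") evaluates to None, which corresponds to the Standalone Cluster case.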
For all test cases below, the same Elasticsearch query is to be run against your Monitoring Elasticsearch cluster. This is the query:
When running this query, note that it may take up to 30 seconds for type:beats_state documents to show up.
When monitoring.override_cluster_uuid is set
Verify that the value of the cluster_uuid field in .monitoring-beats-* documents of type:beats_stats as well as type:beats_state is the same as that specified for the monitoring.override_cluster_uuid setting.
When monitoring.override_cluster_uuid is not set
When output.elasticsearch is enabled
Verify that the value of the cluster_uuid field in .monitoring-beats-* documents of type:beats_stats as well as type:beats_state is the same as the cluster UUID of the Elasticsearch cluster referenced by the output.elasticsearch setting. To determine this value, call the GET / API against the output.elasticsearch Elasticsearch cluster.
When an output other than output.elasticsearch is enabled
Verify that the value of the cluster_uuid field in .monitoring-beats-* documents of type:beats_stats as well as type:beats_state is null.
For testing, you can enable output.console. Make sure to disable output.elasticsearch and to point monitoring.elasticsearch.hosts to your Monitoring Elasticsearch cluster.
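The verification step shared by the test cases above could be automated along these lines. This is a hedged Python sketch: it assumes the monitoring documents have already been fetched and are passed in as plain dictionaries with top-level type and cluster_uuid fields, and the function name is hypothetical. A missing cluster_uuid is treated the same as null.

```python
from typing import Any, Iterable, List, Mapping, Optional

# Document types named in the test cases.
MONITORED_TYPES = ("beats_stats", "beats_state")


def mismatched_documents(
    documents: Iterable[Mapping[str, Any]],
    expected_uuid: Optional[str],
) -> List[Mapping[str, Any]]:
    """Return the monitoring documents whose cluster_uuid is not expected_uuid.

    Documents of other types are ignored. Pass expected_uuid=None for the
    case where cluster_uuid should be null.
    """
    return [
        doc
        for doc in documents
        if doc.get("type") in MONITORED_TYPES
        and doc.get("cluster_uuid") != expected_uuid
    ]
```

An empty result means the test case passes for the documents inspected.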