Adjustments to send monitoring data directly to ES #11541
Conversation
@andsel The data shape changes make sense, but I think we need to make sure we are consistent with the rest of the stack in how we expose the direct shipping option to users.
config/logstash.yml
Outdated
@@ -236,6 +236,7 @@
 #xpack.monitoring.elasticsearch.sniffing: false
 #xpack.monitoring.collection.interval: 10s
 #xpack.monitoring.collection.pipeline.details.enabled: true
+#xpack.monitoring.collection.write_direct.enabled: true
I'm not sure this is what we want here - if we are to be consistent with Beats, they introduced a new class of settings under monitoring.* for their direct shipping feature, which is also called out in the meta issue for the migration to Metricbeat.
I generally agree with you @robbavey, and I would also change the name from write_direct to direct_shipping, but I have some doubts to clear up:
- to start monitoring we have the xpack.monitoring.enabled setting, which has to be enabled to use the internal collectors. So we could potentially have a monitoring.collection.direct_shipping.enabled (more specific) that requires the more general xpack.monitoring.enabled to work. But X-Pack is an extension pack, so we would have something in the monitoring.* path that requires something from the extension pack xpack.monitoring.* to work. I would expect both to stay at the same level - what do you think?
If you look at what the Beats team did - they have three options for monitoring: legacy internal monitoring pointing to the production ES cluster, regular internal monitoring pointing to the monitoring ES cluster, and Metricbeat collection.
The way they appear to have structured it is to have the legacy monitoring configured by settings in the xpack.monitoring namespace, and the 'direct' monitoring configured by settings in the monitoring namespace, rather than having a specific setting for direct_shipping. To enable legacy monitoring, xpack.monitoring.enabled would be used, and to use direct monitoring, monitoring.enabled.
Nice - when deprecating the legacy settings they redirect to the new ones, and removed the documentation for the old settings, redirecting to the new internal collectors for reference. So we could:
- keep the handling of the old xpack.monitoring settings for backward compatibility
- copy the old settings in xpack.monitoring.* to monitoring.*, where monitoring.enabled means "use the direct shipping way"
- adapt the documentation to reflect this
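The copy-forward step in that list could look something like this minimal sketch, assuming the settings behave like a flattened hash of dotted names (the method name and settings shape are illustrative assumptions, not the actual Logstash implementation):

```ruby
# Hypothetical back-compat shim: mirror each legacy xpack.monitoring.* setting
# onto its new monitoring.* name, unless the new name was set explicitly.
def mirror_legacy_monitoring_settings(settings)
  settings.keys.grep(/\Axpack\.monitoring\./).each do |legacy_key|
    new_key = legacy_key.sub(/\Axpack\./, "")  # xpack.monitoring.foo -> monitoring.foo
    settings[new_key] = settings[legacy_key] unless settings.key?(new_key)
  end
  settings
end
```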
Yes, I think that makes sense.
I think the rename makes sense - @chrisronline, @ycombinator, does this make sense to you, and is it consistent with the rest of the stack in terms of adding direct-to-monitoring-cluster functionality?
Yes, monitoring.* settings are exactly what we're pushing for consistency across the stack 👍
I'm happy with the general structure of the settings that were added, but before committing, I think a few things should be added in addition to those mentioned in the review:
I'd like to see an integration test with the new shape of data being sent to the _bulk endpoint in Elasticsearch.
I also think there should be a test for configurations that set values for both monitoring.* and xpack.monitoring.* in their logstash.yml - one that would at least log, but probably disallow, such configurations.
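A guard along the lines described could be sketched as follows - this is only an illustration of the idea, modeled loosely on the analogous Beats check; the module name, Settings API shape, and error type are assumptions, not code from this PR:

```ruby
# Hypothetical guard: refuse to start when both the legacy xpack.monitoring.*
# and the new monitoring.* settings appear in logstash.yml.
module MonitoringSettingsGuard
  LEGACY_PREFIX = "xpack.monitoring."
  DIRECT_PREFIX = "monitoring."

  # `settings` is assumed to behave like a Hash of flattened setting names.
  def self.check!(settings)
    legacy = settings.keys.select { |k| k.start_with?(LEGACY_PREFIX) }
    direct = settings.keys.select { |k| k.start_with?(DIRECT_PREFIX) }
    if !legacy.empty? && !direct.empty?
      raise ArgumentError,
            "Both legacy (#{legacy.join(', ')}) and direct (#{direct.join(', ')}) " \
            "monitoring settings are present; use only one namespace"
    end
  end
end
```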
config/logstash.yml
Outdated
@@ -216,6 +216,23 @@
 # Default is false
 # pipeline.separate_logs: false
 #
+# Enables internal collector to shipping directly to Elasticsearch
Maybe change this to something like 'Enable monitoring via internal collector to an Elasticsearch monitoring cluster'
#monitoring.elasticsearch.ssl.verification_mode: certificate
#monitoring.elasticsearch.sniffing: false
#monitoring.collection.interval: 10s
#monitoring.collection.pipeline.details.enabled: true
# ------------ X-Pack Settings (not applicable for OSS build)--------------
I think we might want to consider removing the xpack settings here, and just have the monitoring settings. We should also make a note to surface this in release notes and blogs for the 7.7.0 release.
x-pack/lib/monitoring/monitoring.rb
Outdated
def monitoring_endpoint
  if LogStash::MonitoringExtension.use_direct_shipping?(LogStash::SETTINGS)
    "/_bulk/monitoring"
Is this correct? Testing this code against 7.6.0 generates an error, and glancing at the existing Beats code that does this, it appears to send to a bare _bulk endpoint (rather than _bulk/monitoring) and send a calculated index name with the request.
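For illustration, a request to the bare _bulk endpoint carries the target index in the bulk action metadata line rather than in the URL. This is a hedged sketch - the .monitoring-logstash-7-YYYY.MM.DD index pattern is an assumption modeled on the stack's .monitoring-* convention, not code from this PR:

```ruby
# Hypothetical: build the bulk action metadata line for direct shipping,
# computing the dated monitoring index name on the client side (when going
# through /_monitoring/bulk, Elasticsearch computes the index name instead).
def monitoring_bulk_action(now = Time.now.utc)
  index_name = ".monitoring-logstash-7-#{now.strftime('%Y.%m.%d')}"
  { "index" => { "_index" => index_name } }
end
```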
x-pack/lib/template.cfg.erb
Outdated
@@ -20,7 +20,7 @@ output {
 <% else %>
   hosts => <%= es_hosts %>
 <% end %>
-bulk_path => "/_monitoring/bulk?system_id=logstash&system_api_version=<%= system_api_version %>&interval=1s"
+bulk_path => "<%= monitoring_endpoint %>?system_id=logstash&system_api_version=<%= system_api_version %>&interval=1s"
I don't think the query parameters are going to work
x-pack/lib/template.cfg.erb
Outdated
@@ -20,7 +20,7 @@ output {
 <% else %>
   hosts => <%= es_hosts %>
 <% end %>
-bulk_path => "/_monitoring/bulk?system_id=logstash&system_api_version=<%= system_api_version %>&interval=1s"
+bulk_path => "<%= monitoring_endpoint %>?system_id=logstash&system_api_version=<%= system_api_version %>&interval=1s"
 manage_template => false
 document_type => "%{[@metadata][document_type]}"
 index => ""
I suspect we may need to have a value here for the _bulk endpoint.
FWIW, Beats has an integration test that tests collecting + sending monitoring data with
We have a check like this in Beats: https://github.com/elastic/beats/blob/df17e269c0e4db12b3d7b093445108717eade52c/libbeat/monitoring/monitoring.go#L88-L104
Thanks for the suggestion @ycombinator, I'll create such a test because we actually miss 4 properties in the new document.
Yes, I think you are almost certainly right about this. And this brings up an important point, actually. When we built the
Maybe some of the above is already known, but I thought I'd mention it anyway, just to be safe. Internal collection + shipping directly to the monitoring cluster should follow the same logic, to be consistent with Metricbeat (external) collection + shipping directly to the monitoring cluster.
I checked through the source code for the
As you can see:
Hope this helps!
One more thing to note here. The Elasticsearch code does the date formatting (to determine the
Getting closer - my main concerns here are around the setting of the cluster uuid
def self.format_cluster_uuid(stats)
  result = stats.extract_metrics([:stats, :pipelines, :main, :config], :cluster_uuids)
  if result && !result[:cluster_uuids].empty?
    result[:cluster_uuids][0]
I wonder about setting precedence here - we might want to have the monitoring.cluster_uuid setting override the value from cluster_uuids, particularly if there are >1 elasticsearch outputs pointing to different clusters.
We might want to log when we detect >1 distinct cluster_uuid values.
I agree with you, explicit user-defined values should take precedence over implicitly extracted values.
The Stack Monitoring UI is centered around the notion of an Elasticsearch cluster. Users first choose an Elasticsearch cluster, then drill deeper into the Elasticsearch, Logstash, etc. monitoring data associated with that cluster. So the purpose of the cluster_uuid field in monitoring documents is to associate the monitoring data in the document with an ES cluster in the Stack Monitoring UI.
In terms of Logstash pipelines with multiple ES outputs, I think it might make sense to show that Logstash pipeline under each of those ES clusters in the UI. The only way to pull this off today is to index the same document multiple times, but with a different cluster_uuid value each time. We do this in the Metricbeat logstash module code: https://github.com/elastic/beats/blob/523528b7516adbda814d8770a2e259a4464023b0/metricbeat/module/logstash/node/data_xpack.go#L83-L91
Happy to go a different direction here, but either way we should probably keep the Metricbeat logstash module implementation and the implementation in this PR consistent.
So we could give precedence to the monitoring.cluster_uuid setting, if defined. If not defined, collect the list of cluster_uuids from all ES output plugins and change the metrics input to fire one event for each cluster_uuid discovered.
Actually, we are doing it the other way around in the Metricbeat logstash module. First precedence is given to cluster_uuid(s) discovered from the Logstash pipeline. If there are none, then second precedence is given to the monitoring.cluster_uuid setting in logstash.yml. If that is also empty, set the value to an empty string ("").
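That precedence order can be summarized in a small sketch (the method and argument names are illustrative assumptions, not the Metricbeat or Logstash source):

```ruby
# Hypothetical resolution of the cluster UUID(s) to stamp on monitoring
# documents: discovered output UUIDs win, then the monitoring.cluster_uuid
# setting, then an empty string (which lands under the "standalone" cluster
# in the Stack Monitoring UI).
def resolve_cluster_uuids(discovered_uuids, configured_uuid)
  return discovered_uuids unless discovered_uuids.nil? || discovered_uuids.empty?
  return [configured_uuid] unless configured_uuid.nil? || configured_uuid.empty?
  [""]
end
```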
@ycombinator Interesting! I agree with the first notion that Logstash data is shown for each of the clusters; good to keep consistency with the Logstash module.
For the second point, does this mean effectively that monitoring.cluster_uuid is only applicable when no elasticsearch output is defined? Do you log when monitoring.cluster_uuid is ignored?
> For the second point, does this mean effectively that monitoring.cluster_uuid is only applicable when no elasticsearch output is defined?

Yes, that's how we've been treating it in the Metricbeat logstash module: monitoring.cluster_uuid is a fallback only in cases when something more specific (an output in a specific pipeline on the node) can't provide the cluster UUID.

> Do you log when monitoring.cluster_uuid is ignored?

We do not, but perhaps we could, at least at a DEBUG level. I'm happy to mirror in the Metricbeat logstash module whatever you decide to do about logging here in this PR.
ok, cool. In that case, I think I would prefer that we log that (ideally only once, rather than each time a metric is sent...), and also document the fact that monitoring.cluster_uuid is only relevant when no elasticsearch output is defined, and will be ignored when at least one ES output is present.
I followed your suggestions. With commit eebfcd9 I switched the retrieval of cluster_uuid to be lazy and done only once in the metrics input plugin, instead of at each event generation.
With commit 25beb49 I switched the cluster_uuid to be a list, retrieved from all elasticsearch output plugins, so that the same monitoring event is sent for each cluster_uuid.
@ycombinator I added a log line for when we use the cluster_uuids instead of the setting from the config file.
What is left is to better document the intended use of the monitoring.cluster_uuid setting.
if LogStash::SETTINGS.set?("monitoring.cluster_uuid")
  LogStash::SETTINGS.get("monitoring.cluster_uuid")
else
  "<UNDEFINED cluster>"
I believe this needs to be blank, rather than <UNDEFINED cluster> - if the cluster_uuid field is blank, then the stack monitoring page in Kibana will show a page with a list of clusters to select from, including a "standalone" cluster. The monitoring entries for a Logstash instance where the cluster_uuid cannot be found are located on the "standalone" cluster page.
I think we should also log here.
@@ -5,16 +5,30 @@
 module LogStash; module Inputs; class Metrics;
   class StateEventFactory
     require "logstash/config/lir_serializer"
-    def initialize(pipeline)
+    def initialize(pipeline, snapshot, cluster_uuid, collection_interval = 10)
snapshot parameter no longer needed
Done
 update_stats(snapshot)
-update_states
+update_states(snapshot)
snapshot no longer needed for state updates
Getting close! The snapshot parameter can be removed throughout the update_state flow, and I have a nit/suggestion on logging.
Jenkins test this please
Jenkins test this
Mostly nits
@@ -105,15 +106,28 @@ def stop
 end

 def update(snapshot)
   if LogStash::MonitoringExtension.use_direct_shipping?(LogStash::SETTINGS)
What happens in the case of configuration reloads? Will new cluster_uuids be picked up?
In case direct shipping is enabled and a user changes the ES output configuration in one of the pipelines, those new cluster_uuids aren't picked up by the monitoring pipeline, because the monitoring pipeline is not restarted.
Actually, when the agent converge_state_and_update it reloads the changed pipelines and update_metrics, but it doesn't also reload the monitoring pipeline - should we trigger a refresh of that pipeline as well?
However, this behavior is mostly aligned with what is done for non-direct shipping: in that case the ES cluster to ship data to is loaded from logstash.yml, and that ES cluster is responsible for enriching the data with cluster_uuid, so changing that configuration forces the user to restart Logstash.
I think I'm ok with this, and we can always follow up if this becomes an issue.
It's slightly different from the non-direct shipping case, as the logstash.yml changes would be a change to the location of the monitoring data, whereas the changes drawn from reloaded plugins would be a change to where that data is shown in the monitoring cluster.
if result && !result[:cluster_uuids].empty?
  cluster_uuids = result[:cluster_uuids]
  @logger.debug("Elasticsearch output cluster_uuids discovered: ", :cluster_uuids => cluster_uuids)
  @logger.info("Found cluster_uuid from elasticsearch output plugins, ignoring setting monitoring.cluster_uuid")
Is this info statement still needed?
It's redundant with respect to the debug line, but as a user who configures ES outputs and also monitoring.cluster_uuid, I would like to know why the setting in logstash.yml is not taken into account. I would keep the info logging, enriching it with the discovered list of cluster_uuids provided by the ES output plugins.
x-pack/qa/integration/monitoring/es_documents_structure_validation_spec.rb
Outdated
Jenkins test this
Very close! Just another nitpicky comment/suggestion on logging
result = stats.extract_metrics([:stats, :pipelines, :main, :config], :cluster_uuids)
if result && !result[:cluster_uuids].empty?
  cluster_uuids = result[:cluster_uuids]
  @logger.info("Found cluster_uuid from elasticsearch output plugins, ignoring setting monitoring.cluster_uuid. Using cluster_uuids: ", :cluster_uuids => cluster_uuids)
I still have mixed feelings about mentioning ignoring the cluster_uuid setting when it has not been set. As long as we document the setting appropriately, the cluster_uuid setting shouldn't be of concern to most users, and we should probably only draw attention to it when it is being used - maybe with an info message if it is being used, and a warning message if it is overridden by the values drawn from the plugins.
I agree with you - citing something that the user hasn't configured is not good; reworded with a7e9c38.
LGTM
Force-pushed from 64d33f9 to 3ae0fc4: "… cluster Close 11573"
Force-pushed from 3ae0fc4 to 61d1634
Andrea Selva merged this into the following branches!
… cluster Close 11573 Fixes elastic#11541
… cluster Close 11573 Added check on HTTP server before asking for monitoring data in unit test Fixes elastic#11541
…S monitoring cluster During the development of PR elastic#11541 to ship monitoring data directly to a monitoring ES cluster without hopping through a production ES cluster, the settings for the elasticsearch output were cloned into a version without the `xpack` prefix. Since that feature has been removed, the settings should also be removed from the Docker image
…S monitoring cluster During the development of PR #11541 to ship monitoring data directly to a monitoring ES cluster without hopping through a production ES cluster, the settings for the elasticsearch output were cloned into a version without the `xpack` prefix. Since that feature has been removed, the settings should also be removed from the Docker image
This PR is intended to implement the actions of issue #11573: collect the changes requested to the internal collector to adapt to version 7.7 of the ES monitoring documents.
The key points of the changes:
- deprecates xpack.monitoring.* and exposes monitoring.* with the same extensions
- if both an xpack.monitoring.* setting and a monitoring.* setting are defined, it makes Logstash exit
- collects the cluster_uuid from every output-elasticsearch defined in any pipeline, sending monitoring information for each of the cluster_uuids; if none is defined it takes it from the setting monitoring.cluster_uuid, and if that is also not defined it uses an empty string