-
Notifications
You must be signed in to change notification settings - Fork 4.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[metricbeat] - Allow metricsets to report their status via v2 protocol #40025
[metricbeat] - Allow metricsets to report their status via v2 protocol #40025
Conversation
This pull request does not have a backport label.
To fixup this pull request, you need to add the backport labels for the needed
|
6c10a9f
to
e37ef86
Compare
Pinging @elastic/elastic-agent-data-plane (Team:Elastic-Agent-Data-Plane) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks good, but I don't see any actual invocations of SetStatusReporter
, which seems to be required for the other changes to have an effect. Am I missing something, or is that coming in a later PR?
@Fae, This part was introduced by #39209 and it's done at Lines 157 to 161 in 1a6318d
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for clarifying, looks good to me!
@cmacknz Let me know how it looks to you now and if you have any more comments regarding this. @faec Thanks for reviewing! |
The changes themselves LGTM, but I am not getting the result I expected when I test this manually. What I did:
~$ cat script.sh
while true;
do
echo "Hello!";
sleep 1;
done
~$ sudo ./script.sh
{"log.level":"debug","@timestamp":"2024-07-16T21:04:30.148Z","message":"Non-fatal error fetching PID metrics for 30358, metrics are valid, but partial: Not enough privileges to fetch information: /io unavailable; if running inside a container, use SYS_PTRACE: error fetching IO metrics: open /proc/30358/io: permission denied","component":{"binary":"metricbeat","dataset":"elastic_agent.metricbeat","id":"system/metrics-default","type":"system/metrics"},"log":{"source":"system/metrics-default"},"log.logger":"processes","log.origin":{"file.line":268,"file.name":"process/process.go","function":"github.com/elastic/elastic-agent-system-metrics/metric/system/process.(*Stats).pidFill"},"service.name":"metricbeat","ecs.version":"1.6.0","ecs.version":"1.6.0"}
{"log.level":"debug","@timestamp":"2024-07-16T21:04:40.143Z","message":"Non-fatal error fetching PID metrics for 30357, metrics are valid, but partial: Not enough privileges to fetch information: /io unavailable; if running inside a container, use SYS_PTRACE: error fetching IO metrics: open /proc/30357/io: permission denied","component":{"binary":"metricbeat","dataset":"elastic_agent.metricbeat","id":"system/metrics-default","type":"system/metrics"},"log":{"source":"system/metrics-default"},"log.logger":"processes","log.origin":{"file.line":268,"file.name":"process/process.go","function":"github.com/elastic/elastic-agent-system-metrics/metric/system/process.(*Stats).pidFill"},"service.name":"metricbeat","ecs.version":"1.6.0","ecs.version":"1.6.0"}
{"log.level":"debug","@timestamp":"2024-07-16T21:05:50.143Z","message":"Non-fatal error fetching PID metrics for 30358, metrics are valid, but partial: Not enough privileges to fetch information: /io unavailable; if running inside a container, use SYS_PTRACE: error fetching IO metrics: open /proc/30358/io: permission denied","component":{"binary":"metricbeat","dataset":"elastic_agent.metricbeat","id":"system/metrics-default","type":"system/metrics"},"log":{"source":"system/metrics-default"},"service.name":"metricbeat","ecs.version":"1.6.0","log.logger":"processes","log.origin":{"file.line":268,"file.name":"process/process.go","function":"github.com/elastic/elastic-agent-system-metrics/metric/system/process.(*Stats).pidFill"},"ecs.version":"1.6.0"}
{"log.level":"debug","@timestamp":"2024-07-16T21:05:50.143Z","message":"Error fetching PID info for 30642, skipping: GetInfoForPid: no such process","component":{"binary":"metricbeat","dataset":"elastic_agent.metricbeat","id":"system/metrics-default","type":"system/metrics"},"log":{"source":"system/metrics-default"},"service.name":"metricbeat","ecs.version":"1.6.0","log.logger":"processes","log.origin":{"file.line":198,"file.name":"process/process.go","function":"github.com/elastic/elastic-agent-system-metrics/metric/system/process.(*Stats).pidIter"},"ecs.version":"1.6.0"}
{"log.level":"debug","@timestamp":"2024-07-16T21:06:10.055Z","message":"Non-fatal error fetching PID metrics for 30356, metrics are valid, but partial: Not enough privileges to fetch information: /io unavailable; if running inside a container, use SYS_PTRACE: error fetching IO metrics: open /proc/30356/io: permission denied","component":{"binary":"metricbeat","dataset":"elastic_agent.metricbeat","id":"system/metrics-default","type":"system/metrics"},"log":{"source":"system/metrics-default"},"log.logger":"processes","log.origin":{"file.line":268,"file.name":"process/process.go","function":"github.com/elastic/elastic-agent-system-metrics/metric/system/process.(*Stats).pidFill"},"service.name":"metricbeat","ecs.version":"1.6.0","ecs.version":"1.6.0"}
{"log.level":"debug","@timestamp":"2024-07-16T21:06:10.056Z","message":"Non-fatal error fetching PID metrics for 30357, metrics are valid, but partial: Not enough privileges to fetch information: /io unavailable; if running inside a container, use SYS_PTRACE: error fetching IO metrics: open /proc/30357/io: permission denied","component":{"binary":"metricbeat","dataset":"elastic_agent.metricbeat","id":"system/metrics-default","type":"system/metrics"},"log":{"source":"system/metrics-default"},"service.name":"metricbeat","ecs.version":"1.6.0","log.logger":"processes","log.origin":{"file.line":268,"file.name":"process/process.go","function":"github.com/elastic/elastic-agent-system-metrics/metric/system/process.(*Stats).pidFill"},"ecs.version":"1.6.0"}
{"log.level":"debug","@timestamp":"2024-07-16T21:06:10.056Z","message":"Non-fatal error fetching PID metrics for 30358, metrics are valid, but partial: Not enough privileges to fetch information: /io unavailable; if running inside a container, use SYS_PTRACE: error fetching IO metrics: open /proc/30358/io: permission denied","component":{"binary":"metricbeat","dataset":"elastic_agent.metricbeat","id":"system/metrics-default","type":"system/metrics"},"log":{"source":"system/metrics-default"},"ecs.version":"1.6.0","log.logger":"processes","log.origin":{"file.line":268,"file.name":"process/process.go","function":"github.com/elastic/elastic-agent-system-metrics/metric/system/process.(*Stats).pidFill"},"service.name":"metricbeat","ecs.version":"1.6.0"}
{"log.level":"debug","@timestamp":"2024-07-16T21:06:10.056Z","message":"Non-fatal error fetching PID metrics for 30649, metrics are valid, but partial: Not enough privileges to fetch information: /io unavailable; if running inside a container, use SYS_PTRACE: error fetching IO metrics: open /proc/30649/io: permission denied","component":{"binary":"metricbeat","dataset":"elastic_agent.metricbeat","id":"system/metrics-default","type":"system/metrics"},"log":{"source":"system/metrics-default"},"log.logger":"processes","log.origin":{"file.line":268,"file.name":"process/process.go","function":"github.com/elastic/elastic-agent-system-metrics/metric/system/process.(*Stats).pidFill"},"service.name":"metricbeat","ecs.version":"1.6.0","ecs.version":"1.6.0"}
{"log.level":"debug","@timestamp":"2024-07-16T21:06:10.056Z","message":"Non-fatal error fetching PID metrics for 30650, metrics are valid, but partial: Not enough privileges to fetch information: /io unavailable; if running inside a container, use SYS_PTRACE: error fetching IO metrics: open /proc/30650/io: permission denied","component":{"binary":"metricbeat","dataset":"elastic_agent.metricbeat","id":"system/metrics-default","type":"system/metrics"},"log":{"source":"system/metrics-default"},"service.name":"metricbeat","ecs.version":"1.6.0","log.logger":"processes","log.origin":{"file.line":268,"file.name":"process/process.go","function":"github.com/elastic/elastic-agent-system-metrics/metric/system/process.(*Stats).pidFill"},"ecs.version":"1.6.0"}
{"log.level":"debug","@timestamp":"2024-07-16T21:06:10.056Z","message":"Non-fatal error fetching PID metrics for 30651, metrics are valid, but partial: Not enough privileges to fetch information: /io unavailable; if running inside a container, use SYS_PTRACE: error fetching IO metrics: open /proc/30651/io: permission denied","component":{"binary":"metricbeat","dataset":"elastic_agent.metricbeat","id":"system/metrics-default","type":"system/metrics"},"log":{"source":"system/metrics-default"},"log.logger":"processes","log.origin":{"file.line":268,"file.name":"process/process.go","function":"github.com/elastic/elastic-agent-system-metrics/metric/system/process.(*Stats).pidFill"},"service.name":"metricbeat","ecs.version":"1.6.0","ecs.version":"1.6.0"}
{"log.level":"debug","@timestamp":"2024-07-16T21:06:10.056Z","message":"Non-fatal error fetching PID metrics for 30652, metrics are valid, but partial: Not enough privileges to fetch information: /io unavailable; if running inside a container, use SYS_PTRACE: error fetching IO metrics: open /proc/30652/io: permission denied","component":{"binary":"metricbeat","dataset":"elastic_agent.metricbeat","id":"system/metrics-default","type":"system/metrics"},"log":{"source":"system/metrics-default"},"log.logger":"processes","log.origin":{"file.line":268,"file.name":"process/process.go","function":"github.com/elastic/elastic-agent-system-metrics/metric/system/process.(*Stats).pidFill"},"service.name":"metricbeat","ecs.version":"1.6.0","ecs.version":"1.6.0"}
{"log.level":"debug","@timestamp":"2024-07-16T21:06:10.056Z","message":"Non-fatal error fetching PID metrics for 30670, metrics are valid, but partial: Not enough privileges to fetch information: /io unavailable; if running inside a container, use SYS_PTRACE: error fetching IO metrics: open /proc/30670/io: permission denied","component":{"binary":"metricbeat","dataset":"elastic_agent.metricbeat","id":"system/metrics-default","type":"system/metrics"},"log":{"source":"system/metrics-default"},"log.origin":{"file.line":268,"file.name":"process/process.go","function":"github.com/elastic/elastic-agent-system-metrics/metric/system/process.(*Stats).pidFill"},"service.name":"metricbeat","ecs.version":"1.6.0","log.logger":"processes","ecs.version":"1.6.0"} I've attached diagnostics from the VM. The logs indicating the error is non-fatal might be what is catching us here, I think this has changed since the issue to implement this was created. Perhaps you were initially right hooking into the error in the event, which might catch these since they are non-fatal, but I still think looking at events makes it harder to see when we will go back to healthy. Regardless, this isn't actually catching the degraded state we want. unpriv-metricbeat-diagnostics.zip I also wonder if we could get a test to catch this, assuming I verified this correctly. |
@cmacknz I see. I think this is due to the fact that We can make use of |
This pull request is now in conflicts. Could you fix it? 🙏
|
This pull request is now in conflicts. Could you fix it? 🙏
|
210c929
to
abcb26a
Compare
@@ -253,14 +255,20 @@ func (msw *metricSetWrapper) fetch(ctx context.Context, reporter reporter) { | |||
err := fetcher.Fetch(reporter.V2()) | |||
if err != nil { | |||
reporter.V2().Error(err) | |||
msw.module.UpdateStatus(status.Degraded, fmt.Sprintf("Error fetching data for metricset %s.%s: %s", msw.module.Name(), msw.MetricSet.Name(), err)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: use %v for the err
} | ||
case mb.ReportingMetricSetV2WithContext: | ||
reporter.StartFetchTimer() | ||
err := fetcher.Fetch(ctx, reporter.V2()) | ||
if err != nil { | ||
reporter.V2().Error(err) | ||
msw.module.UpdateStatus(status.Degraded, fmt.Sprintf("Error fetching data for metricset %s.%s: %s", msw.module.Name(), msw.MetricSet.Name(), err)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: use %v for the err
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@pkoutsovasilis I didn't see this.
I'll raise another PR for this.
Sorry from my end!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left some nits, but looks good to me
@cmacknz Screenshot of error event on Kibana: |
Merging this as of now. |
#40025) * fix: initial commit * tests: add integration test cases * fix: expand testing scenarios * fix: add comments * fix: move integration tests to system/process * cleanup * fix: ci * fix: ci and typos * chore: update changelog * fix: add helper * fix: remove extra space * fix: ci * fix: move integration tests to x-pack * fix: add null check * fix: ci * fix: remove unused code * fix: lint * fix: lint and imports * fix: ci windows * inting for windows * fix lint linux * fix: go imports * fix: switch to the generic way * chore: make error descriptive * fix: move status report after fetch * fix: typo * fix: remove nolint --------- Co-authored-by: Pierre HILBERT <pierre.hilbert@elastic.co> (cherry picked from commit c86330a)
|
Proposed commit message
Add
StatusReporter
for letting individual metricsets to report their statuses.Checklist
CHANGELOG.next.asciidoc
orCHANGELOG-developer.next.asciidoc
.Related issues
Use cases
Screenshots
Logs