Enable monitoring buffer for elastic-agent #30471

Merged
merged 12 commits into from
Mar 4, 2022

Conversation

michel-laterman
Contributor

@michel-laterman michel-laterman commented Feb 18, 2022

What does this PR do?

Enable the monitoring buffer for the elastic-agent and beats that it starts.
Add metrics collection (from buffers) into the diagnostics collect bundle if enabled.

Why is it important?

Allows us to gather metrics data for debugging if agent->ES communications are failing.

Checklist

  • My code follows the style guidelines of this project
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation PR
  • I have made corresponding change to the default configuration files
  • [] I have added tests that prove my fix is effective or that my feature works
  • I have added an entry in CHANGELOG.next.asciidoc or CHANGELOG-developer.next.asciidoc.

How to test this PR locally

Start the agent and run elastic-agent diagnostics collect. Metrics should not be included in the archive by default.
Enable the metrics buffer by adding agent.monitoring.http.buffer.enabled: true, then rerun elastic-agent diagnostics collect. Metrics (JSON) files should now be added to the archive.
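
For reference, the setting named in the test steps can also be written in nested YAML form. This is a minimal sketch, not a definitive configuration: the nesting is just the dotted key expanded, and putting it in elastic-agent.yml is an assumption about a standalone install. Only the enabled flag is shown, since the review discussion indicates it is the only buffer value currently passed through.

```yaml
# Sketch only: the dotted key from the test steps, expanded into nested YAML.
# Assumption: a standalone agent configured through elastic-agent.yml.
agent.monitoring:
  http:
    buffer:
      # Off by default. When true, `elastic-agent diagnostics collect`
      # should add metrics (JSON) files to the archive.
      enabled: true
```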

Related issues

@botelastic botelastic bot added and removed the needs_team label (Indicates that the issue/PR needs a Team:* label) Feb 18, 2022
@mergify
Contributor

mergify bot commented Feb 18, 2022

This pull request does not have a backport label. Could you fix it @michel-laterman? 🙏
To fixup this pull request, you need to add the backport labels for the needed
branches, such as:

  • backport-v./d./d./d is the label to automatically backport to the 7./d branch. /d is the digit

NOTE: backport-skip has been added to this pull request.

@mergify mergify bot added the backport-skip label (Skip notification from the automated backport with mergify) Feb 18, 2022
@elasticmachine
Collaborator

elasticmachine commented Feb 18, 2022

💚 Build Succeeded

Build stats

  • Start Time: 2022-03-04T00:50:51.647+0000

  • Duration: 16 min 2 sec

❕ Flaky test report

No test was executed to be analysed.

🤖 GitHub comments

To re-run your PR in the CI, just comment with:

  • /test : Re-trigger the build.

  • /package : Generate the packages and run the E2E tests.

  • /beats-tester : Run the installation tests with beats-tester.

  • run elasticsearch-ci/docs : Re-trigger the docs validation. (use unformatted text in the comment!)

@mergify
Contributor

mergify bot commented Mar 1, 2022

This pull request now has conflicts. Could you fix it? 🙏
To fix up this pull request, you can check it out locally. See documentation: https://help.github.com/articles/checking-out-pull-requests-locally/

git fetch upstream
git checkout -b agent-metrics-buffer upstream/agent-metrics-buffer
git merge upstream/main
git push upstream agent-metrics-buffer

Comment on lines 123 to 127
# buffer:
#   enabled: false
#   period: 10s
#   size: 60
#   namespaces: ['stats']
@michel-laterman
Contributor Author

I've added the full buffer config here, but I'm not sure if we should. Currently the enabled flag is the only one that's passed (it's injected via command line args at the moment).

@ph
Contributor

So the other values would be no-ops; if that is the case, we should not expose them.

@blakerouse
Contributor

I agree, if it's not exposed then it should be hidden.

@michel-laterman michel-laterman marked this pull request as ready for review March 1, 2022 01:17
@elasticmachine
Collaborator

Pinging @elastic/elastic-agent-control-plane (Team:Elastic-Agent-Control-Plane)

@michel-laterman
Contributor Author

/package

Contributor

@lykkin lykkin left a comment

LGTM, just some questions for context:

Beat metrics are collected in the diag command over gRPC by way of an HTTP endpoint. Does that sound right?

Are there any docs I can read up on for the metrics libbeat exports this way?

x-pack/elastic-agent/CHANGELOG.next.asciidoc (outdated, resolved)
Co-authored-by: Bryan Clement <bclement01@gmail.com>
Contributor

@ph ph left a comment

@michel-laterman Can we add a few tests to ensure the behavior?

Contributor

@blakerouse blakerouse left a comment

Looks good to me, minus the one comment.

@michel-laterman
Contributor Author

@ph, as we discussed, I think testing for the diagnostics commands should be added to the e2e tests. I've made an issue to track this: elastic/e2e-testing#2204

Contributor

@ph ph left a comment

LGTM

@jlind23
Collaborator

jlind23 commented Mar 3, 2022

@michel-laterman will you be able to merge it before your time off?

@michel-laterman
Contributor Author

Yep, as soon as the e2e passes I'll merge.

@michel-laterman
Contributor Author

/test

@michel-laterman
Contributor Author

I'm going to force this through: all checks succeeded in c8339ad, and since then I have only changed the default config to remove unused values.

@michel-laterman michel-laterman merged commit 0099f5c into elastic:main Mar 4, 2022
@michel-laterman michel-laterman deleted the agent-metrics-buffer branch March 4, 2022 01:13
Labels
  • backport-skip (Skip notification from the automated backport with mergify)
  • enhancement
  • Team:Elastic-Agent-Control-Plane (Label for the Agent Control Plane team)
  • v8.2.0

6 participants