Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Run apm-server standalone as part of e2e tests #3728

Merged
merged 26 commits into from
Aug 2, 2024

Conversation

michel-laterman
Copy link
Contributor

@michel-laterman michel-laterman commented Jul 15, 2024

What is the problem this PR solves?

We have had issues on cloud deployments where APM instrumentation is not properly propagated to fleet-server.
See elastic/elastic-agent#5204; this PR does not fix the underlying behaviour but has added tests to ensure APM config is effective in some e2e test cases.

How does this PR solve the problem?

Adds an integration test to ensure that server/agent.go propagates config from proto.APMConfig correctly.
Adds the apm-server as a stand-alone binary in our e2e test suite, and specifies tests so we can test if fleet-server properly loads and sends trace information.

How to test this PR locally

make test-int runs the integration test that ensures proto.APMConfig attributes are properly passed by the elastic-agent-client interactions when the fleet-server is in agent-mode
make test-e2e ensures traces are present in the ES cluster if the agent is started in stand-alone mode with server.instrumentation.* attributes set, or if the agent.monitoring.apm.* attributes are set in the policy when running under an agent.

Design Checklist

  • I have ensured my design is stateless and will work when multiple fleet-server instances are behind a load balancer.
  • I have or intend to scale test my changes, ensuring it will work reliably with 100K+ agents connected.
  • I have included fail safe mechanisms to limit the load on fleet-server: rate limiting, circuit breakers, caching, load shedding, etc.

Checklist

  • I have commented my code, particularly in hard-to-understand areas~~
  • ~~I have made corresponding changes to the documentation
  • I have made corresponding change to the default configuration files
  • I have added tests that prove my fix is effective or that my feature works
  • I have added an entry in ./changelog/fragments using the changelog tool

Related issues

@michel-laterman michel-laterman added enhancement New feature or request Team:Elastic-Agent-Control-Plane Label for the Agent Control Plane team labels Jul 15, 2024
@michel-laterman
Copy link
Contributor Author

michel-laterman commented Jul 18, 2024

I've noticed 2 things while implementing these tests

  1. I don't think the fleet-server is applying GlobalLabels correctly; i've adjusted the environment attribute of the tests to be test specific in order to try to get around this. I don't think we need to fix it for the e2e tests to pass, but it should definitly be part of the backlog if it's indeed the case
  2. I think there's a timing issue with the calls to HasTestStatusTrace in the tests; On my local machine the initial attempt to run a single (apm) test will fail; but I can see that a trace is in the ES instance, subsequent runs will succeed.

resolved issues

@michel-laterman michel-laterman marked this pull request as ready for review July 24, 2024 18:42
@michel-laterman michel-laterman requested a review from a team as a code owner July 24, 2024 18:42
@michel-laterman michel-laterman requested review from faec and pchila July 24, 2024 18:42
Comment on lines 329 to 331
// testAPMInstrumentationFile tests passing agentt.monitoring.apm.* config options through the elastic-agent.yml file during install time
// it currently does not work, not sure if that is intended behaviour
func (suite *AgentInstallSuite) testAPMInstrumentationFile() {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@blakerouse, If I replace the elastic-agent.yml file with the contents templated from testdata/agent-install-apm.tpl do we expect apm traces to be enabled?

I think that is what is being reported in https://github.com/elastic/fleet-server/issues/3526

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Based on what I am reading in the Elastic Agent code it should be present - https://github.com/elastic/elastic-agent/blob/main/internal/pkg/agent/application/application.go#L246

I do wonder what is being passed into the policy by Fleet. It is possible that Fleet is just overriding all the values from the local configuration. Fleet policy always overrides local policy so thats very possible.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The test policy that i'm using has:

agent:
  monitoring:
    enabled: false
    logs: false
    metrics: false

I can try to retest but chagne the elastic-agent.yml to not have agent.monitoring as a top level key

Copy link

@ycombinator ycombinator removed the request for review from pchila July 30, 2024 17:12
@ycombinator
Copy link
Contributor

@michel-laterman @blakerouse what's the next step to move this PR forward?

@michel-laterman
Copy link
Contributor Author

I'm just waiting on a review; the TestAPMInstrumentationFile tests can be unskipped once elastic/elastic-agent#5204 is resolved

Copy link
Contributor

@blakerouse blakerouse left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me.

Couldn't find anything that stands out.

@michel-laterman michel-laterman merged commit 6cf3be6 into elastic:main Aug 2, 2024
8 checks passed
@michel-laterman michel-laterman deleted the e2e-apm-tests branch August 2, 2024 22:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request Team:Elastic-Agent-Control-Plane Label for the Agent Control Plane team
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Testing] Add test coverage for configuration loading
3 participants