Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix OTel agent detector test that was polluting the JVM for later tests #8449

Merged
merged 3 commits into from
Mar 4, 2024

Conversation

tjquinno
Copy link
Member

@tjquinno tjquinno commented Mar 2, 2024

Description

Resolves #8448

Some pipeline runs and some developers' local builds experienced reliable failures, other pipeline runs and developers saw no problems.

There had been some conjecture that some static storage was contributing to this problem.

Instead, the problem turned out to be a bug in the AgentDetectorTest class.

Background

Our AgentDetector examines config and system properties (and the OTel context) to try to determine if the OTel agent is present. Our OpenTelemetryProducer uses the AgentDetector to decide whether to use the OpenTelemetry instance that the agent will already have set (if the agent is present) or to initialize and set an OpenTelemetry instance according to current config settings (if the agent is absent).

Some of our tests use an in-memory TestSpanExporter to see if spans were managed as expected. The use of this test exporter is conveyed through configuration that is set at test-time and so will not be used if the agent is detected.

The problem

The AgentDetectorTest set a system property to make sure that the detector correctly responded, but the test did not reset or clear the system property after the test.

If the agent test ran before other tests that depended on our custom in-memory collector, the later tests would fail because tests would skip using the configuration to prepare OTel and, therefore, the in-memory collector. Because span data would not be sent to the in-memory collector in those cases, tests that expected span data in order to check it would fail. The non-deterministic ordering of test execution explains why some builds saw failures while others did not.

The fix

The PR changes the AgentDetectorTest to restore or clear the system property at the end of the test to avoid disrupting later tests.

The PR also removes the changes to the surefire plug-in config that forces tests into their own JVMs since that workaround is no longer needed.

I verified the problem and the fix by temporarily forcing the order of the tests so the agent detector test preceded at least one of the tests that failed.

Documentation

No impact.

Signed-off-by: Tim Quinn <tim.quinn@oracle.com>
@tjquinno tjquinno self-assigned this Mar 2, 2024
@oracle-contributor-agreement oracle-contributor-agreement bot added the OCA Verified All contributors have signed the Oracle Contributor Agreement. label Mar 2, 2024
@tjquinno tjquinno requested review from ljnelson and barchetta March 3, 2024 13:38
<id>default-test</id>
<configuration>
<!-- The default for the following is 5000 ms. Reducing it significantly speeds up tests. -->
<argLine>-Dotel.bsp.schedule.delay=100</argLine>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider, rather:

<systemPropertyVariables>
  <otel.bsp.schedule.delay>${otel.bsp.schedule.delay}</otel.bsp.schedule.delay>
</systemPropertyVariables>

…and, elsewhere in the pom.xml:

<properties>
  <otel.bsp.schedule.delay>100</otel.bsp.schedule.delay>
</properties>

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

@tjquinno tjquinno requested a review from ljnelson March 4, 2024 21:41
@tjquinno tjquinno merged commit 611ba50 into helidon-io:main Mar 4, 2024
12 checks passed
@tjquinno tjquinno deleted the 4.x-fix-otel-agent-test branch March 4, 2024 22:34
hrstoyanov pushed a commit to hrstoyanov/helidon that referenced this pull request Mar 12, 2024
…ts (helidon-io#8449)

* Fix OTel agent detector test that was polluting the JVM for later tests

Signed-off-by: Tim Quinn <tim.quinn@oracle.com>

* Remove a stray blank line in the pom

* Review comments

---------

Signed-off-by: Tim Quinn <tim.quinn@oracle.com>
hrstoyanov pushed a commit to hrstoyanov/helidon that referenced this pull request Mar 12, 2024
…ts (helidon-io#8449)

* Fix OTel agent detector test that was polluting the JVM for later tests

Signed-off-by: Tim Quinn <tim.quinn@oracle.com>

* Remove a stray blank line in the pom

* Review comments

---------

Signed-off-by: Tim Quinn <tim.quinn@oracle.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
OCA Verified All contributors have signed the Oracle Contributor Agreement.
Projects
None yet
2 participants