[BUG] org.opensearch.action.admin.cluster.node.tasks.ResourceAwareTasksTests.testBasicTaskResourceTracking is flaky #8213

Closed
ashking94 opened this issue Jun 22, 2023 · 3 comments · Fixed by #8993
Labels
bug Something isn't working flaky-test Random test failure that succeeds on second run

Comments

@ashking94
Member

ashking94 commented Jun 22, 2023

See https://build.ci.opensearch.org/job/gradle-check/18108/

REPRODUCE WITH: ./gradlew ':server:test' --tests "org.opensearch.action.admin.cluster.node.tasks.ResourceAwareTasksTests.testBasicTaskResourceTracking" -Dtests.seed=3A623BD634D99A0D -Dtests.security.manager=true -Dtests.jvm.argline="-XX:TieredStopAtLevel=1 -XX:ReservedCodeCacheSize=64m" -Dtests.locale=ar-KW -Dtests.timezone=Europe/Paris -Druntime.java=20

Error observed:

يون 21, 2023 9:25:43 م com.carrotsearch.randomizedtesting.RandomizedRunner$QueueUncaughtExceptionsHandler uncaughtException
WARNING: Uncaught exception in thread: Thread[#125,opensearch[TransportTasksActionTests][search][T#1],5,TGRP-ResourceAwareTasksTests]
java.lang.AssertionError: 
Expected: a value less than or equal to <8200000L>
     but: <8280016L> was greater than <8200000L>
	at __randomizedtesting.SeedInfo.seed([3A623BD634D99A0D]:0)
	at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:18)
	at org.junit.Assert.assertThat(Assert.java:964)
	at org.junit.Assert.assertThat(Assert.java:930)
	at org.opensearch.action.admin.cluster.node.tasks.ResourceAwareTasksTests.assertMemoryUsageWithinLimits(ResourceAwareTasksTests.java:677)
	at org.opensearch.action.admin.cluster.node.tasks.ResourceAwareTasksTests.lambda$testBasicTaskResourceTracking$1(ResourceAwareTasksTests.java:314)
	at org.opensearch.action.admin.cluster.node.tasks.ResourceAwareTasksTests$ResourceAwareNodesAction$1$1.innerOnResponse(ResourceAwareTasksTests.java:194)
	at org.opensearch.action.admin.cluster.node.tasks.ResourceAwareTasksTests$ResourceAwareNodesAction$1$1.innerOnResponse(ResourceAwareTasksTests.java:191)
	at org.opensearch.action.NotifyOnceListener.onResponse(NotifyOnceListener.java:55)
	at org.opensearch.tasks.Task.lambda$decrementResourceTrackingThreads$1(Task.java:564)
	at java.base/java.util.ArrayList.forEach(ArrayList.java:1511)
	at org.opensearch.tasks.Task.decrementResourceTrackingThreads(Task.java:562)
	at org.opensearch.tasks.Task.stopThreadResourceTracking(Task.java:468)
	at org.opensearch.tasks.TaskResourceTrackingService.taskExecutionFinishedOnThread(TaskResourceTrackingService.java:198)
	at org.opensearch.threadpool.TaskAwareRunnable.doRun(TaskAwareRunnable.java:81)
	at org.opensearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:52)
	at org.opensearch.common.util.concurrent.TimedRunnable.doRun(TimedRunnable.java:59)
	at org.opensearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:806)
	at org.opensearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:52)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
	at java.base/java.lang.Thread.run(Thread.java:1623)

@ashking94 added the bug and untriaged labels on Jun 22, 2023
@ashking94
Member Author

@ketanv3 fyi ^

@stephen-crawford
Contributor

I have not been able to reproduce this locally, either by running the individual test or by running the whole test class:

[Two screenshots, taken 2023-06-23 at 10:58:14 AM and 10:58:57 AM]

If someone else reproduces this, please leave an update. Otherwise we may want to close this issue.

@ketanv3
Contributor

ketanv3 commented Jun 26, 2023

assertMemoryUsageWithinLimits checks whether the measured memory usage is within 5% (capped at 200 KiB) of the expected memory usage. This buffer accounts for the class-loading overhead incurred when new code is first encountered at runtime.
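
To make the arithmetic concrete, here is a minimal sketch of such a check, applied to the values from the failure log above. It is not the actual test code: the class and method names are hypothetical, and it assumes an expected usage of 8,000,000 bytes and a cap of 200,000 bytes, both inferred from the logged limit of 8200000 rather than confirmed from the test source.

```java
public class MemoryLimitSketch {
    // Assumption: the absolute cap on the buffer is 200,000 bytes.
    static final long MAX_OVERHEAD_BYTES = 200_000L;

    // Upper limit = expected usage plus a 5% buffer, with the buffer capped.
    public static long upperLimit(long expectedBytes) {
        long buffer = Math.min(MAX_OVERHEAD_BYTES, expectedBytes * 5 / 100);
        return expectedBytes + buffer;
    }

    public static boolean withinLimits(long actualBytes, long expectedBytes) {
        return actualBytes <= upperLimit(expectedBytes);
    }

    public static void main(String[] args) {
        long expected = 8_000_000L; // assumption, inferred from the logged limit
        long actual = 8_280_016L;   // value reported in the assertion error

        System.out.println(upperLimit(expected));           // 8200000
        System.out.println(withinLimits(actual, expected)); // false
        System.out.println(actual - upperLimit(expected));  // 80016
    }
}
```

Under these assumptions the 5% buffer would be 400,000 bytes, so the cap is the binding constraint, and the run exceeded the limit by 80,016 bytes.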

The flaky run used ~280 KiB more memory than expected, i.e., the buffer wasn't sufficient. This is possibly due to the introduction of new code paths or tests, or a change in the JVM, any of which could add overhead. A few possible options:

  1. Increase the 200 KiB cap to, say, 500 KiB.
  2. Remove the cap altogether.
  3. Perform a warmup run before taking actual measurements.
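
Option 3 could look roughly like the sketch below: run the workload once without measuring so that one-time costs such as class loading are paid up front, then measure only the second run. This is an illustration under stated assumptions, not the fix that was merged. The class name, the measureAfterWarmup helper, and the simulated counter are all hypothetical, and the byte figures are reused from the log purely as sample numbers.

```java
import java.util.function.LongSupplier;

public class WarmupMeasurementSketch {
    // Runs the workload once unmeasured (warmup), then returns how much the
    // supplied counter grew across a second, measured run.
    public static long measureAfterWarmup(Runnable workload, LongSupplier allocatedBytes) {
        workload.run(); // warmup: one-time overhead lands here and is not measured
        long before = allocatedBytes.getAsLong();
        workload.run(); // measured run
        return allocatedBytes.getAsLong() - before;
    }

    public static void main(String[] args) {
        // Simulated allocation counter (hypothetical): the first run pays a
        // one-time overhead on top of the steady-state allocation.
        long[] counter = {0L};
        boolean[] firstRun = {true};
        Runnable workload = () -> {
            if (firstRun[0]) {
                counter[0] += 280_016L; // simulated one-time class-loading overhead
                firstRun[0] = false;
            }
            counter[0] += 8_000_000L;   // simulated steady-state allocation
        };

        long measured = measureAfterWarmup(workload, () -> counter[0]);
        System.out.println(measured); // 8000000
    }
}
```

In a real test the counter would need to be a per-thread allocation figure; on HotSpot JVMs, com.sun.management.ThreadMXBean#getThreadAllocatedBytes is one possible source. A warmup only helps if the measured run exercises the same code paths as the warmup run.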
