guojialiang92 (Contributor) commented Aug 12, 2025

Description

I reproduced this test failure locally. The exception is as follows:

ClusterManagerNotDiscoveredException[cluster-manager not discovered
]
	at __randomizedtesting.SeedInfo.seed([CA32ECE92650D7B6:4E02F48BD9CA9FCF]:0)
	at org.opensearch.action.admin.cluster.health.TransportClusterHealthAction.executeHealth(TransportClusterHealthAction.java:284)
	at org.opensearch.action.admin.cluster.health.TransportClusterHealthAction.clusterManagerOperation(TransportClusterHealthAction.java:158)
	at org.opensearch.action.admin.cluster.health.TransportClusterHealthAction.clusterManagerOperation(TransportClusterHealthAction.java:81)
	at org.opensearch.action.support.clustermanager.TransportClusterManagerNodeAction$AsyncSingleAction.lambda$doStart$0(TransportClusterManagerNodeAction.java:262)
	at org.opensearch.action.ActionRunnable$2.doRun(ActionRunnable.java:89)
	at org.opensearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:52)
	at org.opensearch.common.util.concurrent.OpenSearchExecutors$DirectExecutorService.execute(OpenSearchExecutors.java:341)
	at org.opensearch.action.support.clustermanager.TransportClusterManagerNodeAction$AsyncSingleAction.doStart(TransportClusterManagerNodeAction.java:259)
	at org.opensearch.action.support.clustermanager.TransportClusterManagerNodeAction$AsyncSingleAction.tryAction(TransportClusterManagerNodeAction.java:226)
	at org.opensearch.action.support.RetryableAction$1.doRun(RetryableAction.java:139)
	at org.opensearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:52)
	at org.opensearch.common.util.concurrent.OpenSearchExecutors$DirectExecutorService.execute(OpenSearchExecutors.java:341)
	at org.opensearch.action.support.RetryableAction.run(RetryableAction.java:117)
	at org.opensearch.action.support.clustermanager.TransportClusterManagerNodeAction.doExecute(TransportClusterManagerNodeAction.java:187)
	at org.opensearch.action.support.clustermanager.TransportClusterManagerNodeAction.doExecute(TransportClusterManagerNodeAction.java:92)
	at org.opensearch.action.support.TransportAction$RequestFilterChain.proceed(TransportAction.java:220)
	at org.opensearch.action.support.TransportAction.execute(TransportAction.java:190)
	at org.opensearch.action.support.TransportAction.execute(TransportAction.java:109)
	at org.opensearch.transport.client.node.NodeClient.executeLocally(NodeClient.java:113)
	at org.opensearch.transport.client.node.NodeClient.doExecute(NodeClient.java:100)
	at org.opensearch.transport.client.support.AbstractClient.execute(AbstractClient.java:501)
	at org.opensearch.transport.client.support.AbstractClient.execute(AbstractClient.java:488)
	at org.opensearch.transport.client.support.AbstractClient$ClusterAdmin.execute(AbstractClient.java:829)
	at org.opensearch.action.ActionRequestBuilder.execute(ActionRequestBuilder.java:66)
	at org.opensearch.action.ActionRequestBuilder.get(ActionRequestBuilder.java:73)
	at org.opensearch.cluster.routing.WeightedRoutingIT.testClusterHealthResponseWithEnsureNodeWeighedInParam(WeightedRoutingIT.java:722)
	at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:103)
	at java.base/java.lang.reflect.Method.invoke(Method.java:580)
	at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1750)
	at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:938)
	at com.carrotsearch.randomizedtesting.RandomizedRunner$9.evaluate(RandomizedRunner.java:974)
	at com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:988)
	at org.opensearch.test.OpenSearchTestClusterRule$1.evaluate(OpenSearchTestClusterRule.java:369)
	at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
	at org.junit.rules.RunRules.evaluate(RunRules.java:20)
	at org.apache.lucene.tests.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:48)
	at org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
	at org.apache.lucene.tests.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:45)
	at org.apache.lucene.tests.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:60)
	at org.apache.lucene.tests.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:44)
	at org.junit.rules.RunRules.evaluate(RunRules.java:20)
	at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
	at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:368)
	at com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:817)
	at com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:468)
	at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:947)
	at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:832)
	at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:883)
	at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:894)
	at org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
	at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
	at org.apache.lucene.tests.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:38)
	at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
	at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
	at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
	at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
	at org.apache.lucene.tests.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:53)
	at org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
	at org.apache.lucene.tests.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:44)
	at org.apache.lucene.tests.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:60)
	at org.apache.lucene.tests.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:47)
	at org.junit.rules.RunRules.evaluate(RunRules.java:20)
	at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
	at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:368)
	at java.base/java.lang.Thread.run(Thread.java:1583)

Reproduce

Running the following command fails in approximately 6 out of every 100 runs.

./gradlew 'null' --tests 'org.opensearch.cluster.routing.WeightedRoutingIT.testClusterHealthResponseWithEnsureNodeWeighedInParam [seed=[CA32ECE92650D7B6:4E02F48BD9CA9FCF]]' -Dtests.seed=CA32ECE92650D7B6 -Dtests.locale=ses-Latn-ML -Dtests.timezone=Africa/Maputo -Druntime.java=21

Analysis

The test fails because NetworkDisruption#startDisrupting and NetworkDisruption#stopDisrupting do not take effect immediately; the cluster needs some time after each call before the disruption (or its removal) is actually in effect.
A fixed Thread.sleep does not guarantee that the cluster has reached the expected state by the time the test checks it.

Solution

Replace Thread.sleep with assertBusy, which retries the assertion until the expected state is observed (or a timeout is reached).
After the fix, 100 local runs completed without a single failure, and the average test runtime dropped from 13s to 3s.
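To illustrate why polling is both more reliable and faster than a fixed sleep, here is a minimal, self-contained sketch. The `assertBusy` below is a simplified, hypothetical stand-in written only for this example, not OpenSearch's actual `OpenSearchTestCase#assertBusy`; the `disruptor` thread and its 200 ms delay are assumptions that simulate a `NetworkDisruption` taking effect asynchronously.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

public class AssertBusySketch {

    /** An assertion that may throw, mirroring the shape of a test code block. */
    @FunctionalInterface
    public interface CheckedRunnable {
        void run() throws Exception;
    }

    /**
     * Hypothetical, simplified stand-in for OpenSearchTestCase#assertBusy (not its real code):
     * re-runs the assertion with capped exponential backoff until it passes, and rethrows
     * the most recent AssertionError once the timeout has elapsed.
     */
    public static void assertBusy(CheckedRunnable assertion, long maxWait, TimeUnit unit) throws Exception {
        long deadline = System.nanoTime() + unit.toNanos(maxWait);
        long sleepMillis = 1;
        while (true) {
            try {
                assertion.run();
                return; // expected state observed: stop waiting immediately
            } catch (AssertionError e) {
                if (System.nanoTime() - deadline >= 0) {
                    throw e; // timed out: surface the most recent failure
                }
            }
            Thread.sleep(sleepMillis);
            sleepMillis = Math.min(sleepMillis * 2, 100); // back off, capped at 100 ms
        }
    }

    public static void main(String[] args) throws Exception {
        AtomicBoolean disrupted = new AtomicBoolean(false);

        // Assumption: simulate NetworkDisruption#startDisrupting taking effect
        // asynchronously after an arbitrary 200 ms delay.
        Thread disruptor = new Thread(() -> {
            try {
                Thread.sleep(200);
                disrupted.set(true);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        disruptor.start();

        // Instead of Thread.sleep(<fixed guess>), poll until the state is actually visible.
        long start = System.nanoTime();
        assertBusy(() -> {
            if (!disrupted.get()) {
                throw new AssertionError("disruption not in effect yet");
            }
        }, 10, TimeUnit.SECONDS);
        long elapsedMs = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);

        disruptor.join();
        System.out.println("expected state observed after about " + elapsedMs + " ms");
    }
}
```

Because the loop returns as soon as the assertion passes, the test waits only as long as the state change actually takes, rather than a fixed worst-case delay, and it still fails loudly if the state is never reached within the timeout. This is consistent with both the eliminated flakiness and the shorter runtime reported for the fix.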

Related Issues

Resolves #19028

Check List

  • Functionality includes testing.
  • API changes companion pull request created, if applicable.
  • Public documentation issue/PR created, if applicable.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Signed-off-by: guojialiang <guojialiang.2012@bytedance.com>
guojialiang92 requested a review from a team as a code owner on August 12, 2025 14:51
github-actions (Contributor)

✅ Gradle check result for 141ae52: SUCCESS

codecov bot commented Aug 12, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 72.87%. Comparing base (f967a72) to head (141ae52).
⚠️ Report is 15 commits behind head on main.

Additional details and impacted files
@@             Coverage Diff              @@
##               main   #19032      +/-   ##
============================================
- Coverage     72.96%   72.87%   -0.10%     
+ Complexity    69451    69340     -111     
============================================
  Files          5645     5645              
  Lines        318787   318787              
  Branches      46125    46125              
============================================
- Hits         232610   232314     -296     
- Misses        67358    67697     +339     
+ Partials      18819    18776      -43     

github-actions bot added the >test-failure, autocut, and flaky-test labels on Aug 12, 2025
andrross merged commit 2f416d3 into opensearch-project:main on Aug 12, 2025 (37 of 38 checks passed)
RajatGupta02 pushed a commit to RajatGupta02/OpenSearch that referenced this pull request Aug 18, 2025
atris pushed a commit to atris/OpenSearch that referenced this pull request Aug 28, 2025
kh3ra pushed a commit to kh3ra/OpenSearch that referenced this pull request Sep 5, 2025
vinaykpud pushed a commit to vinaykpud/OpenSearch that referenced this pull request Sep 26, 2025
Labels

autocut, flaky-test, skip-changelog, >test-failure

Development

Successfully merging this pull request may close these issues.

[AUTOCUT] Gradle Check Flaky Test Report for WeightedRoutingIT