[BUG] org.opensearch.search.scroll.SearchScrollWithFailingNodesIT.testScanScrollWithShardExceptions {p0={"search.concurrent_segment_search.enabled":"true"}} is flaky #10137

reta · 2023-09-20T13:18:28Z

Describe the bug
The test case org.opensearch.search.scroll.SearchScrollWithFailingNodesIT.testScanScrollWithShardExceptions {p0={"search.concurrent_segment_search.enabled":"true"}} is flaky:

org.opensearch.search.scroll.SearchScrollWithFailingNodesIT.testScanScrollWithShardExceptions {p0={"search.concurrent_segment_search.enabled":"true"}}


java.lang.AssertionError: 
Expected: a value less than <2>
     but: <2> was equal to <2>
	at __randomizedtesting.SeedInfo.seed([FFEE46EA8302D5D5:6EBDE6AAF55B33A6]:0)
	at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:18)
	at org.junit.Assert.assertThat(Assert.java:964)
	at org.junit.Assert.assertThat(Assert.java:930)
	at org.opensearch.search.scroll.SearchScrollWithFailingNodesIT.testScanScrollWithShardExceptions(SearchScrollWithFailingNodesIT.java:125)
	at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:104)
	at java.base/java.lang.reflect.Method.invoke(Method.java:578)
	at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1750)
	at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:938)
	at com.carrotsearch.randomizedtesting.RandomizedRunner$9.evaluate(RandomizedRunner.java:974)
	at com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:988)
	at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
	at org.junit.rules.RunRules.evaluate(RunRules.java:20)
	at org.apache.lucene.tests.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:48)
	at org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
	at org.apache.lucene.tests.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:45)
	at org.apache.lucene.tests.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:60)
	at org.apache.lucene.tests.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:44)
	at org.junit.rules.RunRules.evaluate(RunRules.java:20)
	at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
	at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:368)
	at com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:817)
	at com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:468)
	at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:947)
	at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:832)
	at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:883)
	at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:894)
	at org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
	at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
	at org.apache.lucene.tests.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:38)
	at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
	at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
	at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
	at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
	at org.apache.lucene.tests.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:53)
	at org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
	at org.apache.lucene.tests.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:44)
	at org.apache.lucene.tests.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:60)
	at org.apache.lucene.tests.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:47)
	at org.junit.rules.RunRules.evaluate(RunRules.java:20)
	at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
	at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:368)
	at java.base/java.lang.Thread.run(Thread.java:1623)

To Reproduce

./gradlew ':server:internalClusterTest' --tests "org.opensearch.search.scroll.SearchScrollWithFailingNodesIT" -Dtests.method="testScanScrollWithShardExceptions {p0={"search.concurrent_segment_search.enabled":"true"}}" -Dtests.seed=FFEE46EA8302D5D5

Expected behavior
The test should always pass

Plugins
Standard


Host/Environment
CI


reta · 2023-09-20T13:18:47Z

CC @sohami @neetikasinghal fyi :(

ashking94 · 2023-09-21T07:10:20Z

Seeing failures for search.concurrent_segment_search.enabled = false as well.
org.opensearch.search.scroll.SearchScrollWithFailingNodesIT.testScanScrollWithShardExceptions {p0={"search.concurrent_segment_search.enabled":"false"}} -> https://build.ci.opensearch.org/job/gradle-check/25926/testReport/junit/org.opensearch.search.scroll/SearchScrollWithFailingNodesIT/testScanScrollWithShardExceptions__p0___search_concurrent_segment_search_enabled___false___/

The test intended to stop a data node and called `stopRandomNonClusterManagerNode()` to do so. However, that method stops a random node that is not the currently elected cluster manager, regardless of node role, so it could pick a non-data node. I have also renamed the method to make its behavior clearer.

Resolves opensearch-project#10137

Signed-off-by: Andrew Ross <andrross@amazon.com>
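
To illustrate the distinction described in the commit message, here is a minimal sketch of the two selection rules. It does not use the OpenSearch test framework: the `Node` class, the `sampleCluster()` layout, and the helper method names are all hypothetical, chosen only to show that "not the elected cluster manager" and "has the data role" can yield different candidate sets.

```java
import java.util.List;
import java.util.Random;
import java.util.Set;
import java.util.stream.Collectors;

final class NodeSelectionSketch {

    // Hypothetical stand-in for a test-cluster node; not an OpenSearch class.
    static final class Node {
        final String name;
        final Set<String> roles;
        final boolean electedClusterManager;

        Node(String name, Set<String> roles, boolean electedClusterManager) {
            this.name = name;
            this.roles = roles;
            this.electedClusterManager = electedClusterManager;
        }
    }

    // Assumed cluster layout for illustration: two dedicated
    // cluster-manager-eligible nodes (one elected) and two data nodes.
    static List<Node> sampleCluster() {
        return List.of(
            new Node("cm-0", Set.of("cluster_manager"), true),
            new Node("cm-1", Set.of("cluster_manager"), false),
            new Node("data-0", Set.of("data"), false),
            new Node("data-1", Set.of("data"), false)
        );
    }

    // Rule 1: any node that is not the currently elected cluster manager,
    // regardless of role. A standby cluster-manager node qualifies.
    static List<Node> nonElectedClusterManagerNodes(List<Node> cluster) {
        return cluster.stream()
            .filter(n -> !n.electedClusterManager)
            .collect(Collectors.toList());
    }

    // Rule 2: only nodes that carry the data role.
    static List<Node> dataNodes(List<Node> cluster) {
        return cluster.stream()
            .filter(n -> n.roles.contains("data"))
            .collect(Collectors.toList());
    }

    // Picks one candidate at random, as a "stop a random node" helper might.
    static Node pickRandom(List<Node> candidates, Random random) {
        return candidates.get(random.nextInt(candidates.size()));
    }

    public static void main(String[] args) {
        List<Node> cluster = sampleCluster();
        Random random = new Random();
        System.out.println("rule 1 may stop: " + pickRandom(nonElectedClusterManagerNodes(cluster), random).name);
        System.out.println("rule 2 may stop: " + pickRandom(dataNodes(cluster), random).name);
    }
}
```

Under this assumed layout, rule 1 can select `cm-1`, which holds no shards, so stopping it would not produce the shard failures the test expects. Rule 2 always selects a data node.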

reta added the bug and flaky-test labels Sep 20, 2023

github-actions bot added the untriaged label Sep 20, 2023

reta removed the untriaged label Sep 20, 2023

neetikasinghal added this to Concurrent Search Sep 20, 2023

github-project-automation bot moved this to Todo in Concurrent Search Sep 20, 2023

ashking94 mentioned this issue Sep 21, 2023

Downgrade write lock to read lock before translog upload to remote store #10135

Merged

6 tasks

Poojita-Raj mentioned this issue Sep 22, 2023

[AUTOCUT] Gradle Check Failure on push to main #10175

Closed

kotwanikunal mentioned this issue Sep 22, 2023

Refactor async blob read to avoid blocking calls, support non multipa… #10192

Merged

6 tasks

Poojita-Raj mentioned this issue Sep 25, 2023

Add workflow to add stalled label and message #10197

Merged

6 tasks

r1walz mentioned this issue Sep 27, 2023

Indexing: add Doc status counter #8716

Merged

6 tasks

jed326 mentioned this issue Oct 2, 2023

Disable concurrent search for terminate_after path #10200

Merged

6 tasks

This was referenced Oct 2, 2023

[AUTOCUT] Gradle Check Failure on push to 2.x #10323

Closed

[AUTOCUT] Gradle Check Failure on push to 2.x #10343

Closed

andrross mentioned this issue Oct 4, 2023

Fix flaky SearchScrollWithFailingNodesIT #10374

Merged

7 tasks

This was referenced Oct 7, 2023

[AUTOCUT] Gradle Check Failure on push to main #10499

Closed

[AUTOCUT] Gradle Check Failure on push to 2.11 #10500

Closed

andrross closed this as completed in #10374 Oct 10, 2023

github-project-automation bot moved this from Todo to Done in Concurrent Search Oct 10, 2023

kotwanikunal mentioned this issue Dec 4, 2023

[AUTO] Increment version to 2.11.2. #11422

Merged
