[BUG] org.opensearch.search.scroll.SearchScrollWithFailingNodesIT.testScanScrollWithShardExceptions {p0={"search.concurrent_segment_search.enabled":"true"}} is flaky #10137

Closed
reta opened this issue Sep 20, 2023 · 2 comments · Fixed by #10374
Labels: bug (Something isn't working), flaky-test (Random test failure that succeeds on second run)

Comments

reta commented Sep 20, 2023

Describe the bug
The test case org.opensearch.search.scroll.SearchScrollWithFailingNodesIT.testScanScrollWithShardExceptions {p0={"search.concurrent_segment_search.enabled":"true"}} is flaky:

org.opensearch.search.scroll.SearchScrollWithFailingNodesIT.testScanScrollWithShardExceptions {p0={"search.concurrent_segment_search.enabled":"true"}}


java.lang.AssertionError: 
Expected: a value less than <2>
     but: <2> was equal to <2>
	at __randomizedtesting.SeedInfo.seed([FFEE46EA8302D5D5:6EBDE6AAF55B33A6]:0)
	at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:18)
	at org.junit.Assert.assertThat(Assert.java:964)
	at org.junit.Assert.assertThat(Assert.java:930)
	at org.opensearch.search.scroll.SearchScrollWithFailingNodesIT.testScanScrollWithShardExceptions(SearchScrollWithFailingNodesIT.java:125)
	at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:104)
	at java.base/java.lang.reflect.Method.invoke(Method.java:578)
	at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1750)
	at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:938)
	at com.carrotsearch.randomizedtesting.RandomizedRunner$9.evaluate(RandomizedRunner.java:974)
	at com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:988)
	at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
	at org.junit.rules.RunRules.evaluate(RunRules.java:20)
	at org.apache.lucene.tests.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:48)
	at org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
	at org.apache.lucene.tests.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:45)
	at org.apache.lucene.tests.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:60)
	at org.apache.lucene.tests.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:44)
	at org.junit.rules.RunRules.evaluate(RunRules.java:20)
	at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
	at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:368)
	at com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:817)
	at com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:468)
	at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:947)
	at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:832)
	at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:883)
	at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:894)
	at org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
	at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
	at org.apache.lucene.tests.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:38)
	at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
	at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
	at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
	at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
	at org.apache.lucene.tests.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:53)
	at org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
	at org.apache.lucene.tests.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:44)
	at org.apache.lucene.tests.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:60)
	at org.apache.lucene.tests.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:47)
	at org.junit.rules.RunRules.evaluate(RunRules.java:20)
	at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
	at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:368)
	at java.base/java.lang.Thread.run(Thread.java:1623)


To Reproduce

./gradlew ':server:internalClusterTest' --tests "org.opensearch.search.scroll.SearchScrollWithFailingNodesIT" -Dtests.method="testScanScrollWithShardExceptions {p0={"search.concurrent_segment_search.enabled":"true"}}" -Dtests.seed=FFEE46EA8302D5D5

Expected behavior
The test should always pass

Plugins
Standard

Host/Environment: CI

reta added the bug and flaky-test labels Sep 20, 2023
reta commented Sep 20, 2023

CC @sohami @neetikasinghal fyi :(

ashking94 commented

Seeing failures for search.concurrent_segment_search.enabled = false as well.
org.opensearch.search.scroll.SearchScrollWithFailingNodesIT.testScanScrollWithShardExceptions {p0={"search.concurrent_segment_search.enabled":"false"}} -> https://build.ci.opensearch.org/job/gradle-check/25926/testReport/junit/org.opensearch.search.scroll/SearchScrollWithFailingNodesIT/testScanScrollWithShardExceptions__p0___search_concurrent_segment_search_enabled___false___/

andrross added a commit to andrross/OpenSearch that referenced this issue Oct 4, 2023
The test intended to stop a data node and called a method named
`stopRandomNonClusterManagerNode()` in order to do that. However, that
method would stop a random node that was not the currently elected
cluster manager, regardless of node role. I have also renamed that
method hoping to be more clear.

Resolves opensearch-project#10137

Signed-off-by: Andrew Ross <andrross@amazon.com>
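
To illustrate the root cause described in the commit message above, here is a minimal, self-contained sketch. It does not use the OpenSearch test framework: the `Node` class, the helper names, and the sample cluster layout (two nodes without the data role plus two data nodes holding one shard each) are all hypothetical and chosen only for illustration. It also assumes that the failing assertion compares successful shards against total shards, which is inferred from the "less than <2>" message rather than confirmed by this thread.

```java
import java.util.ArrayList;
import java.util.List;

public class NodeSelectionSketch {
    /** Hypothetical stand-in for a test-cluster node; not an OpenSearch type. */
    public static final class Node {
        public final String name;
        public final boolean electedClusterManager;
        public final boolean dataNode;
        public final int shards;

        public Node(String name, boolean electedClusterManager, boolean dataNode, int shards) {
            this.name = name;
            this.electedClusterManager = electedClusterManager;
            this.dataNode = dataNode;
            this.shards = shards;
        }
    }

    /** Selection as the commit message describes the old helper: any node except the elected cluster manager. */
    public static List<Node> nonElectedCandidates(List<Node> cluster) {
        List<Node> out = new ArrayList<>();
        for (Node n : cluster) {
            if (!n.electedClusterManager) {
                out.add(n);
            }
        }
        return out;
    }

    /** Selection the test actually wanted: only nodes that hold data. */
    public static List<Node> dataNodeCandidates(List<Node> cluster) {
        List<Node> out = new ArrayList<>();
        for (Node n : cluster) {
            if (n.dataNode) {
                out.add(n);
            }
        }
        return out;
    }

    /** Total number of shards across the whole cluster. */
    public static int totalShards(List<Node> cluster) {
        int total = 0;
        for (Node n : cluster) {
            total += n.shards;
        }
        return total;
    }

    /** Shards still reachable after the given node has been stopped. */
    public static int successfulShards(List<Node> cluster, Node stopped) {
        int ok = 0;
        for (Node n : cluster) {
            if (n != stopped) {
                ok += n.shards;
            }
        }
        return ok;
    }

    /** Assumed layout: one elected and one non-elected node without data, plus two data nodes. */
    public static List<Node> sampleCluster() {
        return List.of(
            new Node("cm-0", true, false, 0),
            new Node("cm-1", false, false, 0),
            new Node("data-0", false, true, 1),
            new Node("data-1", false, true, 1)
        );
    }

    public static void main(String[] args) {
        List<Node> cluster = sampleCluster();
        int total = totalShards(cluster);
        for (Node n : nonElectedCandidates(cluster)) {
            System.out.println("stop " + n.name + ": " + successfulShards(cluster, n) + " of " + total + " shards succeed");
        }
    }
}
```

In this sketch, stopping `cm-1` leaves 2 of 2 shards reachable, so an assertion that successful shards is less than total shards would fail in the same way as the reported error, while stopping either data node leaves 1 of 2. Restricting the candidates to data nodes removes the case that makes the outcome depend on the random seed.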
andrross added a commit to andrross/OpenSearch that referenced this issue Oct 10, 2023
andrross added a commit that referenced this issue Oct 10, 2023
@github-project-automation github-project-automation bot moved this from Todo to Done in Concurrent Search Oct 10, 2023
opensearch-trigger-bot bot pushed a commit that referenced this issue Oct 10, 2023
Signed-off-by: Andrew Ross <andrross@amazon.com>
(cherry picked from commit 562e3b2)
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
opensearch-trigger-bot bot pushed a commit that referenced this issue Oct 10, 2023
Signed-off-by: Andrew Ross <andrross@amazon.com>
(cherry picked from commit 562e3b2)
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
dblock pushed a commit that referenced this issue Oct 10, 2023
(cherry picked from commit 562e3b2)

Signed-off-by: Andrew Ross <andrross@amazon.com>
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
deshsidd pushed a commit to deshsidd/OpenSearch that referenced this issue Oct 19, 2023
Signed-off-by: Andrew Ross <andrross@amazon.com>
Signed-off-by: Siddhant Deshmukh <deshsid@amazon.com>
austintlee pushed a commit to austintlee/OpenSearch that referenced this issue Oct 23, 2023
shiv0408 pushed a commit to Gaurav614/OpenSearch that referenced this issue Apr 25, 2024
Signed-off-by: Andrew Ross <andrross@amazon.com>
Signed-off-by: Shivansh Arora <hishiv@amazon.com>