[BUG] SegmentReplicationUsingRemoteStoreIT.testCancelPrimaryAllocation flaky test failure #10025

Closed
dreamer-89 opened this issue Sep 13, 2023 · 0 comments · Fixed by #10655
Labels
bug: Something isn't working
flaky-test: Random test failure that succeeds on second run
Indexing:Replication: Issues and PRs related to core replication framework, e.g. segrep
Storage: Issues and PRs relating to data and metadata storage
untriaged

Comments

@dreamer-89
Member

dreamer-89 commented Sep 13, 2023

Coming from #8279 (comment), SegmentReplicationUsingRemoteStoreIT.testCancelPrimaryAllocation is flaky.

Gradle report: https://build.ci.opensearch.org/job/gradle-check/25111/testReport/
Builds with test failures: 24228, 24358, 24612, 24686, 25111

Assertion trip

java.lang.AssertionError: Expected search hits on node: node_t3 to be at least 1 but was: 0
	at __randomizedtesting.SeedInfo.seed([8477D22ECF559973:53F3D3D66092887]:0)
	at org.junit.Assert.fail(Assert.java:89)
	at org.opensearch.indices.replication.SegmentReplicationBaseIT.lambda$waitForSearchableDocs$0(SegmentReplicationBaseIT.java:124)
	at org.opensearch.test.OpenSearchTestCase.assertBusy(OpenSearchTestCase.java:1086)
	at org.opensearch.indices.replication.SegmentReplicationBaseIT.waitForSearchableDocs(SegmentReplicationBaseIT.java:119)
	at org.opensearch.indices.replication.SegmentReplicationBaseIT.waitForSearchableDocs(SegmentReplicationBaseIT.java:114)
	at org.opensearch.indices.replication.SegmentReplicationBaseIT.waitForSearchableDocs(SegmentReplicationBaseIT.java:131)
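
For readers unfamiliar with the pattern in the trace above: the assertion is retried inside a busy-wait helper, and the error is reported only once the wait budget is exhausted. The following is a minimal, self-contained sketch of that pattern, not the OpenSearch implementation. The class `BusyAssertSketch`, the helper `assertAtLeast`, the timeout values, and the backoff policy are all hypothetical and chosen for illustration.

```java
import java.util.concurrent.atomic.AtomicInteger;

class BusyAssertSketch {
    /** Re-runs the check with growing sleeps until it passes or the timeout elapses. */
    static void assertBusy(Runnable check, long maxWaitMillis) {
        long deadline = System.nanoTime() + maxWaitMillis * 1_000_000L;
        long sleepMillis = 1;
        while (true) {
            try {
                check.run();
                return; // the check passed
            } catch (AssertionError e) {
                // Surface the most recent failure once the wait budget is spent.
                if (System.nanoTime() >= deadline) throw e;
            }
            try {
                Thread.sleep(sleepMillis);
            } catch (InterruptedException ie) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting", ie);
            }
            sleepMillis = Math.min(sleepMillis * 2, 100); // capped exponential backoff
        }
    }

    /** Hypothetical check mirroring the message format seen in the failure. */
    static void assertAtLeast(String node, int expected, int actual) {
        if (actual < expected) {
            throw new AssertionError("Expected search hits on node: " + node
                + " to be at least " + expected + " but was: " + actual);
        }
    }

    public static void main(String[] args) {
        AtomicInteger hits = new AtomicInteger(0);
        // Simulate a replica that becomes searchable shortly after the wait starts.
        Thread replica = new Thread(() -> {
            try {
                Thread.sleep(20);
            } catch (InterruptedException ie) {
                Thread.currentThread().interrupt();
            }
            hits.set(1);
        });
        replica.start();
        assertBusy(() -> assertAtLeast("node_t3", 1, hits.get()), 2000);
        System.out.println("docs became searchable");
    }
}
```

In this sketch, a replica that never receives the documents makes the final retry fail with the same kind of message as the one reported, which suggests the failure reflects a replica that stayed empty for the whole wait window rather than a single unlucky read.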

The Gradle run logs show an AlreadyClosedException during an engine refresh.

[2023-09-08T17:43:03,934][ERROR][o.o.i.s.RemoteStoreRefreshListener] [node_t2] [test-idx-1][0] Exception in runAfterRefreshExactlyOnce() method
org.apache.lucene.store.AlreadyClosedException: engine is closed
	at org.opensearch.index.shard.IndexShard.getEngine(IndexShard.java:3452) ~[main/:?]
	at org.opensearch.index.shard.IndexShard.getSegmentInfosSnapshot(IndexShard.java:4944) ~[main/:?]
	at org.opensearch.index.shard.RemoteStoreRefreshListener.runAfterRefreshExactlyOnce(RemoteStoreRefreshListener.java:133) [main/:?]
	at org.opensearch.index.shard.CloseableRetryableRefreshListener.afterRefresh(CloseableRetryableRefreshListener.java:62) [main/:?]
	at org.apache.lucene.search.ReferenceManager.notifyRefreshListenersRefreshed(ReferenceManager.java:275) [lucene-core-9.8.0-snapshot-4373c3b.jar:9.8.0-snapshot-4373c3b 4373c3b2612e54bc0c5b992d9423e83e6340fdd5 - 2023-07-24 17:45:44]
	at org.apache.lucene.search.ReferenceManager.doMaybeRefresh(ReferenceManager.java:182) [lucene-core-9.8.0-snapshot-4373c3b.jar:9.8.0-snapshot-4373c3b 4373c3b2612e54bc0c5b992d9423e83e6340fdd5 - 2023-07-24 17:45:44]
	at org.apache.lucene.search.ReferenceManager.maybeRefreshBlocking(ReferenceManager.java:240) [lucene-core-9.8.0-snapshot-4373c3b.jar:9.8.0-snapshot-4373c3b 4373c3b2612e54bc0c5b992d9423e83e6340fdd5 - 2023-07-24 17:45:44]
	at org.opensearch.index.engine.InternalEngine.refresh(InternalEngine.java:1769) [main/:?]
	at org.opensearch.index.engine.InternalEngine.flush(InternalEngine.java:1884) [main/:?]
	at org.opensearch.index.engine.Engine.flush(Engine.java:1198) [main/:?]
	at org.opensearch.index.engine.Engine.flushAndClose(Engine.java:1973) [main/:?]
	at org.opensearch.index.shard.IndexShard.close(IndexShard.java:1938) [main/:?]
	at org.opensearch.index.IndexService.closeShard(IndexService.java:630) [main/:?]
	at org.opensearch.index.IndexService.removeShard(IndexService.java:606) [main/:?]
	at org.opensearch.index.IndexService.close(IndexService.java:380) [main/:?]
	at org.opensearch.indices.IndicesService.removeIndex(IndicesService.java:1019) [main/:?]
	at org.opensearch.indices.cluster.IndicesClusterStateService.removeIndices(IndicesClusterStateService.java:442) [main/:?]
	at org.opensearch.indices.cluster.IndicesClusterStateService.applyClusterState(IndicesClusterStateService.java:283) [main/:?]
	at org.opensearch.cluster.service.ClusterApplierService.callClusterStateAppliers(ClusterApplierService.java:606) [main/:?]
	at org.opensearch.cluster.service.ClusterApplierService.callClusterStateAppliers(ClusterApplierService.java:593) [main/:?]
	at org.opensearch.cluster.service.ClusterApplierService.applyChanges(ClusterApplierService.java:561) [main/:?]
	at org.opensearch.cluster.service.ClusterApplierService.runTask(ClusterApplierService.java:484) [main/:?]
	at org.opensearch.cluster.service.ClusterApplierService$UpdateTask.run(ClusterApplierService.java:186) [main/:?]
	at org.opensearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:849) [main/:?]
	at org.opensearch.common.util.concurrent.PrioritizedOpenSearchThreadPoolExecutor$TieBreakingPrioritizedRunnable.runAndClean(PrioritizedOpenSearchThreadPoolExecutor.java:282) [main/:?]
	at org.opensearch.common.util.concurrent.PrioritizedOpenSearchThreadPoolExecutor$TieBreakingPrioritizedRunnable.run(PrioritizedOpenSearchThreadPoolExecutor.java:245) [main/:?]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144) [?:?]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642) [?:?]
	at java.lang.Thread.run(Thread.java:1623) [?:?]
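
One plausible reading of the trace above is that the shard close path triggers a final flush and refresh, and the refresh listener then asks the shard for an engine that is no longer available. The sketch below models that sequence with toy classes. Everything in it (`ClosedEngineSketch`, `Engine`, `Shard`, `EngineClosedException`) is a hypothetical stand-in, and the assumption that the shard detaches its engine reference before flushing is an inference from the trace, not something stated in this issue.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicReference;

class ClosedEngineSketch {
    /** Hypothetical stand-in for Lucene's AlreadyClosedException. */
    static class EngineClosedException extends IllegalStateException {
        EngineClosedException(String message) {
            super(message);
        }
    }

    /** Toy engine: the closing flush triggers a refresh, which notifies listeners. */
    static class Engine {
        final List<Runnable> refreshListeners = new ArrayList<>();

        void refresh() {
            refreshListeners.forEach(Runnable::run);
        }

        void flushAndClose() {
            refresh();
        }
    }

    /** Toy shard; detaching the engine reference before the flush is an assumption. */
    static class Shard {
        final AtomicReference<Engine> engineRef = new AtomicReference<>();
        final List<String> errors = new ArrayList<>();

        Shard(Engine engine) {
            engineRef.set(engine);
            engine.refreshListeners.add(this::afterRefresh);
        }

        Engine getEngine() {
            Engine engine = engineRef.get();
            if (engine == null) {
                throw new EngineClosedException("engine is closed");
            }
            return engine;
        }

        /** Listener body: needs the engine, so it fails once the reference is gone. */
        void afterRefresh() {
            try {
                getEngine();
            } catch (EngineClosedException e) {
                errors.add(e.getMessage()); // the real listener logs an ERROR here
            }
        }

        void close() {
            Engine engine = engineRef.getAndSet(null); // detach first
            if (engine != null) {
                engine.flushAndClose(); // the listener fires after the detach
            }
        }
    }

    public static void main(String[] args) {
        Engine engine = new Engine();
        Shard shard = new Shard(engine);
        engine.refresh();                 // fine while the shard is open
        shard.close();                    // the listener now sees a closed engine
        System.out.println(shard.errors); // prints [engine is closed]
    }
}
```

If this reading is right, the logged exception is a side effect of index removal during teardown and may be unrelated to the missing search hits; the sketch does not establish which of the two is the root cause of the flakiness.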