[BUG] org.opensearch.remotestore.RemoteStoreForceMergeIT.testRestoreForceMergeSingleIteration flaky failure #9294

Closed
gbbafna opened this issue Aug 14, 2023 · 2 comments
Assignees
Labels
bug Something isn't working flaky-test Random test failure that succeeds on second run Storage Issues and PRs relating to data and metadata storage

Comments

@gbbafna
Collaborator

gbbafna commented Aug 14, 2023

Describe the bug

The assertion in the stale remote translog cleanup path (RemoteFsTranslog.deleteStaleRemotePrimaryTerms, see the stack trace below) is tripping. The failure is not reproducible.

https://build.ci.opensearch.org/job/gradle-check/22373/testReport/junit/org.opensearch.remotestore/RemoteStoreForceMergeIT/testRestoreForceMergeSingleIteration/

com.carrotsearch.randomizedtesting.UncaughtExceptionError: Captured an uncaught exception in thread: Thread[id=2679, name=opensearch[node_s1][flush][T#1], state=RUNNABLE, group=TGRP-RemoteStoreForceMergeIT]
	at __randomizedtesting.SeedInfo.seed([90BCF02533C9A76B:61FEE78AB3B48CC0]:0)
Caused by: java.lang.AssertionError: [remote-store-test-idx-1][0] Expected non-empty readers
	at __randomizedtesting.SeedInfo.seed([90BCF02533C9A76B]:0)
	at org.opensearch.index.translog.RemoteFsTranslog.deleteStaleRemotePrimaryTerms(RemoteFsTranslog.java:430)
	at org.opensearch.index.translog.RemoteFsTranslog.trimUnreferencedReaders(RemoteFsTranslog.java:400)
	at org.opensearch.index.translog.InternalTranslogManager.trimUnreferencedReaders(InternalTranslogManager.java:394)
	at org.opensearch.index.engine.InternalEngine.revisitIndexDeletionPolicyOnTranslogSynced(InternalEngine.java:537)
	at org.opensearch.index.engine.InternalEngine$1.onAfterTranslogSync(InternalEngine.java:265)
	at org.opensearch.index.translog.listener.CompositeTranslogEventListener.onAfterTranslogSync(CompositeTranslogEventListener.java:49)
	at org.opensearch.index.translog.InternalTranslogManager.syncTranslog(InternalTranslogManager.java:193)
	at org.opensearch.index.shard.IndexShard.sync(IndexShard.java:4154)
	at org.opensearch.index.IndexService.maybeFSyncTranslogs(IndexService.java:980)
	at org.opensearch.index.IndexService$AsyncTranslogFSync.runInternal(IndexService.java:1104)
	at org.opensearch.common.util.concurrent.AbstractAsyncTask.run(AbstractAsyncTask.java:159)
	at org.opensearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:849)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
	at java.base/java.lang.Thread.run(Thread.java:1623)


gbbafna added the bug, untriaged, and flaky-test labels on Aug 14, 2023
gbbafna self-assigned this on Aug 14, 2023
gbbafna added the Storage label and removed the distributed framework and untriaged labels on Aug 14, 2023
sachinpkale assigned sachinpkale and unassigned gbbafna on Aug 23, 2023
@sachinpkale
Member

Looking into it.

@sachinpkale
Member

This should be fixed with #9458

Running the test locally ~200 times to verify.
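
For anyone who wants to repeat that kind of local verification, here is a rough sketch of one way to loop the test. It is not necessarily the command used above. It assumes you are in the root of an OpenSearch checkout, that the test is run by the `:server:internalClusterTest` Gradle task, and that the `tests.iters` system property from the randomized testing framework is honored by the build. The iteration count of 200 just mirrors the number mentioned in the comment.

```shell
# Sketch only: task name and iteration count are assumptions, see the note above.
TEST="org.opensearch.remotestore.RemoteStoreForceMergeIT.testRestoreForceMergeSingleIteration"
ITERS=200

# No argument contains spaces, so a plain string is safe to word-split below.
CMD="./gradlew :server:internalClusterTest --tests $TEST -Dtests.iters=$ITERS"

# Show the command, then run it only if a Gradle wrapper is present here.
echo "$CMD"
if [ -x ./gradlew ]; then
  $CMD
fi
```

To try replaying the CI run instead, the master seed from the stack trace can be appended as `-Dtests.seed=90BCF02533C9A76B`, although the issue notes that the failure did not reproduce reliably, so a fixed seed may not trigger it.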
