[BUG] org.opensearch.remotestore.SegmentReplicationUsingRemoteStoreIT.testDropPrimaryDuringReplication is flaky #8059
Comments
@ankitkala |
I triggered more than 100 runs of this test locally and was not able to reproduce the failure. Let's revisit this if we see another occurrence. I also went through the gradle check logs shared above but couldn't find anything useful. |
One more failure: https://build.ci.opensearch.org/job/gradle-check/18866/ Can we please mute these tests? |
I've added a minor change to hopefully avoid the flakiness: #8431. If the failure still persists after the change, I will mute the test until we fix it. |
Another flaky failure here: #8667 (comment) |
This test is failing because this maybeRefresh call fails to acquire the refresh lock and returns without updating the reader reference with the updated SegmentInfos. The refresh that holds the lock is the one triggered via the API in the test here. The primary must be initiating the copy internally on a scheduled refresh, before the test triggers the refresh that is meant to start the replication cycle. When the refresh request hits the replica, it forces a refresh on NRTReplicationEngine directly here, which acquires the lock, so the maybeRefresh during the segment update does nothing. We are left in a state where updateSegments completes and the reader has the updated segment ref, but has not internally refreshed. There are a couple of things I think we should do here.
|
PR'd to fix the immediate problem of not acquiring the refresh lock, will make a separate change to skip registering listeners. |
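For anyone following along, here is a minimal standalone sketch of the race described above. It does not use OpenSearch or Lucene code: the class `RefreshLockSketch`, its fields, and `runScenario` are hypothetical names made up for illustration, and the version counters stand in for SegmentInfos. It only models the general pattern of a non-blocking refresh (try-lock, as in Lucene's `ReferenceManager.maybeRefresh`) being skipped while another refresh holds the lock, compared with a blocking refresh (as in `ReferenceManager.maybeRefreshBlocking`). Whether the PR takes the blocking approach is not stated in this thread, so treat that part as an assumption.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical model of the race; not the actual NRTReplicationEngine code.
class RefreshLockSketch {
    private final ReentrantLock refreshLock = new ReentrantLock();
    private volatile long latestSegmentsVersion = 0; // stands in for the newest SegmentInfos
    private volatile long readerVersion = 0;         // what the reader currently exposes

    // Replication delivers new segments; the reader only sees them after a refresh.
    void updateSegments(long version) {
        latestSegmentsVersion = version;
    }

    // Non-blocking refresh: gives up immediately if another refresh holds the lock.
    boolean maybeRefresh() {
        if (!refreshLock.tryLock()) {
            return false; // skipped, reader stays on the old version
        }
        try {
            readerVersion = latestSegmentsVersion;
            return true;
        } finally {
            refreshLock.unlock();
        }
    }

    // Blocking refresh: waits for the current lock holder, then refreshes.
    void maybeRefreshBlocking() {
        refreshLock.lock();
        try {
            readerVersion = latestSegmentsVersion;
        } finally {
            refreshLock.unlock();
        }
    }

    long getReaderVersion() {
        return readerVersion;
    }

    // Returns {reader version after the skipped refresh, reader version after the blocking refresh}.
    static long[] runScenario() {
        RefreshLockSketch engine = new RefreshLockSketch();
        CountDownLatch lockHeld = new CountDownLatch(1);
        CountDownLatch release = new CountDownLatch(1);

        // Simulates the API-triggered refresh that is holding the refresh lock.
        Thread apiRefresh = new Thread(() -> {
            engine.refreshLock.lock();
            try {
                lockHeld.countDown();
                release.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                engine.refreshLock.unlock();
            }
        });
        apiRefresh.start();

        try {
            lockHeld.await();
            engine.updateSegments(1);
            engine.maybeRefresh();                 // cannot get the lock, does nothing
            long afterSkipped = engine.getReaderVersion();

            release.countDown();                   // let the API refresh finish
            engine.maybeRefreshBlocking();         // waits for the lock, then refreshes
            long afterBlocking = engine.getReaderVersion();

            apiRefresh.join();
            return new long[] {afterSkipped, afterBlocking};
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        long[] result = runScenario();
        System.out.println("after skipped refresh: " + result[0]
            + ", after blocking refresh: " + result[1]);
    }
}
```

In this model the segment update has completed but the reader is still on the old version after the skipped refresh, which would show up in a test as a doc count mismatch on the replica.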
This test |
Doc count mis-match assertion failure.
|
Resolved with #9471. Please re-open if this test pops up again. |
Describe the bug
org.opensearch.remotestore.SegmentReplicationUsingRemoteStoreIT.testDropPrimaryDuringReplication is flaky
https://build.ci.opensearch.org/job/gradle-check/17543/testReport/junit/org.opensearch.remotestore/SegmentReplicationUsingRemoteStoreIT/testDropPrimaryDuringReplication/
#8057 (comment)
Assertion Failure
To Reproduce