[BUG] Test SegmentReplicationUsingRemoteStoreDisruptionIT.testCancelReplicationWhileFetchingMetadata is flaky #10902
Comments
I was able to reproduce this issue after 2.5K iterations. I have identified the fix and raised PR #10985. It appears to be a rare corner case: during test cleanup, index deletion and primary-mode activation happen concurrently, in such a way that the translog writer gets rotated while the current writer is being closed (because the translog is still open), but the writer has already been closed by the time of upload. This race condition is very rare and is not a problem in production.
Ran with the fix for 16K iterations and there have been no failures yet.
@ashking94 are you running all 16K iterations on the same test seed or on different test seeds?
This is with different seeds and with the prospective fix. I have not been able to get reviews on #10985, which fixes this issue.
Taking a look.
Ran this for 1000 iterations and didn't observe any failures. From the first comment after this issue was reopened, it seems like the exception is related to an
Checking the code for … We have a similar implementation under OpenSearch/server/src/main/java/org/opensearch/threadpool/ThreadPool.java, lines 472 to 490 in 9e62ccf.
We should use this method instead of the usual approach in OpenSearch/server/src/main/java/org/opensearch/index/shard/ReleasableRetryableRefreshListener.java, lines 125 to 132 in 9e62ccf.
Closing, since the fixes have been made in #13301.
Describe the bug
This test is flaky. It doesn't look like it's failing in any download/replication step; rather, it fails after failover, in the remote translog logic.
Seed:
Expected behavior
The test should not be flaky.