Fix flaky test SegmentReplicationTargetServiceTests#testShardAlreadyReplicating #13248
…eplicating

This test is flaky because it incorrectly passes a checkpoint with a higher primary term on the second invocation. This cancels the first replication and starts another. The test sometimes passes because it only asserts on processLatestReceivedCheckpoint. If the cancellation completes before the second replication event is attempted, the test fails; otherwise it passes.

Fixed this test by ensuring the primary term is the same but the checkpoint is ahead. Also added an assertion that replication is not started with the exact ahead checkpoint, instead of asserting only on processLatestReceivedCheckpoint. Tests already exist for an ahead primary term: "testShardAlreadyReplicating_HigherPrimaryTermReceived".

Signed-off-by: Marc Handalian <marc.handalian@gmail.com>
❌ Gradle check result for b8877bf: FAILURE
Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?
Codecov Report
All modified and coverable lines are covered by tests ✅

Additional details and impacted files

@@             Coverage Diff              @@
##               main   #13248      +/-   ##
============================================
+ Coverage     71.42%   71.51%   +0.09%
- Complexity    59978    60707     +729
============================================
  Files          4985     5040      +55
  Lines        282275   285432    +3157
  Branches      40946    41335     +389
============================================
+ Hits         201603   204119    +2516
- Misses        63999    64438     +439
- Partials      16673    16875     +202

☔ View full report in Codecov by Sentry.
The backport to
To backport manually, run these commands in your terminal:
# Navigate to the root of your repository
cd $(git rev-parse --show-toplevel)
# Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add ../.worktrees/OpenSearch/backport-2.x 2.x
# Navigate to the new working tree
pushd ../.worktrees/OpenSearch/backport-2.x
# Create a new branch
git switch --create backport/backport-13248-to-2.x
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x --mainline 1 1fcb79de07498005fea9a9e6148ecdf44f484e7b
# Push it to GitHub
git push --set-upstream origin backport/backport-13248-to-2.x
# Go back to the original working tree
popd
# Delete the working tree
git worktree remove ../.worktrees/OpenSearch/backport-2.x
Then, create a pull request where the
…eplicating (opensearch-project#13248) (cherry picked from commit 1fcb79d)
…eplicating (#13248) (#13265) (cherry picked from commit 1fcb79d)
Description
This test is flaky because it incorrectly passes a checkpoint with a higher primary term on the second invocation. This cancels the first replication and starts another. The test sometimes passes because it only asserts on processLatestReceivedCheckpoint. If the cancellation completes before the second replication event is attempted, the test fails; otherwise it passes.
Fixed this test by ensuring the primary term is the same but the checkpoint is ahead (higher segment infos version). Also added an assertion that replication is not started with the exact ahead checkpoint, instead of asserting only on processLatestReceivedCheckpoint. Tests already exist for an ahead primary term: "testShardAlreadyReplicating_HigherPrimaryTermReceived".
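
To make the two code paths concrete, here is a minimal, single-threaded sketch of the behavior described above. It is not the OpenSearch implementation: `ReplicationSketch`, `Checkpoint`, `TargetService`, and `onNewCheckpoint` are hypothetical names, and the assumption that an ahead checkpoint with the same primary term is simply remembered for later is mine. In the real service the cancellation appears to be asynchronous, which is where the timing dependence comes from; the sketch cancels synchronously, so it only shows which branch each input takes.

```java
import java.util.ArrayList;
import java.util.List;

public class ReplicationSketch {

    // Hypothetical stand-in for a replication checkpoint.
    static final class Checkpoint {
        final long primaryTerm;
        final long segmentInfosVersion;

        Checkpoint(long primaryTerm, long segmentInfosVersion) {
            this.primaryTerm = primaryTerm;
            this.segmentInfosVersion = segmentInfosVersion;
        }

        String label() {
            return primaryTerm + "/" + segmentInfosVersion;
        }
    }

    // Hypothetical model of the replica-side service (assumption, not the real class).
    static final class TargetService {
        Checkpoint ongoing;         // checkpoint currently being replicated, or null
        Checkpoint latestReceived;  // newest checkpoint seen while a replication is running
        final List<String> events = new ArrayList<>();

        void onNewCheckpoint(Checkpoint received) {
            if (ongoing == null) {
                start(received);
            } else if (received.primaryTerm > ongoing.primaryTerm) {
                // Higher primary term: cancel the running replication and start a new one.
                events.add("cancel " + ongoing.label());
                ongoing = null;
                start(received);
            } else if (received.primaryTerm == ongoing.primaryTerm
                    && received.segmentInfosVersion > ongoing.segmentInfosVersion) {
                // Same primary term, ahead checkpoint: do not start, just remember it.
                latestReceived = received;
                events.add("defer " + received.label());
            }
        }

        private void start(Checkpoint checkpoint) {
            ongoing = checkpoint;
            events.add("start " + checkpoint.label());
        }
    }

    public static void main(String[] args) {
        // Setup after the fix: same primary term, higher segment infos version.
        TargetService sameTerm = new TargetService();
        sameTerm.onNewCheckpoint(new Checkpoint(1, 1));
        sameTerm.onNewCheckpoint(new Checkpoint(1, 2));
        System.out.println(sameTerm.events); // [start 1/1, defer 1/2]

        // Setup before the fix: higher primary term on the second invocation.
        TargetService higherTerm = new TargetService();
        higherTerm.onNewCheckpoint(new Checkpoint(1, 1));
        higherTerm.onNewCheckpoint(new Checkpoint(2, 1));
        System.out.println(higherTerm.events); // [start 1/1, cancel 1/1, start 2/1]
    }
}
```

With the same primary term, the second checkpoint never produces a `start` event, which is the property the added assertion checks. With a higher primary term, a second `start` is expected, so a test that asserts "no second replication" for that input can only pass by accident of timing.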
Related Issues
Resolves #8928
Check List
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.