Support shard promotion with Segment Replication. #4135
Gradle Check (Jenkins) Run Completed with:
Codecov Report
@@             Coverage Diff              @@
##               main    #4135      +/-   ##
============================================
- Coverage     70.65%   70.64%   -0.01%
- Complexity    57075    57145      +70
============================================
  Files          4606     4606
  Lines        274706   274737      +31
  Branches      40228    40231       +3
============================================
- Hits         194103   194099       -4
- Misses        64280    64374      +94
+ Partials      16323    16264      -59
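The headline percentages in the Codecov report follow from the raw counts. The sketch below recomputes them, assuming Codecov defines coverage as hits / (hits + misses + partials) and rounds down to two decimal places (assumed to be its default; the report itself does not state this). `CoverageCheck` and its methods are illustrative names.

```java
/**
 * Recomputes the Codecov coverage figures from the raw counts in the report.
 * Assumption: coverage = hits / (hits + misses + partials), rounded down to 2 decimals.
 */
public class CoverageCheck {

    /** Total tracked lines; partially covered lines count as not covered. */
    public static long totalLines(long hits, long misses, long partials) {
        return hits + misses + partials;
    }

    /** Coverage in hundredths of a percent, rounded down (7065 means 70.65%). */
    public static long coverageHundredths(long hits, long lines) {
        // Integer division truncates, which is "round down" for non-negative values.
        return hits * 10_000L / lines;
    }

    /** Formats a non-negative hundredths value, e.g. 7065 -> "70.65%". */
    public static String format(long hundredths) {
        return String.format("%d.%02d%%", hundredths / 100, hundredths % 100);
    }

    public static void main(String[] args) {
        long baseLines = totalLines(194_103, 64_280, 16_323);
        long headLines = totalLines(194_099, 64_374, 16_264);
        long base = coverageHundredths(194_103, baseLines);
        long head = coverageHundredths(194_099, headLines);
        System.out.println("main:  " + format(base) + " of " + baseLines + " lines");
        System.out.println("#4135: " + format(head) + " of " + headLines + " lines");
        System.out.println("delta: " + (head - base) + " hundredths of a percent");
    }
}
```

Rounding down reproduces the reported 70.65% and 70.64%; rounding half-up would instead give 70.66% and 70.65%. The Hits, Misses, and Partials rows also sum exactly to the Lines row in both columns.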
Resolved review comments (outdated, 2 threads): ...er/src/internalClusterTest/java/org/opensearch/indices/replication/SegmentReplicationIT.java
I've added a commit that ensures cancelling the primary allocation succeeds, that the replica is promoted, and that the old primary is recreated as a replica. While testing this, I found that publishing a replication checkpoint failed if the primary flushed during close. That is now fixed: the shard must be open for us to publish a replication checkpoint.
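The fix can be pictured as a guard on the publish path. This is a minimal, self-contained model, assuming the check runs in whatever callback fires after a refresh or flush and that close changes the shard state before its final flush; `ShardState`, `ShardSketch`, and `onSegmentsChanged` are hypothetical names, not OpenSearch APIs, and the actual change in IndexShard may be structured differently.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical model of "a closed shard does not publish checkpoints".
 * ShardState, ShardSketch and onSegmentsChanged are illustrative names only.
 */
public class CheckpointPublishGuard {

    public enum ShardState { STARTED, CLOSED }

    public static class ShardSketch {
        private volatile ShardState state = ShardState.STARTED;
        private final boolean primary;
        private long segmentInfosVersion = 0;
        /** Stand-in for the checkpoint publisher: records each checkpoint "sent" to replicas. */
        public final List<Long> published = new ArrayList<>();

        public ShardSketch(boolean primary) {
            this.primary = primary;
        }

        /** Invoked after a refresh or flush has produced a new SegmentInfos version. */
        public void onSegmentsChanged() {
            segmentInfosVersion++;
            // Only an open primary publishes; a flush performed while closing is skipped.
            if (primary && state != ShardState.CLOSED) {
                published.add(segmentInfosVersion);
            }
        }

        /** Closes the shard; in this model the final flush runs after the state change. */
        public void close() {
            state = ShardState.CLOSED;
            onSegmentsChanged();
        }
    }

    public static void main(String[] args) {
        ShardSketch shard = new ShardSketch(true);
        shard.onSegmentsChanged(); // refresh while open: checkpoint 1 is published
        shard.close();             // flush during close: checkpoint 2 is not published
        System.out.println(shard.published); // prints [1]
    }
}
```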
Resolved review comment (outdated): server/src/main/java/org/opensearch/index/shard/IndexShard.java
Resolved review comment (outdated): server/src/main/java/org/opensearch/index/engine/NRTReplicationEngine.java
Resolved review comment: ...er/src/internalClusterTest/java/org/opensearch/indices/replication/SegmentReplicationIT.java
The backport to
To backport manually, run these commands in your terminal:
# Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add .worktrees/backport-2.x 2.x
# Navigate to the new working tree
cd .worktrees/backport-2.x
# Create a new branch
git switch --create backport/backport-4135-to-2.x
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x --mainline 1 f65e02d1b910bd0a1990868bfa5d12ba829bbbd5
# Push it to GitHub
git push --set-upstream origin backport/backport-4135-to-2.x
# Go back to the original working tree
cd ../..
# Delete the working tree
git worktree remove .worktrees/backport-2.x
Then, create a pull request where the
…nsearch-project#4135)
* Support shard promotion with Segment Replication.
  This change adds basic failover support with segment replication. Once selected, a replica will commit and reopen a writeable engine.
  Signed-off-by: Marc Handalian <handalm@amazon.com>
* Add check to ensure a closed shard does not publish checkpoints.
  Signed-off-by: Marc Handalian <handalm@amazon.com>
* Clean up in SegmentReplicationIT.
  Signed-off-by: Marc Handalian <handalm@amazon.com>
* PR feedback.
  Signed-off-by: Marc Handalian <handalm@amazon.com>
* Fix merge conflict.
  Signed-off-by: Marc Handalian <handalm@amazon.com>
Signed-off-by: Marc Handalian <handalm@amazon.com>

…) (#4325)
* Support shard promotion with Segment Replication.
  This change adds basic failover support with segment replication. Once selected, a replica will commit and reopen a writeable engine.
  Signed-off-by: Marc Handalian <handalm@amazon.com>
* Add check to ensure a closed shard does not publish checkpoints.
  Signed-off-by: Marc Handalian <handalm@amazon.com>
* Clean up in SegmentReplicationIT.
  Signed-off-by: Marc Handalian <handalm@amazon.com>
* PR feedback.
  Signed-off-by: Marc Handalian <handalm@amazon.com>
* Fix merge conflict.
  Signed-off-by: Marc Handalian <handalm@amazon.com>
Signed-off-by: Marc Handalian <handalm@amazon.com>
Signed-off-by: Marc Handalian <handalm@amazon.com>
Signed-off-by: Marc Handalian <handalm@amazon.com>
Description
This change adds basic failover support with segment replication. Once selected as the new primary, a replica commits its SegmentInfos and reopens a writeable engine. The replica also removes all other commits so that this commit is selected when the writeable engine is opened. This commit may not be considered 'safe' by the primary, meaning its max seqNo is higher than the global checkpoint. Although this is an edge case, replicas should never reindex when segment replication is enabled, so if the global checkpoint has not yet been updated, we do not revert to a safe commit. This change also updates how SegmentReplicationCheckpointPublisher is wired up within IndexShard so that, once promoted, the new primary can publish checkpoints.
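The promotion steps described above can be modeled roughly as follows. This is a simplified sketch under stated assumptions: a commit is reduced to a generation and a max sequence number, and `Commit`, `ReplicaSketch`, and `promote` are hypothetical names rather than OpenSearch or Lucene APIs. It illustrates the key design choice: the freshly committed SegmentInfos is kept even when its max seqNo is above the global checkpoint.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical model of replica promotion with segment replication.
 * Commit, ReplicaSketch and promote are illustrative names, not OpenSearch/Lucene APIs.
 */
public class PromotionSketch {

    /** A commit point reduced to its generation and highest sequence number. */
    public static final class Commit {
        public final long generation;
        public final long maxSeqNo;

        public Commit(long generation, long maxSeqNo) {
            this.generation = generation;
            this.maxSeqNo = maxSeqNo;
        }

        @Override
        public String toString() {
            return "Commit{gen=" + generation + ", maxSeqNo=" + maxSeqNo + "}";
        }
    }

    public static class ReplicaSketch {
        public final List<Commit> commitsOnDisk = new ArrayList<>();
        /** maxSeqNo of the latest in-memory SegmentInfos copied from the old primary. */
        public long inMemoryMaxSeqNo;
        public boolean writeable = false;            // read-only engine until promoted
        public boolean canPublishCheckpoints = false;

        /** "Safe" means the commit holds no operations above the global checkpoint. */
        public static boolean isSafe(Commit commit, long globalCheckpoint) {
            return commit.maxSeqNo <= globalCheckpoint;
        }

        public Commit promote(long globalCheckpoint) {
            long nextGeneration = commitsOnDisk.stream()
                    .mapToLong(c -> c.generation).max().orElse(0L) + 1;
            // 1. Commit the in-memory SegmentInfos.
            Commit latest = new Commit(nextGeneration, inMemoryMaxSeqNo);
            // 2. Drop every other commit so the engine can only open this one.
            commitsOnDisk.clear();
            commitsOnDisk.add(latest);
            // 3. Reopen as writeable on `latest`, even if !isSafe(latest, globalCheckpoint):
            //    reverting to an older safe commit would force operations to be reindexed.
            writeable = true;
            // 4. The new primary may now publish replication checkpoints.
            canPublishCheckpoints = true;
            return latest;
        }
    }

    public static void main(String[] args) {
        ReplicaSketch replica = new ReplicaSketch();
        replica.commitsOnDisk.add(new Commit(3, 40));
        replica.commitsOnDisk.add(new Commit(4, 90));
        replica.inMemoryMaxSeqNo = 120;
        long globalCheckpoint = 100; // the global checkpoint has not caught up yet
        Commit opened = replica.promote(globalCheckpoint);
        System.out.println(opened + " safe=" + ReplicaSketch.isSafe(opened, globalCheckpoint)
                + " commits=" + replica.commitsOnDisk.size());
        // prints Commit{gen=5, maxSeqNo=120} safe=false commits=1
    }
}
```

In this example the older commit with maxSeqNo 90 would have been safe at a global checkpoint of 100, but opening it would mean reindexing the operations above it, which is exactly what the description rules out for segment replication.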
This PR does not handle edge cases of promotion while replication events are ongoing; those will be covered in a separate issue.
Issues Resolved
closes #3989
Check List
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following the Developer Certificate of Origin and signing off your commits, please check here.