Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Segment Replication] Update PrimaryShardAllocator to prefer replicas with higher replication checkpoint #4041

Merged

Conversation

dreamer-89
Copy link
Member

@dreamer-89 dreamer-89 commented Jul 29, 2022

Signed-off-by: Suraj Singh surajrider@gmail.com

Description

Update the PrimaryShardAllocator (PSA) to include highest ReplicationCheckpoint (RC) info for evaluating shard to be promoted as primary.

Today, PSA orders shard copies from nodes holding active copy of unassigned primary shard and orders using below comparators and selects the first the first shard (satisfying existing allocation deciders).

  1. Readable shards with no exception on index open
  2. Existing primary. Shards which are marked primary are preferred over replica copies.

As part of this change, we are adding another comparator (runs after above two) which orders shard copies based on highest RC (a > b or a == b && c > d here a, b are primary terms & c,d are segment info version fields of RC. Please check #4112 for details). This is to ensure in case of failover master always chooses the shard copy which is furthest ahead (after satisfying first 2 comparators) and keeps the translog operations to rerun at minimum.

Please note, there are two places where shard promotion logic is used. This PR handles the PrimaryShardAllocator. The other one is RoutingNodes.failShard which will be taken up in follow up PR. Please check #3988 for more details.

Pending

Integration tests which needs #3989 changes (performs InternalEngine <-> NRTEngineReplication) in main.

Issues Resolved

#3988

Related

#2212
#4112

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@dreamer-89 dreamer-89 requested review from a team and reta as code owners July 29, 2022 00:21
@dreamer-89 dreamer-89 marked this pull request as draft July 29, 2022 00:21
@dreamer-89 dreamer-89 force-pushed the segrep_failover_primary_selection branch 8 times, most recently from ebd18e5 to 5b5b36d Compare August 3, 2022 21:49
@dreamer-89 dreamer-89 force-pushed the segrep_failover_primary_selection branch from 5b5b36d to f0e264c Compare August 4, 2022 00:08
@github-actions
Copy link
Contributor

github-actions bot commented Aug 4, 2022

Gradle Check (Jenkins) Run Completed with:

@opensearch-project opensearch-project deleted a comment from github-actions bot Aug 4, 2022
@opensearch-project opensearch-project deleted a comment from github-actions bot Aug 4, 2022
@opensearch-project opensearch-project deleted a comment from github-actions bot Aug 4, 2022
@opensearch-project opensearch-project deleted a comment from github-actions bot Aug 4, 2022
@opensearch-project opensearch-project deleted a comment from github-actions bot Aug 4, 2022
@opensearch-project opensearch-project deleted a comment from github-actions bot Aug 4, 2022
@opensearch-project opensearch-project deleted a comment from github-actions bot Aug 4, 2022
@opensearch-project opensearch-project deleted a comment from github-actions bot Aug 4, 2022
@opensearch-project opensearch-project deleted a comment from github-actions bot Aug 4, 2022
@opensearch-project opensearch-project deleted a comment from github-actions bot Aug 4, 2022
@codecov-commenter
Copy link

codecov-commenter commented Aug 4, 2022

Codecov Report

Merging #4041 (1f07d13) into main (6993ac9) will decrease coverage by 0.06%.
The diff coverage is 62.85%.

@@             Coverage Diff              @@
##               main    #4041      +/-   ##
============================================
- Coverage     70.68%   70.61%   -0.07%     
+ Complexity    57125    57041      -84     
============================================
  Files          4603     4603              
  Lines        274535   274563      +28     
  Branches      40209    40216       +7     
============================================
- Hits         194058   193886     -172     
- Misses        64196    64355     +159     
- Partials      16281    16322      +41     
Impacted Files Coverage Δ
...ateway/TransportNodesListGatewayStartedShards.java 51.67% <47.61%> (-1.36%) ⬇️
.../opensearch/test/gateway/TestGatewayAllocator.java 94.00% <71.42%> (-3.73%) ⬇️
.../org/opensearch/gateway/PrimaryShardAllocator.java 75.24% <100.00%> (+0.50%) ⬆️
.../replication/checkpoint/ReplicationCheckpoint.java 56.81% <100.00%> (+3.32%) ⬆️
...java/org/opensearch/client/indices/DataStream.java 0.00% <0.00%> (-76.09%) ⬇️
.../opensearch/client/indices/CloseIndexResponse.java 17.50% <0.00%> (-73.75%) ⬇️
...a/org/opensearch/client/cluster/SniffModeInfo.java 0.00% <0.00%> (-52.95%) ⬇️
.../java/org/opensearch/node/NodeClosedException.java 50.00% <0.00%> (-50.00%) ⬇️
...ch/transport/ReceiveTimeoutTransportException.java 50.00% <0.00%> (-50.00%) ⬇️
...cluster/coordination/PendingClusterStateStats.java 20.00% <0.00%> (-48.00%) ⬇️
... and 475 more

Help us with your feedback. Take ten seconds to tell us how you rate us.

@opensearch-project opensearch-project deleted a comment from github-actions bot Aug 4, 2022
@opensearch-project opensearch-project deleted a comment from github-actions bot Aug 4, 2022
@opensearch-project opensearch-project deleted a comment from github-actions bot Aug 4, 2022
@opensearch-project opensearch-project deleted a comment from github-actions bot Aug 4, 2022
@opensearch-project opensearch-project deleted a comment from github-actions bot Aug 4, 2022
@@ -399,6 +430,14 @@ public void writeTo(StreamOutput out) throws IOException {
} else {
out.writeBoolean(false);
}
if (out.getVersion().onOrAfter(Version.V_3_0_0)) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lets make sure we update this to V_2_3_0 when this is backported?

@dreamer-89 dreamer-89 force-pushed the segrep_failover_primary_selection branch from dd79408 to de07dac Compare August 9, 2022 18:24
@dreamer-89
Copy link
Member Author

dreamer-89 commented Aug 9, 2022

@Bukhtawar @andrross @mch2 : Addressed the review comments, please have a look.

@dreamer-89
Copy link
Member Author

dreamer-89 commented Aug 9, 2022

Trying to write an IT which simulate no active shard copies (to ensure RoutingNodes#failShard doesn't select any random active replica) during primary failure but at the same time shard copies should be part of in-sync allocation ids. This is not a blocker for this PR.

https://gist.github.com/dreamer-89/9e6a781e57dbf1008c97034103b6a9a8

@github-actions
Copy link
Contributor

github-actions bot commented Aug 9, 2022

Gradle Check (Jenkins) Run Completed with:

Signed-off-by: Suraj Singh <surajrider@gmail.com>
@github-actions
Copy link
Contributor

github-actions bot commented Aug 9, 2022

Gradle Check (Jenkins) Run Completed with:

@dreamer-89
Copy link
Member Author

Failure looks unrelated and not reproducible locally. Refiring

REPRODUCE WITH: ./gradlew ':server:test' --tests "org.opensearch.indices.replication.SegmentReplicationTargetServiceTests.testReplicationOnDone" -Dtests.seed=18F9CB5CD66407EA -Dtests.security.manager=true -Dtests.jvm.argline="-XX:TieredStopAtLevel=1 -XX:ReservedCodeCacheSize=64m" -Dtests.locale=ms -Dtests.timezone=Europe/Sofia -Druntime.java=17

org.opensearch.indices.replication.SegmentReplicationTargetServiceTests > testReplicationOnDone FAILED
    org.mockito.exceptions.verification.TooFewActualInvocations: 
    segmentReplicationTargetService.onNewCheckpoint(
        ReplicationCheckpoint{shardId=[index][0], primaryTerm=94, segmentsGen=3, seqNo=-1, version=9},
        <any>
    );
    Wanted 2 times:
    -> at org.opensearch.indices.replication.SegmentReplicationTargetServiceTests.testReplicationOnDone(SegmentReplicationTargetServiceTests.java:236)
    But was 1 time:
    -> at org.opensearch.indices.replication.SegmentReplicationTargetServiceTests.testReplicationOnDone(SegmentReplicationTargetServiceTests.java:230)
        at __randomizedtesting.SeedInfo.seed([18F9CB5CD66407EA:2371B911C98E848]:0)
        at app//org.opensearch.indices.replication.SegmentReplicationTargetServiceTests.testReplicationOnDone(SegmentReplicationTargetServiceTests.java:236)

@github-actions
Copy link
Contributor

github-actions bot commented Aug 9, 2022

Gradle Check (Jenkins) Run Completed with:

@dreamer-89
Copy link
Member Author

@Bukhtawar @andrross @mch2 : Addressed the review comments, please have a look.

Copy link
Collaborator

@Bukhtawar Bukhtawar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@dreamer-89 dreamer-89 merged commit d308a29 into opensearch-project:main Aug 18, 2022
dreamer-89 added a commit to dreamer-89/OpenSearch that referenced this pull request Aug 18, 2022
… with higher replication checkpoint (opensearch-project#4041)

* [Segment Replication] Update PrimaryShardAllocator to prefer replicas having higher replication checkpoint

Signed-off-by: Suraj Singh <surajrider@gmail.com>

* Use empty replication checkpoint to avoid NPE

Signed-off-by: Suraj Singh <surajrider@gmail.com>

* Update NodeGatewayStartedShards to optionally wire in/out ReplicationCheckpoint field

Signed-off-by: Suraj Singh <surajrider@gmail.com>

* Use default replication checkpoint causing EOF errors on empty checkpoint

* Add indexSettings to GatewayAllocator to allow ReplicationCheckpoint comparator only for segrep enabled indices

* Add unit tests for primary term first replica promotion & comparator fix

* Fix NPE on empty IndexMetadata

* Remove settings from AllocationService and directly inject in GatewayAllocator

* Add more unit tests and minor code clean up

Signed-off-by: Suraj Singh <surajrider@gmail.com>

* Address review comments & integration test

Signed-off-by: Suraj Singh <surajrider@gmail.com>

* Fix comparator on null ReplicationCheckpoint

Signed-off-by: Suraj Singh <surajrider@gmail.com>

Signed-off-by: Suraj Singh <surajrider@gmail.com>
dreamer-89 added a commit that referenced this pull request Aug 18, 2022
… with higher replication checkpoint (#4041) (#4252)

* [Segment Replication] Update PrimaryShardAllocator to prefer replicas having higher replication checkpoint

Signed-off-by: Suraj Singh <surajrider@gmail.com>

* Use empty replication checkpoint to avoid NPE

Signed-off-by: Suraj Singh <surajrider@gmail.com>

* Update NodeGatewayStartedShards to optionally wire in/out ReplicationCheckpoint field

Signed-off-by: Suraj Singh <surajrider@gmail.com>

* Use default replication checkpoint causing EOF errors on empty checkpoint

* Add indexSettings to GatewayAllocator to allow ReplicationCheckpoint comparator only for segrep enabled indices

* Add unit tests for primary term first replica promotion & comparator fix

* Fix NPE on empty IndexMetadata

* Remove settings from AllocationService and directly inject in GatewayAllocator

* Add more unit tests and minor code clean up

Signed-off-by: Suraj Singh <surajrider@gmail.com>

* Address review comments & integration test

Signed-off-by: Suraj Singh <surajrider@gmail.com>

* Fix comparator on null ReplicationCheckpoint

Signed-off-by: Suraj Singh <surajrider@gmail.com>

Signed-off-by: Suraj Singh <surajrider@gmail.com>

Signed-off-by: Suraj Singh <surajrider@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants