No-op replication for primary term validation with NRTSegRep #4127

ashking94 · 2022-08-04T14:15:43Z

Signed-off-by: Ashish Singh <ssashish@amazon.com>

Description

As part of implementing #3706, this initial commit does the following:

- Introduces an abstraction for developing no-op replication (for primary term validation) on top of NRT segment replication.
- Implements an Engine (NRTReplicationNoOpEngine) for the no-op replication use case, where calls to the replica do not persist any operation on the replica. The engine does, however, keep the last seen sequence number in memory; this is needed to handle recovery. The translog manager used is NoOpTranslogManager, which does not perform any operation.

The following are working:
- Primary term validation during index/delete/update/bulk calls.
- Peer recovery of replicas. Currently, the replica is brought up to speed up to the last successful commit on the primary.
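To make the engine behaviour described above concrete, here is a minimal, self-contained sketch of a replica engine that acknowledges operations without persisting them and only remembers the highest sequence number seen. The class name NoOpReplicaEngineSketch is hypothetical; this is not the NRTReplicationNoOpEngine from this PR and it does not use OpenSearch APIs.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical model of a no-op replica engine; not the OpenSearch implementation. */
final class NoOpReplicaEngineSketch {
    /** Sentinel for "no operations performed yet" (-1), as used for sequence numbers. */
    static final long NO_OPS_PERFORMED = -1L;

    // Only in-memory state: the highest sequence number handed to this replica.
    private final AtomicLong maxSeqNoSeen = new AtomicLong(NO_OPS_PERFORMED);

    /** Accepts an index operation without writing to Lucene or a translog. */
    void index(long seqNo) {
        maxSeqNoSeen.accumulateAndGet(seqNo, Math::max);
    }

    /** Accepts a delete operation; like index, it only records the sequence number. */
    void delete(long seqNo) {
        maxSeqNoSeen.accumulateAndGet(seqNo, Math::max);
    }

    /**
     * Interim workaround: reports the highest sequence number seen as "persisted",
     * so that a recovery step waiting on this value can complete.
     */
    long getPersistedLocalCheckpoint() {
        return maxSeqNoSeen.get();
    }
}
```

The only state that survives an operation is a single counter, which is what keeps the replica cheap while still giving recovery a value to wait on.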

The following items still have to be handled and will be addressed in follow-up PRs:

- The request payload no longer needs to be sent over the wire in the in-sync replication call; this has to be fixed. [Remote Store] With no-op replication, remove the payload to the in-sync calls to replicas #4229
- The internal:index/shard/recovery/translog_ops action, which fetches the translog from the primary and performs indexing on the replica, is not required for replicas where segment replication and remote store are enabled. The reason is that the translog on the remote store is the source of truth, and only the assuming primary publishes translogs. [Remote Store] Skip translog replay in Recovery code for the replica's recovery when translogs stored remotely. #4230
- Translog replay can probably also be replaced with segment replication, so that all operations up to the last refresh on the primary are made available to the replica. [Remote Store] Introduce segment replication as prefinal step during recovery of replicas #4231
- During recovery or Engine bootstrap, translog generation and checkpoint files are created on the local disk. This needs to be fixed so that there is no reliance on local disk. [Remote Store] Stop creating translog generation and checkpoint files on replicas where translog is stored remotely #4232
- Recovery is a multistep workflow (or step function), one step of which is translog replay. Currently, translog replay is a no-op on the replica. For recovery to complete successfully (and for the code to work), I have overridden the getPersistedLocalCheckpoint method to return the seqNo of the last translog operation that was indexed. This is essentially dummy code, which we should remove by skipping the translog replay step and going directly to the finalize step.
- Currently there are limited UTs and no ITs. Follow-up PRs will include more ITs and UTs. For the ITs, I will try to use [Segment Replication] Add test support for segrep integration tests #3818.
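The recovery change proposed in the items above can be pictured as a step plan in which translog replay is dropped when the translog is stored remotely. The sketch below is a hypothetical model: the step names and the remoteTranslogEnabled flag are assumptions for illustration and do not match the actual phases in OpenSearch's recovery code.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model of replica recovery as an ordered list of steps. */
final class RecoveryPlanSketch {
    enum Step { COPY_SEGMENTS, PREPARE_ENGINE, REPLAY_TRANSLOG, FINALIZE }

    static List<Step> plan(boolean remoteTranslogEnabled) {
        List<Step> steps = new ArrayList<>();
        steps.add(Step.COPY_SEGMENTS);
        steps.add(Step.PREPARE_ENGINE);
        // Document replication: replay operations so the replica catches up.
        // Remote translog: the remote store is the source of truth, so skip replay
        // and go straight to finalize (removing the need for a dummy checkpoint).
        if (!remoteTranslogEnabled) {
            steps.add(Step.REPLAY_TRANSLOG);
        }
        steps.add(Step.FINALIZE);
        return steps;
    }
}
```

Once replay is absent from the plan, nothing downstream waits on a persisted checkpoint from the replica, which is why the getPersistedLocalCheckpoint override could then be removed.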

Issues Resolved

[List any issues this PR will resolve]

Check List

New functionality includes testing.
- All tests pass
New functionality has been documented.
- New functionality has javadoc added
Commits are signed per the DCO using --signoff

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

github-actions · 2022-08-04T14:40:07Z

Gradle Check (Jenkins) Run Completed with:

RESULT: FAILURE ❌
URL: https://build.ci.opensearch.org/job/gradle-check/1442/
CommitID: 2356e2dc4a50e4248eecbe3caff99e02cc5cc4d9

github-actions · 2022-08-04T15:09:58Z

Gradle Check (Jenkins) Run Completed with:

RESULT: UNSTABLE ❌
URL: https://build.ci.opensearch.org/job/gradle-check/1443/
CommitID: 12edd37abbf834bf49a94fdfbab114a028a62e9f

github-actions · 2022-08-04T15:44:12Z

Gradle Check (Jenkins) Run Completed with:

RESULT: SUCCESS ✅
URL: https://build.ci.opensearch.org/job/gradle-check/1445/
CommitID: 12edd37abbf834bf49a94fdfbab114a028a62e9f

github-actions · 2022-08-04T20:13:57Z

Gradle Check (Jenkins) Run Completed with:

RESULT: SUCCESS ✅
URL: https://build.ci.opensearch.org/job/gradle-check/1458/
CommitID: 396f13b6cb755294a728ac94ffd1afeee69bc7dd

Bukhtawar · 2022-08-05T05:09:57Z

server/src/test/java/org/opensearch/index/shard/IndexShardTests.java

-     * @param checkpointPublisher               Segment Replication Checkpoint Publisher to publish checkpoint
+     *
+     * @param checkpointPublisher Segment Replication Checkpoint Publisher to publish checkpoint


nit: Please remove all unintended formatting changes.

Bukhtawar

Do you think we could add tests to exercise the new code path?
I will review in detail once we add tests and resolve conflicts. At this point, my concern is that this change already reaches the Engine. Can we not return pre-emptively, shortly after hitting the replica, once we have validated the primary term invariant?
Do you think the change below in TransportShardBulkAction might work?

@Override
protected void dispatchedShardOperationOnReplica(BulkShardRequest request, IndexShard replica, ActionListener<ReplicaResult> listener) {
    ActionListener.completeWith(listener, () -> {
        // Placeholder location, used when nothing is written on the replica
        Translog.Location location = new Translog.Location(0, 0, 0);
        if (replica.indexSettings().isRemoteStoreEnabled() && replica.indexSettings().isSegRepEnabled()) {
            // Remote store + segment replication: only check that the replica accepts replica writes; skip indexing
            replica.ensureWriteAllowed(Engine.Operation.Origin.REPLICA);
        } else {
            // Document replication path: apply the operations on the replica
            location = performOnReplica(request, replica);
        }
        return new WriteReplicaResult<>(request, location, null, replica, logger);
    });
}
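To reason about this proposal without the rest of OpenSearch, the branch can be modelled in isolation. This is a simplified, hypothetical model (ReplicaDispatchSketch and its methods are invented); it assumes that a request carrying a lower primary term than the replica's current term is rejected and a higher one is adopted, and it uses a counter as a stand-in for performOnReplica.

```java
/** Hypothetical, self-contained model of the proposed replica short-circuit. */
final class ReplicaDispatchSketch {
    private final boolean remoteStoreEnabled;
    private final boolean segRepEnabled;
    private long operationPrimaryTerm;
    private int opsApplied = 0;

    ReplicaDispatchSketch(long primaryTerm, boolean remoteStoreEnabled, boolean segRepEnabled) {
        this.operationPrimaryTerm = primaryTerm;
        this.remoteStoreEnabled = remoteStoreEnabled;
        this.segRepEnabled = segRepEnabled;
    }

    /** Assumed invariant: reject operations from a stale primary, adopt a newer term. */
    private void validatePrimaryTerm(long requestPrimaryTerm) {
        if (requestPrimaryTerm < operationPrimaryTerm) {
            throw new IllegalStateException(
                "primary term " + requestPrimaryTerm + " is older than current " + operationPrimaryTerm);
        }
        operationPrimaryTerm = requestPrimaryTerm;
    }

    /** Returns true if the operation was applied locally, false if it was short-circuited. */
    boolean onReplica(long requestPrimaryTerm) {
        validatePrimaryTerm(requestPrimaryTerm);
        if (remoteStoreEnabled && segRepEnabled) {
            return false; // term validated; nothing is written on the replica
        }
        opsApplied++; // stands in for performOnReplica(request, replica)
        return true;
    }

    int opsApplied() {
        return opsApplied;
    }
}
```

In this model the term check runs on both paths, so the short-circuit keeps the primary term invariant while never touching engine state.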

ashking94 · 2022-08-05T14:39:10Z

This is something I have explored. Currently, when a shard recovery happens, one of the steps involves replaying the translog. For recovery to complete, the replay translog step must end with the expected value, which is the highest sequence number seen during replay. When we refactor the recovery code to skip translog replay and jump directly to the finalize step, we can probably avoid the performOnReplica method entirely. This is the plan as well (mentioned in the PR description). We also need to see later what changes would be required for primary-primary recovery, so we can make this change in the recovery code at that point.

ashking94 · 2022-08-05T14:40:31Z

UTs and ITs will follow soon.

ashking94 · 2022-08-05T14:54:23Z

cc @mch2 @dreamer-89 @sachinpkale

github-actions · 2022-08-05T15:43:19Z

Gradle Check (Jenkins) Run Completed with:

RESULT: FAILURE ❌
URL: https://build.ci.opensearch.org/job/gradle-check/1485/
CommitID: 7344aaa0ea325fcdf0c0fad70a0b5418d80736bd

github-actions · 2022-08-05T16:08:53Z

Gradle Check (Jenkins) Run Completed with:

RESULT: SUCCESS ✅
URL: https://build.ci.opensearch.org/job/gradle-check/1486/
CommitID: b420b3de1255daf292866206740ba7d5d3cd5f64

github-actions · 2022-08-08T07:30:48Z

Gradle Check (Jenkins) Run Completed with:

RESULT: SUCCESS ✅
URL: https://build.ci.opensearch.org/job/gradle-check/1554/
CommitID: 830a9380212cff35a8aeecc00707a9754f7a3e63

github-actions · 2022-08-08T08:52:03Z

Gradle Check (Jenkins) Run Completed with:

RESULT: SUCCESS ✅
URL: https://build.ci.opensearch.org/job/gradle-check/1555/
CommitID: ebc57ca71a57ecac72845259b1f50dc2ef61f1a0

codecov-commenter · 2022-08-08T08:53:56Z

Codecov Report

Merging #4127 (5f93b80) into main (5f2e66b) will decrease coverage by 0.13%.
The diff coverage is 75.42%.

@@             Coverage Diff              @@
##               main    #4127      +/-   ##
============================================
- Coverage     70.78%   70.65%   -0.14%     
+ Complexity    57218    57104     -114     
============================================
  Files          4605     4607       +2     
  Lines        274695   274730      +35     
  Branches      40228    40228              
============================================
- Hits         194441   194098     -343     
- Misses        63955    64381     +426     
+ Partials      16299    16251      -48

Impacted Files	Coverage Δ
...main/java/org/opensearch/common/lucene/Lucene.java	`66.02% <ø> (-1.28%)`	⬇️
...index/codec/PerFieldMappingPostingFormatCodec.java	`64.28% <ø> (ø)`
...arch/index/engine/NRTReplicationReaderManager.java	`86.95% <ø> (ø)`
...s/replication/SegmentReplicationSourceHandler.java	`87.71% <ø> (-0.81%)`	⬇️
...va/org/opensearch/index/engine/EngineTestCase.java	`86.46% <66.66%> (+0.52%)`	⬆️
...arch/index/engine/NRTReplicationEngineFactory.java	`77.77% <71.42%> (-22.23%)`	⬇️
...nsearch/index/engine/NRTReplicationNoOpEngine.java	`71.42% <71.42%> (ø)`
...rch/index/engine/AbstractNRTReplicationEngine.java	`73.50% <73.50%> (ø)`
.../opensearch/index/engine/NRTReplicationEngine.java	`80.95% <75.00%> (+5.75%)`	⬆️
...in/java/org/opensearch/index/shard/IndexShard.java	`68.92% <80.00%> (-0.40%)`	⬇️
... and 516 more

server/src/main/java/org/opensearch/common/util/FeatureFlags.java

github-actions · 2022-08-16T08:26:13Z

Gradle Check (Jenkins) Run Completed with:

RESULT: FAILURE ❌
URL: https://build.ci.opensearch.org/job/gradle-check/1763/
CommitID: e9f6d56670e50e2cd8ccf85fe979c6a102e939d3

github-actions · 2022-08-16T09:05:57Z

Gradle Check (Jenkins) Run Completed with:

RESULT: FAILURE ❌
URL: https://build.ci.opensearch.org/job/gradle-check/1764/
CommitID: 65fe29c

github-actions · 2022-08-16T09:15:59Z

Gradle Check (Jenkins) Run Completed with:

RESULT: SUCCESS ✅
URL: https://build.ci.opensearch.org/job/gradle-check/1765/
CommitID: 5f93b80

sachinpkale · 2022-08-16T15:34:12Z

server/src/main/java/org/opensearch/index/engine/NRTReplicationNoOpEngine.java

+     * @return index result.
+     */
+    @Override
+    public IndexResult index(Index index) throws IOException {


If these methods (index, delete and noOp) are delegating to super, do we need to override them?

Currently we are delegating to the parent. In the future, the plan is to throw exceptions from these three methods.

Why not override them when we want to throw the exception? I understand that it makes no difference in functionality, but it creates confusion.

Why throw exceptions from these? Could we instead add assertions in the existing Engine and avoid a new one?

sachinpkale · 2022-08-16T15:49:30Z

server/src/main/java/org/opensearch/index/engine/NRTReplicationNoOpEngine.java

+
+    @Override
+    public long getLastSyncedGlobalCheckpoint() {
+        return -1;


Generally, when no operations have been performed, we return -1. I have kept it returning -1 for now, as this value is normally returned by the translog manager, and here a no-op translog manager is hooked into this engine. If replica-to-primary promotion fails, I will make the appropriate changes.

Can we add javadoc with the same description?
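Following the request for javadoc, here is a sketch of how the -1 return value might be documented. The -1 sentinel corresponds to SequenceNumbers.NO_OPS_PERFORMED in OpenSearch; the interface and class names below are hypothetical stand-ins and are not the real TranslogManager API.

```java
/** Hypothetical stand-in for the translog manager contract used by the engine. */
interface TranslogManagerSketch {
    /** Sentinel for "no operations performed", matching SequenceNumbers.NO_OPS_PERFORMED (-1). */
    long NO_OPS_PERFORMED = -1L;

    void add(long seqNo);

    long getLastSyncedGlobalCheckpoint();
}

/** Translog manager that persists nothing, as used with the no-op replica engine. */
final class NoOpTranslogManagerSketch implements TranslogManagerSketch {
    @Override
    public void add(long seqNo) {
        // Intentionally empty: operations are not written to any translog.
    }

    /**
     * Returns -1 (no operations performed). This value is normally supplied by the
     * translog manager; since this manager records nothing, there is no synced
     * global checkpoint to report. Revisit if replica-to-primary promotion needs a
     * real value.
     */
    @Override
    public long getLastSyncedGlobalCheckpoint() {
        return NO_OPS_PERFORMED;
    }
}
```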

sachinpkale · 2022-08-16T16:10:48Z

server/src/main/java/org/opensearch/index/engine/NRTReplicationNoOpEngine.java

+    /**
+     * This method tracks the maximum sequence number of the request that has been given for indexing to this replica.
+     * Currently, the recovery process involves replaying translog operation on the replica by the primary. For recovery
+     * step to finish, and finalize step to kick in, this method should return expected value. Hence it has been overridden.


Would expected value be equal to maxSeqNo always?

So, what currently happens (in translog replay from the primary during recovery) is that applyIndexOperation is performed. In the document replication world, this would sync the translog before acking back, thereby updating the persisted sequence number in the local checkpoint tracker. AFAIK, this is expected. Since we are removing the translog from replicas, we need this interim solution for recovery to work; once we adapt recovery for the no-op replication use case, the method may become unnecessary in the replica's engine.
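On whether the expected value always equals maxSeqNo: a local checkpoint is conventionally the highest sequence number below which every operation has been processed, so it equals the maximum sequence number only when no sequence numbers are missing. The simplified, hypothetical tracker below illustrates the difference; it is not OpenSearch's LocalCheckpointTracker, and whether translog replay can produce gaps is not established in this thread.

```java
import java.util.TreeSet;

/** Simplified, hypothetical local checkpoint tracker; not OpenSearch's LocalCheckpointTracker. */
final class CheckpointSketch {
    private long checkpoint = -1L; // -1: nothing persisted yet
    private long maxSeqNo = -1L;
    // Sequence numbers persisted above a gap, waiting for the gap to be filled.
    private final TreeSet<Long> pending = new TreeSet<>();

    void markPersisted(long seqNo) {
        maxSeqNo = Math.max(maxSeqNo, seqNo);
        if (seqNo <= checkpoint) {
            return; // already covered by the checkpoint
        }
        pending.add(seqNo);
        // Advance while the next contiguous sequence number is available.
        while (!pending.isEmpty() && pending.first() == checkpoint + 1) {
            checkpoint = pending.pollFirst();
        }
    }

    long checkpoint() {
        return checkpoint;
    }

    long maxSeqNo() {
        return maxSeqNo;
    }
}
```

For example, persisting 0 and 2 leaves the checkpoint at 0 while maxSeqNo is 2; persisting 1 then moves the checkpoint to 2.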

ashking94

responded to @sachinpkale's comments.

Bukhtawar · 2022-08-16T17:57:06Z

So if I understand this correctly: if we can avoid the performOnReplica method entirely, then we don't really need an NRTReplicationNoOpEngine, since calls to any engine will be short-circuited and the new engine changes would effectively be dead code.
I would prefer avoiding any engine changes and instead using assertions in the existing engine to ensure that no calls are made to the engine if the shard is a replica and remote translogs are enabled.
"Best code is no code" :)

The way I would approach this is to start by refactoring the recovery code, see whether the eventual state can get rid of performOnReplica, and work backwards.
If that is not possible, I would consider using a gating mechanism like a feature flag, or even a feature branch, to avoid breaking existing feature sets and to develop no-op replication in isolation until we can integrate the eventual solution incrementally into mainline, rather than building abstractions that would eventually be dead code.
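A minimal sketch of the alternative suggested above: keep the existing engine, add an assertion that no write reaches it on a replica with a remote translog, and gate the behaviour behind a feature flag. The flag field, the class name and the constructor parameters are hypothetical; the Java assert only fires when the JVM runs with -ea.

```java
/** Hypothetical sketch: guard the existing engine instead of adding a new one. */
final class EngineGuardSketch {
    // Hypothetical feature flag; a real implementation would read it from settings.
    static volatile boolean noOpReplicationEnabled = false;

    private final boolean isReplica;
    private final boolean remoteTranslogEnabled;
    private int writes = 0;

    EngineGuardSketch(boolean isReplica, boolean remoteTranslogEnabled) {
        this.isReplica = isReplica;
        this.remoteTranslogEnabled = remoteTranslogEnabled;
    }

    /** True when callers are expected to short-circuit writes before the engine. */
    boolean writesShortCircuited() {
        return noOpReplicationEnabled && isReplica && remoteTranslogEnabled;
    }

    void index(long seqNo) {
        // Documents the invariant; only enforced when the JVM runs with -ea.
        assert !writesShortCircuited() : "unexpected engine write for seqNo " + seqNo;
        writes++;
    }

    int writes() {
        return writes;
    }
}
```

With the flag off, behaviour is unchanged, which matches the goal of not breaking existing feature sets while no-op replication is developed.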

mch2 · 2022-08-17T01:18:49Z

server/src/main/java/org/opensearch/index/engine/AbstractNRTReplicationEngine.java

+        return noOpResult;
+    }
+
+    protected abstract TranslogManager createTranslogManager(String translogUUID, SetOnce<TranslogManager> translogManager)


Can we delegate the creation of the TranslogManager to a factory and inject that rather than these 3 NRT engine types?

I had to create an abstract engine so that segment replication still works with the new engine. And since some other engine methods had to be overridden, a new engine altogether became necessary.
Delegating the creation of the TranslogManager to a factory should be doable.
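One way to read this suggestion, in the spirit of the TranslogFactory work linked from this PR (#4172), is to have a single engine receive a factory that decides which translog manager to build. All types below (TranslogManagerLike, TranslogManagerFactoryLike, NrtEngineSketch) are hypothetical and only illustrate the injection pattern.

```java
/** Hypothetical minimal translog manager contract. */
interface TranslogManagerLike {
    String kind();
}

/** Hypothetical factory the engine receives instead of choosing a translog itself. */
@FunctionalInterface
interface TranslogManagerFactoryLike {
    TranslogManagerLike create(String translogUUID);
}

/** One NRT replica engine type; the translog flavour is injected, not subclassed. */
final class NrtEngineSketch {
    private final TranslogManagerLike translogManager;

    NrtEngineSketch(String translogUUID, TranslogManagerFactoryLike factory) {
        // The caller, not the engine subclass, decides which translog manager is used.
        this.translogManager = factory.create(translogUUID);
    }

    String translogKind() {
        return translogManager.kind();
    }
}
```

Under this pattern, adding another translog flavour means writing a new factory rather than a new engine subclass.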

ashking94 · 2022-08-17T10:41:47Z

Revisiting the recovery code to allow the sync replication call to return before reaching the engine. cc @Bukhtawar @mch2

github-actions · 2022-08-18T09:21:24Z

Gradle Check (Jenkins) Run Completed with:

RESULT: FAILURE ❌
URL: https://build.ci.opensearch.org/job/gradle-check/1879/
CommitID: 3e325be

github-actions · 2022-08-19T18:28:23Z

Gradle Check (Jenkins) Run Completed with:

RESULT: FAILURE ❌
URL: https://build.ci.opensearch.org/job/gradle-check/1943/
CommitID: 5f93b80

ashking94 force-pushed the 3706-1 branch from 2356e2d to 12edd37 on August 4, 2022 14:43

Bukhtawar reviewed Aug 5, 2022

ashking94 force-pushed the 3706-1 branch 2 times, most recently from 7344aaa to b420b3d on August 5, 2022 15:42

Bukhtawar mentioned this pull request Aug 12, 2022

Introduce TranslogFactory for Local/Remote Translog support #4172

Merged

Bukhtawar reviewed Aug 12, 2022

server/src/main/java/org/opensearch/common/util/FeatureFlags.java (outdated)

ashking94 added 7 commits August 16, 2022 14:03

Introduced abstraction layer in NRTReplicationEngine for extension

6a81186

Signed-off-by: Ashish Singh <ssashish@amazon.com>

[Remote Store] Added NRTReplicationNoOpEngine with NoOpTranslogManager

3a243cc

Replica Recovery is Working along with no-replication call for primary term validation Signed-off-by: Ashish Singh <ssashish@amazon.com>

Removes unwanted formatting changes from prev commit

ef95d1c

Signed-off-by: Ashish Singh <ssashish@amazon.com>

Adds rationale behind new Engine as comments

3012063

Signed-off-by: Ashish Singh <ssashish@amazon.com>

Adds NoOp code behind feature flag

ac2ebc0

Signed-off-by: Ashish Singh <ssashish@amazon.com>

Adds UTs for NRTReplicationNoOpEngine

974ba60

Signed-off-by: Ashish Singh <ssashish@amazon.com>

Revert "Adds NoOp code behind feature flag"

65fe29c

This reverts commit ebc57ca71a57ecac72845259b1f50dc2ef61f1a0. Signed-off-by: Ashish Singh <ssashish@amazon.com>

ashking94 force-pushed the 3706-1 branch from e9f6d56 to 65fe29c on August 16, 2022 08:40

ashking94 marked this pull request as ready for review August 16, 2022 08:41

ashking94 requested a review from a team as a code owner August 16, 2022 08:41

ashking94 requested a review from reta as a code owner August 16, 2022 08:41

NoOp engine enablement on remote translog store enable status

5f93b80

Signed-off-by: Ashish Singh <ssashish@amazon.com>

sachinpkale reviewed Aug 16, 2022

ashking94 commented Aug 16, 2022

sachinpkale approved these changes Aug 16, 2022

mch2 reviewed Aug 17, 2022

ashking94 force-pushed the 3706-1 branch from 3e325be to 5f93b80 on August 19, 2022 17:48

ashking94 closed this Jan 6, 2023
