No-op replication for primary term validation with NRTSegRep #4127
Gradle Check (Jenkins) Run Completed with:
|
* @param checkpointPublisher Segment Replication Checkpoint Publisher to publish checkpoint
*
* @param checkpointPublisher Segment Replication Checkpoint Publisher to publish checkpoint
nit: Please remove all unintended formatting changes.
Ack!
Do you think we could add tests to exercise the new code path?
I will review in detail once we add tests and resolve conflicts. At this point, my concern is that this change is already hitting the Engine. Can we not return pre-emptively, shortly after hitting the replica, once we have validated the primary term invariant?
Do you think the below change in TransportShardBulkAction might work?
@Override
protected void dispatchedShardOperationOnReplica(BulkShardRequest request, IndexShard replica, ActionListener<ReplicaResult> listener) {
    ActionListener.completeWith(listener, () -> {
        Translog.Location location = new Translog.Location(0, 0, 0);
        if (replica.indexSettings().isRemoteStoreEnabled() && replica.indexSettings().isSegRepEnabled()) {
            replica.ensureWriteAllowed(Engine.Operation.Origin.REPLICA);
        } else {
            location = performOnReplica(request, replica);
        }
        return new WriteReplicaResult<>(request, location, null, replica, logger);
    });
}
This is something that I have explored. Currently, when a shard recovery happens, one of the steps involves replaying the translog. For recovery to complete, the translog replay must return the expected value at the end, which is the highest sequence number seen during the replay step. When we refactor the recovery code to skip translog replay and jump directly to the finalize step, we can probably totally avoid the
UTs and ITs will follow soon.
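The early-return idea discussed in this exchange can be illustrated with a minimal, self-contained sketch. It assumes a replica only needs to compare the primary term carried by the request against its own current term, and it does not use the OpenSearch classes: `ReplicaTermValidator` and its methods are hypothetical names introduced for illustration only.

```java
/** Hypothetical stand-in for a replica shard; this is not the OpenSearch IndexShard API. */
final class ReplicaTermValidator {
    private long currentPrimaryTerm;

    ReplicaTermValidator(long initialPrimaryTerm) {
        this.currentPrimaryTerm = initialPrimaryTerm;
    }

    /**
     * Validates the primary term of a replication request without indexing anything.
     * A stale term is rejected; an equal or newer term is accepted, and a newer term is adopted.
     */
    synchronized void validate(long requestPrimaryTerm) {
        if (requestPrimaryTerm < currentPrimaryTerm) {
            throw new IllegalStateException(
                "operation primary term [" + requestPrimaryTerm + "] is too old (current [" + currentPrimaryTerm + "])"
            );
        }
        // Assumption for this sketch: the replica simply adopts the newer term.
        currentPrimaryTerm = requestPrimaryTerm;
    }

    synchronized long currentPrimaryTerm() {
        return currentPrimaryTerm;
    }
}
```

In this sketch, a request from a stale primary fails fast with an exception, while a valid request returns without ever reaching an engine.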
7344aaa to b420b3d
Codecov Report
@@             Coverage Diff              @@
##               main    #4127     +/-    ##
============================================
- Coverage     70.78%   70.65%   -0.14%
+ Complexity    57218    57104     -114
============================================
  Files          4605     4607       +2
  Lines        274695   274730      +35
  Branches      40228    40228
============================================
- Hits         194441   194098     -343
- Misses        63955    64381     +426
+ Partials      16299    16251      -48
Signed-off-by: Ashish Singh <ssashish@amazon.com>
Replica Recovery is Working along with no-replication call for primary term validation Signed-off-by: Ashish Singh <ssashish@amazon.com>
This reverts commit ebc57ca71a57ecac72845259b1f50dc2ef61f1a0. Signed-off-by: Ashish Singh <ssashish@amazon.com>
 * @return index result.
 */
@Override
public IndexResult index(Index index) throws IOException {
If these methods (index, delete and noOp) are delegating to super, do we need to override them?
Currently, we are delegating to the parent. In the future, the plan is to throw exceptions from these three methods.
Why don't we override them when we actually want to throw the exception? I understand that it does not make any difference in functionality, but it is creating confusion.
Ack.
Why throw exceptions from these? Can we rather add assertions in the existing Engine and avoid a new one?
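The two options raised in this thread can be contrasted with a small sketch. It does not use the real Engine API: `ExistingEngine`, `RejectingReplicaEngine`, and the `noOpReplica` flag are hypothetical names, and the `String` return type stands in for the real result objects.

```java
/** Hypothetical stand-in for the existing engine; the real Engine API is far larger. */
class ExistingEngine {
    private final boolean noOpReplica;

    ExistingEngine(boolean noOpReplica) {
        this.noOpReplica = noOpReplica;
    }

    /** Option B: keep one engine and assert the invariant (checked only when the JVM runs with -ea). */
    String index(String doc) {
        assert noOpReplica == false : "index must not reach the engine on a no-op replica";
        return "indexed:" + doc;
    }
}

/** Option A: a dedicated engine that overrides the write path only to reject it. */
final class RejectingReplicaEngine extends ExistingEngine {
    RejectingReplicaEngine() {
        super(true);
    }

    @Override
    String index(String doc) {
        throw new UnsupportedOperationException("index is not supported on a no-op replica engine");
    }
}
```

One trade-off worth noting: the exception in option A is always enforced, whereas the assertion in option B is only evaluated when assertions are enabled, which is typically the case in tests but not in production.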
@Override
public long getLastSyncedGlobalCheckpoint() {
    return -1;
Why -1?
Generally, when no operations have been performed, we return -1. I have kept it returning -1 for now because this value is normally supplied by the translog manager, and this engine is wired to a no-op translog manager. If replica-to-primary promotion fails because of this, I will make the appropriate changes.
Can we add javadoc with the same description?
Ack.
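A documented version of the -1 return value might look like the sketch below. The interface and class names (`CheckpointSource`, `NoOpCheckpointSource`) are hypothetical and much smaller than the real translog manager; the constant is defined locally and only mirrors the -1 "no operations performed" convention described in the thread.

```java
/** Hypothetical, reduced view of a translog manager; the real interface is larger. */
interface CheckpointSource {
    long getLastSyncedGlobalCheckpoint();
}

final class NoOpCheckpointSource implements CheckpointSource {
    /** Local sentinel mirroring the "no operations performed" value of -1 discussed above. */
    static final long NO_OPS_PERFORMED = -1L;

    /**
     * Returns -1 because this source never writes or syncs any operation,
     * so there is no synced global checkpoint to report.
     */
    @Override
    public long getLastSyncedGlobalCheckpoint() {
        return NO_OPS_PERFORMED;
    }
}
```

Naming the sentinel, rather than returning a bare -1, also answers the "Why -1?" question at the call site.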
/**
 * This method tracks the maximum sequence number of the request that has been given for indexing to this replica.
 * Currently, the recovery process involves replaying translog operation on the replica by the primary. For recovery
 * step to finish, and finalize step to kick in, this method should return expected value. Hence it has been overridden.
Would expected value be equal to maxSeqNo always?
So, what happens currently (in translog replay from the primary during recovery) is that applyIndexOperation is performed. In the document replication world, this syncs to the translog before acking back, and hence updates the persisted sequence number in the local checkpoint tracker. AFAIK, this is expected. Since we are freeing replicas from the translog, we need this interim solution for recovery to work. Once we tweak recovery for the no-op replication use case, the method might become unnecessary on the replica's engine.
responded to @sachinpkale's comments.
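The interim approach described in this thread, keeping the highest sequence number seen in memory and reporting it as the persisted local checkpoint, can be sketched as follows. `InMemorySeqNoTracker` and `markSeen` are hypothetical names; only `getPersistedLocalCheckpoint` echoes a method name mentioned in the PR, and the starting value of -1 is an assumption based on the "no operations performed" convention above.

```java
/** Hypothetical in-memory tracker; nothing here is written to a translog. */
final class InMemorySeqNoTracker {
    // Assumption: -1 means no operation has been seen yet.
    private long maxSeenSeqNo = -1L;

    /** Records the sequence number of an operation handed to the replica. */
    synchronized void markSeen(long seqNo) {
        maxSeenSeqNo = Math.max(maxSeenSeqNo, seqNo);
    }

    /** Reports the highest sequence number seen, so that translog replay during recovery can complete. */
    synchronized long getPersistedLocalCheckpoint() {
        return maxSeenSeqNo;
    }
}
```

Because the tracker takes the maximum, out-of-order operations do not move the reported value backwards.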
So, if I understand this correctly, we can totally avoid
The way I would approach this is to start by refactoring the recovery code to see whether the eventual state can get rid of performOnReplica, and then work backwards.
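The recovery refactoring suggested here could, in a very simplified form, amount to a decision about which recovery steps run. The sketch below is an illustration under stated assumptions: the step names in `RecoveryStep` are simplified labels rather than the actual recovery code, and `RecoveryPlan.shouldRun` is a hypothetical helper.

```java
/** Simplified, hypothetical labels for recovery steps. */
enum RecoveryStep { FILE_COPY, PREPARE_TRANSLOG, TRANSLOG_REPLAY, FINALIZE }

final class RecoveryPlan {
    private RecoveryPlan() {}

    /**
     * Decides whether a step should run. Assumption: translog replay is skipped
     * only when both remote store and segment replication are enabled.
     */
    static boolean shouldRun(RecoveryStep step, boolean remoteStoreEnabled, boolean segRepEnabled) {
        if (step == RecoveryStep.TRANSLOG_REPLAY) {
            return !(remoteStoreEnabled && segRepEnabled);
        }
        return true;
    }
}
```

With replay skipped, the recovery target would go from the preparatory steps straight to finalize, which is the state from which the need for performOnReplica could then be re-evaluated.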
    return noOpResult;
}

protected abstract TranslogManager createTranslogManager(String translogUUID, SetOnce<TranslogManager> translogManager)
Can we delegate the creation of the TranslogManager to a factory and inject that rather than these 3 NRT engine types?
I had to create an abstract engine so that segment replication can still work with the newer engine. And since some other engine methods had to be overridden, a new engine became necessary altogether.
As for the creation of TranslogManager, this should be doable.
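The factory suggestion from this thread can be sketched without the real classes. Everything below is hypothetical: `SimpleTranslogManager`, `TranslogManagerFactory`, and `ReplicationEngine` are illustrative names, and the single `persistsOperations` method stands in for the much larger real interface.

```java
/** Hypothetical, reduced translog manager; only what the sketch needs. */
interface SimpleTranslogManager {
    boolean persistsOperations();
}

/** Hypothetical factory; the engine receives one instead of being subclassed per manager type. */
interface TranslogManagerFactory {
    SimpleTranslogManager create(String translogUUID);
}

final class ReplicationEngine {
    private final SimpleTranslogManager translogManager;

    ReplicationEngine(String translogUUID, TranslogManagerFactory factory) {
        // The choice of manager is injected, so one engine class serves both cases.
        this.translogManager = factory.create(translogUUID);
    }

    SimpleTranslogManager translogManager() {
        return translogManager;
    }
}
```

Under these assumptions, a no-op variant is just a different factory, for example `new ReplicationEngine("some-uuid", uuid -> () -> false)`, rather than a separate engine subclass with an abstract createTranslogManager method.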
Revisiting the recovery code to allow for the sync replication call to return before reaching the engine. cc @Bukhtawar @mch2
Signed-off-by: Ashish Singh ssashish@amazon.com
Description
As part of implementing #3706, this is the initial commit that does the following:
- Adds a new engine (NRTReplicationNoOpEngine) for the no-op replication use case, where calls to the replica do not persist any operation on the replica. The last sequence number seen is, however, kept in memory to handle recovery. The translog manager being used is NoOpTranslogManager, which does not perform any operation.
The following items still have to be handled and should follow in subsequent PRs:
- The internal:index/shard/recovery/translog_ops action, which brings the translog from the primary and performs indexing on the replica, is not required for replicas where segrep and remote store are enabled. The reason is that the translog on the remote store would be the source of truth, and only the node assuming the primary role would publish translogs. See "[Remote Store] Skip translog replay in Recovery code for the replica's recovery when translogs stored remotely" #4230.
- The getPersistedLocalCheckpoint method is overridden to return the seqNo of the last translog operation that was indexed. Essentially, this is dummy code that we should remove by skipping the replay translog step and going directly to the finalize step.
Issues Resolved
[List any issues this PR will resolve]
Check List
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.