
Commit 50b617f

Remove global checkpoint assertion in index shard
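Due to races, this assertion in the index shard can be wrong: a global checkpoint update sent while the replica was in the translog stage of recovery can arrive after recovery has moved on. This commit removes that assertion and adjusts the explanatory comment.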
1 parent 977016b commit 50b617f
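The race described in the commit message can be replayed deterministically. The sketch below is not Elasticsearch code: `GlobalCheckpointRace`, `Replica`, and `replay` are hypothetical names, the `Stage` enum is a simplified stand-in assumed to mirror the recovery stages, and the local checkpoint is held fixed as a simplification. The primary sends an update while the replica is in TRANSLOG, recovery advances, and only then is the update delivered, at which point the removed assertion would fail.

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class GlobalCheckpointRace {

    // Simplified stand-in for the replica's recovery stages (assumption, illustration only).
    public enum Stage { INIT, INDEX, VERIFY_INDEX, TRANSLOG, FINALIZE, DONE }

    // Hypothetical model of a recovering replica plus the "network" between it and the primary.
    static final class Replica {
        Stage stage;
        long localCheckpoint; // held fixed in this sketch
        final Deque<Long> inFlight = new ArrayDeque<>(); // sent but not yet delivered

        Replica(Stage stage, long localCheckpoint) {
            this.stage = stage;
            this.localCheckpoint = localCheckpoint;
        }

        // The primary sends a global checkpoint update; delivery happens later.
        void send(long globalCheckpoint) {
            inFlight.add(globalCheckpoint);
        }

        // Delivers the oldest pending update and reports whether the removed assertion
        // ("an update ahead of the local checkpoint implies TRANSLOG") would hold.
        boolean deliverAndCheckRemovedAssertion() {
            long globalCheckpoint = inFlight.remove();
            boolean ahead = globalCheckpoint > localCheckpoint;
            return !ahead || stage == Stage.TRANSLOG;
        }
    }

    // Replays the race: send during TRANSLOG, advance recovery, then deliver.
    public static boolean replay(Stage stageAtDelivery) {
        Replica replica = new Replica(Stage.TRANSLOG, 5);
        replica.send(10);                // 10 > 5: ahead of the local checkpoint
        replica.stage = stageAtDelivery; // recovery may advance before the message arrives
        return replica.deliverAndCheckRemovedAssertion();
    }

    public static void main(String[] args) {
        System.out.println("delivered in TRANSLOG: " + replay(Stage.TRANSLOG)); // true
        System.out.println("delivered in FINALIZE: " + replay(Stage.FINALIZE)); // false
        System.out.println("delivered in DONE:     " + replay(Stage.DONE));     // false
    }
}
```

Only the delivery time matters: the same update that would satisfy the assertion if processed during TRANSLOG violates it once recovery has reached FINALIZE or DONE.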


1 file changed, +10 -10 lines changed


core/src/main/java/org/elasticsearch/index/shard/IndexShard.java

Lines changed: 10 additions & 10 deletions
@@ -1523,20 +1523,20 @@ public void updateGlobalCheckpointOnReplica(final long globalCheckpoint) {
         verifyReplicationTarget();
         final SequenceNumbersService seqNoService = getEngine().seqNoService();
         final long localCheckpoint = seqNoService.getLocalCheckpoint();
-        if (globalCheckpoint <= localCheckpoint) {
-            seqNoService.updateGlobalCheckpointOnReplica(globalCheckpoint);
-        } else {
+        if (globalCheckpoint > localCheckpoint) {
             /*
              * This can happen during recovery when the shard has started its engine but recovery is not finalized and is receiving global
-             * checkpoint updates from in-flight operations. However, since this shard is not yet contributing to calculating the global
-             * checkpoint, it can be the case that the global checkpoint update from the primary is ahead of the local checkpoint on this
-             * shard. In this case, we ignore the global checkpoint update. This should only happen if we are in the translog stage of
-             * recovery. Prior to this, the engine is not opened and this shard will not receive global checkpoint updates, and after this
-             * the shard will be contributing to calculations of the the global checkpoint.
+             * checkpoint updates. However, since this shard is not yet contributing to calculating the global checkpoint, it can be the
+             * case that the global checkpoint update from the primary is ahead of the local checkpoint on this shard. In this case, we
+             * ignore the global checkpoint update. This can happen if we are in the translog stage of recovery. Prior to this, the engine
+             * is not opened and this shard will not receive global checkpoint updates, and after this the shard will be contributing to
+             * calculations of the the global checkpoint. However, we can not assert that we are in the translog stage of recovery here as
+             * while the global checkpoint update may have emanated from the primary when we were in that state, we could subsequently move
+             * to recovery finalization, or even finished recovery before the update arrives here.
              */
-            assert recoveryState().getStage() == RecoveryState.Stage.TRANSLOG
-                : "expected recovery stage [" + RecoveryState.Stage.TRANSLOG + "] but was [" + recoveryState().getStage() + "]";
+            return;
         }
+        seqNoService.updateGlobalCheckpointOnReplica(globalCheckpoint);
     }
 
     /**
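The post-commit control flow (guard clause, early return, no recovery-stage assertion) can be exercised in isolation with a minimal sketch. `ReplicaGlobalCheckpoint` is a hypothetical class holding only a local and a global checkpoint; unlike the real `void` method, its `updateGlobalCheckpointOnReplica` returns a boolean so the outcome is observable, and it simply records the new value rather than modeling whatever the real `SequenceNumbersService` does when applying an update. The initial global checkpoint of 0 is arbitrary.

```java
public class ReplicaGlobalCheckpoint {

    // Hypothetical stand-in for the replica-side checkpoint state.
    private final long localCheckpoint;
    private long globalCheckpoint;

    public ReplicaGlobalCheckpoint(long localCheckpoint, long initialGlobalCheckpoint) {
        this.localCheckpoint = localCheckpoint;
        this.globalCheckpoint = initialGlobalCheckpoint;
    }

    // Mirrors the shape of the patched method: an update ahead of the local checkpoint
    // is ignored without asserting anything about the recovery stage; any other update
    // is applied. Returns true if the update was applied.
    public boolean updateGlobalCheckpointOnReplica(final long globalCheckpoint) {
        if (globalCheckpoint > localCheckpoint) {
            // Possible during recovery, or when a delayed update arrives after recovery moved on.
            return false;
        }
        this.globalCheckpoint = globalCheckpoint;
        return true;
    }

    public long globalCheckpoint() {
        return globalCheckpoint;
    }

    public static void main(String[] args) {
        ReplicaGlobalCheckpoint replica = new ReplicaGlobalCheckpoint(5, 0);
        System.out.println(replica.updateGlobalCheckpointOnReplica(10)); // false: ahead, ignored
        System.out.println(replica.globalCheckpoint());                  // 0
        System.out.println(replica.updateGlobalCheckpointOnReplica(5));  // true: equal is applied
        System.out.println(replica.globalCheckpoint());                  // 5
    }
}
```

Inverting the condition turns the old if/else into a guard clause: the "ignore" case exits early, and the normal path (applying the update) sits at the end of the method with no extra nesting.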
