Keep commits and translog up to the global checkpoint #27606
We need to keep index commits and translog operations up to the current global checkpoint to allow us to throw away unsafe operations and increase the operation-based recovery chance. This is achieved by a new index deletion policy.
Thanks @dnhatn - I left some small feedback points.
 * @return a list of index commits that are not deleted by this policy.
 */
List<IndexCommit> onCommit(List<? extends IndexCommit> commits) throws IOException;
}
This interface clearly came from IndexDeletionPolicy, but I think it's now different enough that it's worth breaking the link entirely and coming up with more descriptive names. The onInit() and onCommit() methods of IndexDeletionPolicy return void, but here we're returning a list of the kept commits. Additionally, I think all the implementations just delegate onInit() to onCommit() (except KeepUntilGlobalCheckpointDeletionPolicy, which special-cases an empty argument).
import java.util.List;

/**
 * An {@link IndexDeletionPolicy} that deletes unneeded index commits, and returns the index commits that are not deleted by this policy.
Not an IndexDeletionPolicy any more!
case OPEN_INDEX_AND_TRANSLOG:
    final long globalCheckpoint = Translog.readGlobalCheckpoint(engineConfig.getTranslogConfig().getTranslogPath());
    seqNoStats = store.loadSeqNoStats(globalCheckpoint);
    break;
I'd just return store.loadSeqNoStats() here rather than breaking and returning at the bottom. Not sure if that's against our style, though?
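A minimal sketch of the early-return style suggested above, assuming simplified stand-ins: `OpenMode` reuses the constant names quoted in this thread, but `SeqNoStats`, `readGlobalCheckpoint()` and `loadSeqNoStats(long)` here are hypothetical stubs, not the real Elasticsearch classes. Each case returns directly, so there is no mutable local and no trailing `return`.

```java
import java.util.Objects;

/**
 * Sketch only: every type and helper below is a hypothetical stand-in.
 */
final class SeqNoStatsLoader {

    enum OpenMode { CREATE_INDEX_AND_TRANSLOG, OPEN_INDEX_CREATE_TRANSLOG, OPEN_INDEX_AND_TRANSLOG }

    record SeqNoStats(long maxSeqNo, long localCheckpoint, long globalCheckpoint) {}

    static final long NO_OPS_PERFORMED = -1L;

    // Hypothetical stub: pretend the translog checkpoint file records a global checkpoint of 5.
    static long readGlobalCheckpoint() {
        return 5L;
    }

    // Hypothetical stub: pretend the last commit holds operations up to seq no 7.
    static SeqNoStats loadSeqNoStats(long globalCheckpoint) {
        return new SeqNoStats(7L, 7L, globalCheckpoint);
    }

    static SeqNoStats seqNoStatsFor(OpenMode openMode) {
        Objects.requireNonNull(openMode, "openMode");
        switch (openMode) {
            case OPEN_INDEX_AND_TRANSLOG:
                // Return straight from the case instead of assigning a local and breaking.
                return loadSeqNoStats(readGlobalCheckpoint());
            case CREATE_INDEX_AND_TRANSLOG:
                // Hypothetical: a brand-new index has no operations yet.
                return new SeqNoStats(NO_OPS_PERFORMED, NO_OPS_PERFORMED, NO_OPS_PERFORMED);
            default:
                throw new IllegalArgumentException("open mode not handled in this sketch: " + openMode);
        }
    }
}
```

Because every branch returns or throws, the compiler does not require a `return` after the `switch`; the sketch deliberately leaves `OPEN_INDEX_CREATE_TRANSLOG` to the `default` branch.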
Done

/**
 * An {@link ESIndexDeletionPolicy} that deletes index commits that are not required for recovery.
 * In particular, this policy will delete index commits whose max sequence number is smaller (or equal) than
s/smaller (or equal) than/at most/?
Done
    commit(translog, generation);
}
long lastGen = randomLongBetween(1, translog.currentFileGeneration());
commit(translog, randomLongBetween(1, lastGen), lastGen); }
nit: } should be on its own line
Done.

final List<IndexCommit> keptCommits = new ArrayList<>();
final int keptPosition = indexOfKeptCommits(commits);
final List<Integer> duplicateIndexes = indexesOfDuplicateCommits(commits);
We only call this collection's contains() method, so maybe a HashSet<Integer> would be better? Not sure if it gets big enough to make a difference.
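A minimal sketch of the Set-based variant discussed here, reusing the variable names from the quoted snippet (`keptCommits`, `keptPosition`, `duplicateIndexes`). The `Commit` record and the definition of a "duplicate" (a commit with the same max sequence number as the commit just before it) are assumptions for illustration; the PR's actual `indexesOfDuplicateCommits` is not shown in this thread.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/**
 * Sketch only: Commit and the duplicate rule are hypothetical.
 */
final class DuplicateCommits {

    record Commit(String name, long maxSeqNo) {}

    // Commits are ordered oldest to newest, as Lucene passes them to a deletion policy.
    static Set<Integer> indexesOfDuplicateCommits(List<Commit> commits) {
        final Set<Integer> duplicates = new HashSet<>();
        for (int i = 1; i < commits.size(); i++) {
            if (commits.get(i).maxSeqNo() == commits.get(i - 1).maxSeqNo()) {
                duplicates.add(i);
            }
        }
        return duplicates;
    }

    // Keeps every commit from keptPosition onwards, skipping duplicates via an O(1) contains().
    static List<Commit> keptCommits(List<Commit> commits, int keptPosition) {
        final Set<Integer> duplicateIndexes = indexesOfDuplicateCommits(commits);
        final List<Commit> keptCommits = new ArrayList<>();
        for (int i = keptPosition; i < commits.size(); i++) {
            if (duplicateIndexes.contains(i) == false) {
                keptCommits.add(commits.get(i));
            }
        }
        return keptCommits;
    }
}
```

As the reply below notes, with zero or one duplicate the performance difference is negligible; the Set mainly documents that only membership matters.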
We expect zero or one element in the list, so there should be no difference. However, using a Set makes more sense here; I updated it. Thank you.
@@ -113,27 +113,10 @@ public void testRecoveryWithOutOfOrderDelete() throws Exception {
 orgReplica.applyIndexOperationOnReplica(3, 1, VersionType.EXTERNAL, IndexRequest.UNSET_AUTO_GENERATED_TIMESTAMP, false,
     SourceToParse.source(orgReplica.shardId().getIndexName(), "type", "id2", new BytesArray("{}"), XContentType.JSON), u -> {});

-final int translogOps;
+final int translogOps = 4; // 3 ops + seqno gaps
This removes the tests for the cases where we've updated the index settings. Is that ok?
I am working on another test for this, as the assumption in this test is no longer correct.
which other test is that?
/**
 * An {@link IndexDeletionPolicy} that deletes unneeded index commits, and returns the index commits that are not deleted by this policy.
 */
public interface ESIndexDeletionPolicy {
Do we need a separate interface? If we want to know which commits can be kept, we can just do commits.stream().allMatch(c -> c.isDeleted() == false) after invoking the onInit or onCommit methods?
This interface is to avoid keeping the translog of snapshotted index commits. SnapshotDeletionPolicy suppresses the delete() for snapshotted commits.
I should have documented its purpose.
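A minimal sketch of why `isDeleted()` alone cannot reveal which commits a policy chose to keep once a snapshot-aware wrapper swallows `delete()`. `Commit`, `SimpleCommit` and `SnapshotAwareCommit` are hypothetical types loosely modeled on Lucene's `IndexCommit`; they are not Lucene classes, and `keepNewestOnly` is a toy inner policy chosen for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/**
 * Sketch only: all types here are hypothetical stand-ins.
 */
final class SnapshotSuppression {

    interface Commit {
        long generation();
        void delete();
        boolean isDeleted();
    }

    static final class SimpleCommit implements Commit {
        private final long generation;
        private boolean deleted;

        SimpleCommit(long generation) { this.generation = generation; }

        @Override public long generation() { return generation; }
        @Override public void delete() { deleted = true; }
        @Override public boolean isDeleted() { return deleted; }
    }

    // Ignores delete() while the wrapped commit's generation is snapshotted.
    static final class SnapshotAwareCommit implements Commit {
        private final Commit delegate;
        private final Set<Long> snapshottedGenerations;

        SnapshotAwareCommit(Commit delegate, Set<Long> snapshottedGenerations) {
            this.delegate = delegate;
            this.snapshottedGenerations = snapshottedGenerations;
        }

        @Override public long generation() { return delegate.generation(); }
        @Override public void delete() {
            if (snapshottedGenerations.contains(delegate.generation()) == false) {
                delegate.delete();
            }
        }
        @Override public boolean isDeleted() { return delegate.isDeleted(); }
    }

    static List<Commit> wrap(List<? extends Commit> commits, Set<Long> snapshottedGenerations) {
        final List<Commit> wrapped = new ArrayList<>();
        for (Commit c : commits) {
            wrapped.add(new SnapshotAwareCommit(c, snapshottedGenerations));
        }
        return wrapped;
    }

    // Toy inner policy: asks to delete all but the newest commit and reports what it kept.
    static List<Commit> keepNewestOnly(List<Commit> commits) {
        for (int i = 0; i < commits.size() - 1; i++) {
            commits.get(i).delete();
        }
        return commits.isEmpty() ? List.of() : List.of(commits.get(commits.size() - 1));
    }

    // What an isDeleted()-based check would report as "kept".
    static List<Commit> notDeleted(List<Commit> commits) {
        final List<Commit> result = new ArrayList<>();
        for (Commit c : commits) {
            if (c.isDeleted() == false) {
                result.add(c);
            }
        }
        return result;
    }
}
```

With generations 1, 2 and 3 and generation 1 snapshotted, the policy's own return value lists only commit 3, while an `isDeleted()` check also reports commit 1, which would lead to retaining translog for the snapshotted commit.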
@dnhatn I have a question before diving deeper - I expected to find
I like the current design: the three policies are decoupled, and I tried to keep them separated as before. I will embed
@bleskes, I have merged the
@DaveCTurner I've addressed your comments, but not all of them, as I've removed the new interface.
I also find the composition of deletion policies elegant. Sadly, elegance is an all-or-nothing thing. Adding extra wrappers or copying the interface means you lose it...
Thanks! I will look tomorrow morning.
Yannick said:
The interface has gone (👍) and the second half of this idea seems like a good one. The meaning of the return value of
I left a question about a potentially simpler approach. Also, I see you added some cleanup w.r.t. duplicate commits above the global checkpoint with the same max seq no. Can you elaborate on why this is needed? Can't we just leave them alone?
}

@Override
public void onInit(List<? extends IndexCommit> commits) throws IOException {
    indexDeletionPolicy.onInit(commits);
    final List<IndexCommit> keptCommits = deleteOldIndexCommits(commits);
Can we rely on indexOfKeptCommits to give us the index of the first commit to be kept and use that to:
- Delete all commits before it
- Set the translog deletion policy generation based on the commit the index points to.
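A minimal sketch of the two steps above, under stated assumptions: `Commit` and `TranslogPolicy` are hypothetical stand-ins, commits are ordered oldest to newest, and the "first commit to keep" is taken to be the newest commit whose max sequence number is at most the global checkpoint. Falling back to keeping every commit when none qualifies is also an assumption of this sketch.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Sketch only: Commit and TranslogPolicy are hypothetical.
 */
final class KeepSafeCommitPolicy {

    static final class Commit {
        final long maxSeqNo;
        final long translogGeneration;
        boolean deleted;

        Commit(long maxSeqNo, long translogGeneration) {
            this.maxSeqNo = maxSeqNo;
            this.translogGeneration = translogGeneration;
        }

        void delete() { deleted = true; }
    }

    static final class TranslogPolicy {
        long minTranslogGenerationForRecovery = -1L;
    }

    /**
     * Position of the newest commit whose max seq no is at most the global checkpoint.
     * Scanning newest to oldest picks the newest of several commits sharing a max seq no.
     * Assumption: if no commit qualifies, keep everything (return 0).
     */
    static int indexOfKeptCommits(List<Commit> commits, long globalCheckpoint) {
        for (int i = commits.size() - 1; i >= 0; i--) {
            if (commits.get(i).maxSeqNo <= globalCheckpoint) {
                return i;
            }
        }
        return 0;
    }

    /** Deletes commits older than the kept one, updates the translog policy, returns the kept commits. */
    static List<Commit> onCommit(List<Commit> commits, long globalCheckpoint, TranslogPolicy translogPolicy) {
        if (commits.isEmpty()) {
            return List.of();
        }
        final int keptPosition = indexOfKeptCommits(commits, globalCheckpoint);
        for (int i = 0; i < keptPosition; i++) {
            commits.get(i).delete(); // step 1: everything older than the kept commit goes
        }
        // Step 2: translog must be retained from the kept commit's generation onwards.
        translogPolicy.minTranslogGenerationForRecovery = commits.get(keptPosition).translogGeneration;
        return new ArrayList<>(commits.subList(keptPosition, commits.size()));
    }
}
```

With commits at max seq nos 3, 5, 5, 9 and a global checkpoint of 6, the scan stops at the second commit with max seq no 5, so the older duplicate is deleted without any separate duplicate-handling logic, in line with the "keep just one of them" suggestion in this thread.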
Yep, I agree. Let's remove the duplicate logic first.
@bleskes, there are some cases where we may end up with multiple index commits having the same max_seqno.
elasticsearch/core/src/main/java/org/elasticsearch/index/engine/InternalEngine.java Lines 448 to 468 in 2900e3f
Thanks. Let's keep this as simple as possible. If those duplicate commits become a problem, we can fix it later. Note, though, that when those duplicate commits are the "good ones", we should keep just one of them by choosing the commit appropriately. This is simple, as it's just a question of selecting the commit index correctly.
@elasticmachine please test this.
LGTM. Thanks @dnhatn !
assert engineConfig.getForceNewHistoryUUID() == false
    || openMode == EngineConfig.OpenMode.CREATE_INDEX_AND_TRANSLOG
    || openMode == EngineConfig.OpenMode.OPEN_INDEX_CREATE_TRANSLOG
    : "OpenMode must be either CREATE_INDEX_AND_TRANSLOG or OPEN_INDEX_CREATE_TRANSLOG if forceNewHistoryUUID is true";
nit: add the current open mode to the message please
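A minimal sketch of what the nit asks for: append the actual open mode to the assertion message. The enum reuses the constant names from the quoted diff; the `OpenModeCheck` class, its helpers and the exact message suffix are hypothetical. The message is built in a separate method only so it can be checked without running the JVM with `-ea`.

```java
/**
 * Sketch only: OpenModeCheck and its helpers are hypothetical.
 */
final class OpenModeCheck {

    enum OpenMode { CREATE_INDEX_AND_TRANSLOG, OPEN_INDEX_CREATE_TRANSLOG, OPEN_INDEX_AND_TRANSLOG }

    static boolean isAllowed(boolean forceNewHistoryUUID, OpenMode openMode) {
        return forceNewHistoryUUID == false
            || openMode == OpenMode.CREATE_INDEX_AND_TRANSLOG
            || openMode == OpenMode.OPEN_INDEX_CREATE_TRANSLOG;
    }

    static String failureMessage(OpenMode openMode) {
        return "OpenMode must be either CREATE_INDEX_AND_TRANSLOG or OPEN_INDEX_CREATE_TRANSLOG"
            + " if forceNewHistoryUUID is true, but was [" + openMode + "]";
    }

    static void check(boolean forceNewHistoryUUID, OpenMode openMode) {
        // The expression after ':' is only evaluated when assertions are enabled and the check fails.
        assert isAllowed(forceNewHistoryUUID, openMode) : failureMessage(openMode);
    }
}
```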
@DaveCTurner Could you please have a look? Thank you.
I've lost the thread on this PR, but I requested changes on an earlier version that are either done or superseded.
Thanks @bleskes and @DaveCTurner.
We need to keep index commits and translog operations up to the current global checkpoint to allow us to throw away unsafe operations and increase the operation-based recovery chance. This is achieved by a new index deletion policy. Relates #10708
The test testWithRandomException was not updated according to the latest translog policy. The method setTranslogGenerationOfLastCommit should be called before setMinTranslogGenerationForRecovery whenever the latter is called. Relates #27606
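A minimal sketch of the ordering rule, assuming the invariant is that the minimum translog generation needed for recovery never exceeds the translog generation of the last commit. `TranslogGenerations` and its checks are hypothetical; they are not the real `TranslogDeletionPolicy`, only an illustration of why the last-commit generation has to be recorded first.

```java
/**
 * Sketch only: a hypothetical holder enforcing
 * minTranslogGenerationForRecovery <= translogGenerationOfLastCommit.
 */
final class TranslogGenerations {

    private long translogGenerationOfLastCommit = 1L;
    private long minTranslogGenerationForRecovery = 1L;

    synchronized void setTranslogGenerationOfLastCommit(long generation) {
        if (generation < minTranslogGenerationForRecovery) {
            throw new IllegalArgumentException("last commit generation [" + generation
                + "] is below min generation for recovery [" + minTranslogGenerationForRecovery + "]");
        }
        translogGenerationOfLastCommit = generation;
    }

    synchronized void setMinTranslogGenerationForRecovery(long generation) {
        if (generation < minTranslogGenerationForRecovery) {
            throw new IllegalArgumentException("min generation for recovery can't go backwards: new ["
                + generation + "], current [" + minTranslogGenerationForRecovery + "]");
        }
        if (generation > translogGenerationOfLastCommit) {
            // Raised when the caller skipped setTranslogGenerationOfLastCommit.
            throw new IllegalStateException("min generation for recovery [" + generation
                + "] exceeds last commit generation [" + translogGenerationOfLastCommit
                + "]; call setTranslogGenerationOfLastCommit first");
        }
        minTranslogGenerationForRecovery = generation;
    }

    synchronized long minTranslogGenerationForRecovery() { return minTranslogGenerationForRecovery; }

    synchronized long translogGenerationOfLastCommit() { return translogGenerationOfLastCommit; }
}
```

A test that only calls `setMinTranslogGenerationForRecovery` trips the second check, which is the kind of mismatch the commit above fixes in testWithRandomException.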
Today we use the in-memory global checkpoint from SequenceNumbersService to clean up unneeded commit points; however, the latest global checkpoint may not have been fsynced to disk yet. If the translog checkpoint fsync fails after we have already used a higher global checkpoint to clean up commit points, we may have removed a safe commit that we try to keep for recovery. This commit updates the deletion policy to use lastSyncedGlobalCheckpoint from Translog rather than the in-memory global checkpoint. Relates #27606
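A minimal sketch of the failure mode and the fix: the deletion policy reads the global checkpoint through a `LongSupplier`, so it can be wired to the last fsynced value instead of the in-memory one. `CheckpointTracker`, `SafeCommitSelector` and the "newest commit with max seq no at most the global checkpoint" rule are assumptions for illustration; the real wiring goes through `Translog`, whose API is not shown in this thread.

```java
import java.util.function.LongSupplier;

/**
 * Sketch only: CheckpointTracker and SafeCommitSelector are hypothetical.
 */
final class GlobalCheckpointSource {

    static final class CheckpointTracker {
        private long inMemoryGlobalCheckpoint = -1L;
        private long lastSyncedGlobalCheckpoint = -1L;

        synchronized void advance(long globalCheckpoint) {
            inMemoryGlobalCheckpoint = Math.max(inMemoryGlobalCheckpoint, globalCheckpoint);
        }

        // Stands in for a successful fsync of the translog checkpoint file.
        synchronized void sync() {
            lastSyncedGlobalCheckpoint = inMemoryGlobalCheckpoint;
        }

        synchronized long inMemoryGlobalCheckpoint() { return inMemoryGlobalCheckpoint; }

        synchronized long lastSyncedGlobalCheckpoint() { return lastSyncedGlobalCheckpoint; }
    }

    static final class SafeCommitSelector {
        private final LongSupplier globalCheckpointSupplier;

        SafeCommitSelector(LongSupplier globalCheckpointSupplier) {
            this.globalCheckpointSupplier = globalCheckpointSupplier;
        }

        // maxSeqNos are ordered oldest to newest; returns the newest safe position (0 if none).
        int indexOfSafeCommit(long[] maxSeqNos) {
            final long globalCheckpoint = globalCheckpointSupplier.getAsLong();
            for (int i = maxSeqNos.length - 1; i >= 0; i--) {
                if (maxSeqNos[i] <= globalCheckpoint) {
                    return i;
                }
            }
            return 0;
        }
    }
}
```

If the in-memory checkpoint has reached 10 but only 4 was fsynced, a policy driven by the in-memory value would treat the commit with max seq no 10 as safe and delete the one with max seq no 4; after a crash the on-disk checkpoint would still be 4, leaving no safe commit. Wiring the selector to `lastSyncedGlobalCheckpoint` keeps the older commit.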
Currently we keep a 5.x index commit as a safe commit until we have a 6.x safe commit. During that time, if peer recovery happens, the primary will send a 5.x commit in a file-based sync and the recovery will fail, as the snapshotted commit does not have sequence number tags. This commit updates the combined deletion policy to delete legacy commits if there are 6.x commits. Relates elastic#27606 Relates elastic#28038
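A minimal sketch of the rule described above: once any commit carries sequence-number tags, commits without them (legacy 5.x commits) are selected for deletion. The `Commit` record and the user-data key `"max_seq_no"` are assumptions of this sketch, not the exact key or types used by the combined deletion policy.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/**
 * Sketch only: the user-data key and Commit record are assumptions.
 */
final class LegacyCommitFilter {

    static final String MAX_SEQ_NO_KEY = "max_seq_no";

    record Commit(String name, Map<String, String> userData) {
        boolean hasSeqNoTags() {
            return userData.containsKey(MAX_SEQ_NO_KEY);
        }
    }

    // Legacy commits are only deleted when at least one commit has sequence-number tags.
    static List<Commit> legacyCommitsToDelete(List<Commit> commits) {
        final boolean hasSeqNoCommit = commits.stream().anyMatch(Commit::hasSeqNoTags);
        final List<Commit> toDelete = new ArrayList<>();
        if (hasSeqNoCommit) {
            for (Commit commit : commits) {
                if (commit.hasSeqNoTags() == false) {
                    toDelete.add(commit);
                }
            }
        }
        return toDelete;
    }
}
```

If only legacy commits exist, nothing is deleted, so the index still has a commit to recover from.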
I had to include both the index commit and translog deletion policies in a single PR because they need each other for testing. This PR is a rework of #27367.