Fix Overly Optimistic Request Deduplication (#51270) #51291

Merged
merged 1 commit into from
Jan 22, 2020
Fix Overly Optimistic Request Deduplication (#51270)
On master failover we have to resend all the shard failed messages,
but the transport requests remain the same in the eyes of `equals`.
If the master failover is registered and the requests to the new master
are sent before all the callbacks have executed and the request to the
old master has been removed from the deduplicator, then the requests to
the new master will incorrectly fail and the snapshot gets stuck.

Closes #51253
original-brownbear committed Jan 22, 2020
commit 54cef09b09c7a70c68c2f7e977b4468effd70dd7
@@ -357,6 +357,9 @@ private void syncShardStatsOnNewMaster(ClusterChangedEvent event) {
            return;
        }

        // Clear request deduplicator since we need to send all requests that were potentially not handled by the previous
        // master again
        remoteFailedRequestDeduplicator.clear();
        for (SnapshotsInProgress.Entry snapshot : snapshotsInProgress.entries()) {
            if (snapshot.state() == State.STARTED || snapshot.state() == State.ABORTED) {
                Map<ShardId, IndexShardSnapshotStatus> localShards = currentSnapshotShards(snapshot.snapshot());
@@ -53,6 +53,14 @@ public void executeOnce(T request, ActionListener<Void> listener, BiConsumer<T,
        }
    }

    /**
     * Remove all tracked requests from this instance so that the first time {@link #executeOnce} is invoked with any request it triggers
     * an actual request execution. Use this e.g. for requests to master that need to be sent again on master failover.
     */
    public void clear() {
        requests.clear();
    }

    public int size() {
        return requests.size();
    }
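
The failure mode described in the commit message can be reproduced with a much smaller model. The following is a minimal sketch, not the actual Elasticsearch class: `SimpleDeduplicator`, `DeduplicatorDemo`, the `Consumer<Exception>` listener type, and the string requests are all hypothetical stand-ins chosen for illustration, and the sketch ignores the concurrency edge cases a production implementation has to handle. It only shows why a second, `equals`-identical request is swallowed while the first is still tracked, and how calling `clear()` on failover lets it through.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.BiConsumer;
import java.util.function.Consumer;

/** Hypothetical, simplified deduplicator: at most one in-flight execution per equal request. */
final class SimpleDeduplicator<T> {
    /** Listeners waiting on one execution; identity equality keeps remove(key, value) precise. */
    private static final class Waiters {
        final List<Consumer<Exception>> listeners = new CopyOnWriteArrayList<>();
    }

    private final Map<T, Waiters> requests = new ConcurrentHashMap<>();

    /** Runs the callback only if no equal request is tracked; otherwise attaches the listener to the tracked one. */
    void executeOnce(T request, Consumer<Exception> listener, BiConsumer<T, Consumer<Exception>> callback) {
        Waiters mine = new Waiters();
        mine.listeners.add(listener);
        Waiters inFlight = requests.putIfAbsent(request, mine);
        if (inFlight != null) {
            inFlight.listeners.add(listener); // duplicate: nothing is sent
            return;
        }
        callback.accept(request, result -> {
            // Remove only our own entry, so a newer execution registered after clear() is left alone.
            requests.remove(request, mine);
            mine.listeners.forEach(l -> l.accept(result));
        });
    }

    /** Forget all tracked requests so the next executeOnce call sends again. */
    void clear() {
        requests.clear();
    }

    int size() {
        return requests.size();
    }
}

final class DeduplicatorDemo {
    /** Simulates a request whose response from the old master is still outstanding at failover time. */
    static List<String> run(boolean clearOnFailover) {
        List<String> log = new ArrayList<>();
        SimpleDeduplicator<String> dedup = new SimpleDeduplicator<>();
        List<Consumer<Exception>> unanswered = new ArrayList<>();

        // Request to the old master: the completion handler is parked and never invoked.
        dedup.executeOnce("shard-0 failed", e -> log.add("first listener notified"), (req, done) -> {
            log.add("sent to old master");
            unanswered.add(done);
        });

        // Master failover happens here.
        if (clearOnFailover) {
            dedup.clear();
        }

        // Same request, equal by equals(), now intended for the new master.
        dedup.executeOnce("shard-0 failed", e -> log.add("second listener notified"),
            (req, done) -> log.add("sent to new master"));
        return log;
    }

    public static void main(String[] args) {
        System.out.println(run(false)); // [sent to old master]
        System.out.println(run(true));  // [sent to old master, sent to new master]
    }
}
```

Without `clear()`, the second call is treated as a duplicate and its listener depends entirely on the outcome of the request to the old master. The two-argument `remove(request, mine)` is a design choice of this sketch: it prevents a late response from the old master from evicting the entry that was registered for the new master after the clear.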