Upon being elected as master, prefer joins' node info to existing cluster state #19743
Conversation
 * @param nodeId id of the wanted node
 * @return wanted node if it exists. Otherwise <code>null</code>
 */
public DiscoveryNode get(String nodeId) {
can you add @nullable ?
added
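The resolution above can be sketched in a minimal, self-contained form. The `DiscoveryNode`/`DiscoveryNodes` classes below are simplified stand-ins for the Elasticsearch types, and the `@Nullable` annotation (in Elasticsearch, `org.elasticsearch.common.Nullable`) is shown as a comment so the snippet compiles without the dependency:

```java
import java.util.HashMap;
import java.util.Map;

// Simplified stand-ins for the Elasticsearch types under discussion.
class DiscoveryNode {
    final String id;
    DiscoveryNode(String id) { this.id = id; }
}

class DiscoveryNodes {
    private final Map<String, DiscoveryNode> nodes = new HashMap<>();

    void add(DiscoveryNode node) { nodes.put(node.id, node); }

    /**
     * @param nodeId id of the wanted node
     * @return wanted node if it exists. Otherwise <code>null</code>
     */
    // @Nullable  // documents to callers (and static analysis) that null is a valid return
    public DiscoveryNode get(String nodeId) {
        return nodes.get(nodeId);
    }
}
```

The annotation changes no behavior; it only makes the "may return null" contract from the Javadoc machine-checkable.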
@bleskes I left some comments but the overall change looks good. While I think that separating out
 * unassigned an shards that are associated with nodes that are no longer part of the cluster, potentially promoting replicas
 * if needed.
 */
public RoutingAllocation.Result deassociateDeadNodes(ClusterState clusterState, boolean reroute, String reason) {
how about adding an overloaded version of the method where reroute is true (like we have for startedShards / failedShards)?
alternatively, we could make it even more explicit when reroute is not called, i.e. have deassociateDeadNodes always do the reroute and add a method deassociateDeadNodesWithoutReroute for the rare cases where we don't reroute.
well, can do if you feel strongly about it. To me it feels a bit like overkill
ok, let's leave it as is.
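For context, the two alternatives proposed above could look roughly like this. This is a hypothetical sketch: the method bodies are string-based stubs, not the actual `AllocationService` implementation, which operates on real `ClusterState`/`RoutingAllocation.Result` objects:

```java
// Hypothetical sketch of the two proposed API shapes (not the real AllocationService).
class AllocationServiceSketch {
    // Option 1: an overloaded convenience method that defaults reroute to true,
    // mirroring the startedShards / failedShards pattern.
    public String deassociateDeadNodes(String clusterState, String reason) {
        return deassociateDeadNodes(clusterState, true, reason);
    }

    public String deassociateDeadNodes(String clusterState, boolean reroute, String reason) {
        String result = clusterState + ":deassociated(" + reason + ")";
        return reroute ? result + ":rerouted" : result;
    }

    // Option 2: make skipping the reroute explicit in the method name,
    // reserving it for the rare callers that really don't want a reroute.
    public String deassociateDeadNodesWithoutReroute(String clusterState, String reason) {
        return deassociateDeadNodes(clusterState, false, reason);
    }
}
```

Either shape pushes the `reroute` decision into the method signature instead of a boolean flag at every call site; the reviewers ultimately chose to keep the single flagged method.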
@ywelsch thanks. I pushed an update.
LGTM. Thanks @bleskes!
Slims the public interface of RoutingNodes down to 4 methods to update routing entries:

- initializeShard() -> initializes an unassigned shard
- startShard() -> starts an initializing shard / completes relocation of a shard
- relocateShard() -> starts relocation of a started shard
- failShard() -> fails/cancels an assigned shard

In the spirit of PR #19743, where deassociateDeadNodes was moved to its own public method to be only called when nodes have actually left the cluster and not on every reroute step, this commit also removes electPrimariesAndUnassignedDanglingReplicas from AllocationService and folds it into the shard failure logic. This means that an active replica is promoted to primary in the same method where the primary was failed. Previously we would scan in each reroute iteration for active replicas to be promoted to primary.
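The four methods above cover the full shard lifecycle. The legal transitions they enforce can be sketched as a small state machine; the state names and guard logic here are illustrative, not the actual RoutingNodes code:

```java
// Illustrative shard lifecycle covered by the four RoutingNodes mutation methods.
enum ShardState { UNASSIGNED, INITIALIZING, STARTED, RELOCATING, FAILED }

class ShardRoutingSketch {
    ShardState state = ShardState.UNASSIGNED;

    // initializeShard(): only an unassigned shard may start initializing.
    void initializeShard() {
        require(state == ShardState.UNASSIGNED);
        state = ShardState.INITIALIZING;
    }

    // startShard(): starts an initializing shard / completes a relocation.
    void startShard() {
        require(state == ShardState.INITIALIZING || state == ShardState.RELOCATING);
        state = ShardState.STARTED;
    }

    // relocateShard(): only a started shard may begin relocating.
    void relocateShard() {
        require(state == ShardState.STARTED);
        state = ShardState.RELOCATING;
    }

    // failShard(): any assigned shard may be failed/cancelled.
    void failShard() {
        require(state != ShardState.UNASSIGNED);
        state = ShardState.FAILED;
    }

    private static void require(boolean ok) {
        if (!ok) throw new IllegalStateException("illegal shard state transition");
    }
}
```

Funneling all mutations through four entry points like these makes the lifecycle auditable: there is exactly one place where each transition can happen.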
When we introduced persistent node ids we were concerned that people may copy data folders from one node to another, resulting in two nodes competing for the same id in the cluster. To solve this we elected to not allow an incoming join if a different node with the same id already exists in the cluster, or if some other node already has the same transport address as the incoming join. The rationale there was that it is better to prefer existing nodes and that we can rely on node fault detection to remove any node from the cluster that isn't correct any more, making room for the node that wants to join (and will keep trying).
Sadly there were two problems with this:
AllocationService, in these rare cases. The cluster is good enough to detect this and recover later on, but it's not clean. This PR fixes these two and prefers incoming nodes to existing nodes when finishing an election.
On top of that, on request by @ywelsch, the AllocationService synchronization between the nodes of the cluster and its routing table is now explicit rather than something we do all the time.
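That "explicit rather than all the time" change can be illustrated with a toy model (all names and structure below are invented for illustration, not Elasticsearch code): dead-node cleanup runs only when a node actually leaves, so an ordinary reroute no longer scans for stale nodes:

```java
import java.util.HashSet;
import java.util.Set;

// Toy model of explicit node / routing-table synchronization (illustrative only).
class ToyAllocationService {
    final Set<String> liveNodes = new HashSet<>();
    final Set<String> routingNodes = new HashSet<>(); // nodes referenced by the routing table
    int deadNodeScans = 0; // counts how often we pay for the cleanup

    void nodeJoined(String id) {
        liveNodes.add(id);
        routingNodes.add(id);
    }

    // Called explicitly, and only, when nodes actually leave the cluster...
    void deassociateDeadNodes() {
        deadNodeScans++;
        routingNodes.retainAll(liveNodes);
    }

    void nodeLeft(String id) {
        liveNodes.remove(id);
        deassociateDeadNodes();
    }

    // ...so an ordinary reroute does no dead-node scanning at all.
    void reroute() {
        // balancing work would go here; no dead-node scan
    }
}
```

Beyond saving work on every reroute, the explicit call site documents the one moment where the routing table is allowed to drift from the node set and must be reconciled.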