Skip to content

Conversation

@bleskes
Copy link
Contributor

@bleskes bleskes commented Aug 2, 2016

When we introduces persistent node ids we were concerned that people may copy data folders from one to another resulting in two nodes competing for the same id in the cluster. To solve this we elected to not allow an incoming join if a different with same id already exists in the cluster, or if some other node already has the same transport address as the incoming join. The rationeel there was that it is better to prefer existing nodes and that we can rely on node fault detection to remove any node from the cluster that isn't correct any more, making room for the node that wants to join (and will keep trying).

Sadly there were two problems with this:

  1. One minor and easy to fix - we didn't allow for the case where the existing node can have the same network address as the incoming one, but have a different ephemeral id (after node restart). This confused the logic in AllocationService, in this rare cases. The cluster is good enough to detect this and recover later on, but it's not clean.
  2. The assumption that Node Fault Detection will clean up is wrong when the node just won an election (it wasn't master before) and needs to process the incoming joins in order to commit the cluster state and assume it's mastership. In those cases, the Node Fault Detection isn't active.

This PR fixes these two and prefers incoming nodes to existing node when finishing an election.
On top of the, on request by @ywelsch , AllocationService synchronization between the nodes of the cluster and it's routing table is now explicit rather than something we do all the time.

@bleskes bleskes added >bug :Allocation :Distributed Coordination/Discovery-Plugins Anything related to our integration plugins with EC2, GCP and Azure v5.0.0-beta1 labels Aug 2, 2016
* @param nodeId id of the wanted node
* @return wanted node if it exists. Otherwise <code>null</code>
*/
public DiscoveryNode get(String nodeId) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add @nullable ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added

@ywelsch
Copy link
Contributor

ywelsch commented Aug 3, 2016

@bleskes I left some comments but the overall change looks good. While I think that separating out deassociateDeadNodes from reroute is great, I feel less enthusiastic about how we handle electPrimariesAndUnassignedDanglingReplicas. That method goes conceptually together with cancelShard (the method that moves shards to unassigned). I wonder if we can marry them together. How about leaving electPrimariesAndUnassignedDanglingReplicas in reroute for now and tackle that in a future PR?

* unassigned an shards that are associated with nodes that are no longer part of the cluster, potentially promoting replicas
* if needed.
*/
public RoutingAllocation.Result deassociateDeadNodes(ClusterState clusterState, boolean reroute, String reason) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how about adding an overloaded version of the method where reroute is true (like we have for startedShards / failedShards)?

alternatively, we could make it even more explicit when reroute is not called, i.e. have deassociateDeadNodes always do the reroute and add a method deassociateDeadNodesWithoutReroute for the rare cases where we don't reroute.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tja. can do if you feel strongly about it. To me it feels a bit like an overkill

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, let's leave it as is.

@bleskes
Copy link
Contributor Author

bleskes commented Aug 4, 2016

@ywelsch thanks. I pushed an updated.

@ywelsch
Copy link
Contributor

ywelsch commented Aug 4, 2016

LGTM. Thanks @bleskes!

@bleskes bleskes merged commit 609a199 into elastic:master Aug 5, 2016
@bleskes bleskes deleted the not_master_node_collision branch August 5, 2016 06:58
ywelsch added a commit that referenced this pull request Aug 9, 2016
Slims the public interface of RoutingNodes down to 4 methods to update routing entries:
- initializeShard() -> initializes an unassigned shard
- startShard() -> starts an initializing shard / completes relocation of a shard
- relocateShard() -> starts relocation of a started shard
- failShard() -> fails/cancels an assigned shard

In the spirit of PR #19743, where deassociateDeadNodes was moved to its own public method to be only called when nodes have actually left the cluster and not on every reroute step, this commit also removes electPrimariesAndUnassignedDanglingReplicas from AllocationService and folds it into the shard failure logic. This means that an active replica is promoted to primary in the same method where the primary was failed. Previously we would scan in each reroute iteration for active replicas to be promoted to primary.
@lcawl lcawl added :Distributed Indexing/Distributed A catch all label for anything in the Distributed Indexing Area. Please avoid if you can. and removed :Allocation labels Feb 13, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

>bug :Distributed Coordination/Discovery-Plugins Anything related to our integration plugins with EC2, GCP and Azure :Distributed Indexing/Distributed A catch all label for anything in the Distributed Indexing Area. Please avoid if you can. v5.0.0-beta1

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants