Don't cancel allocation when a new sync id is found on shared filesystems #16357

Closed
dakrone opened this issue Feb 1, 2016 · 2 comments
Labels
:Distributed Coordination/Allocation (all issues relating to the decision making around placing a shard, both master logic and on the nodes), >enhancement

Comments

@dakrone
Member

dakrone commented Feb 1, 2016

In ReplicaShardAllocator.processExistingRecoveries, if we find a "better" match, we cancel allocation of a replica:

// we found a better match that has a full sync id match, the existing allocation is not fully synced
// so we found a better one, cancel this one
it.moveToUnassigned(new UnassignedInfo(UnassignedInfo.Reason.REALLOCATED_REPLICA,
        "existing allocation of replica to [" + currentNode + "] cancelled, sync id match found on node [" + nodeWithHighestMatch + "]"));

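However, on a shared filesystem every data node has access to the same data, so a sync id match on a newly available node is no better than the existing allocation. In that case we should not cancel the allocation when a new node appears.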
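To make the proposal concrete, here is a minimal sketch of the decision being asked for. It is not the actual `ReplicaShardAllocator` code: the class `ReplicaCancellationSketch`, the method `shouldCancel`, and the `sharedFilesystem` flag are hypothetical names introduced for illustration, and how the allocator would learn that an index lives on a shared filesystem is left as an assumption.

```java
// Hypothetical sketch, not Elasticsearch code: all names here are assumptions.
public final class ReplicaCancellationSketch {

    private ReplicaCancellationSketch() {}

    /**
     * Decides whether an in-progress replica recovery should be cancelled
     * in favour of another node.
     *
     * @param sharedFilesystem      assumed flag: the index data lives on a shared filesystem
     * @param currentNode           node the replica is currently recovering on
     * @param currentHasSyncIdMatch whether the current node already has a full sync id match
     * @param nodeWithHighestMatch  best candidate found, or null if there is none
     * @param candidateHasSyncId    whether that candidate has a full sync id match
     */
    public static boolean shouldCancel(boolean sharedFilesystem,
                                       String currentNode,
                                       boolean currentHasSyncIdMatch,
                                       String nodeWithHighestMatch,
                                       boolean candidateHasSyncId) {
        // On a shared filesystem all nodes see the same data, so no node is "better".
        if (sharedFilesystem) {
            return false;
        }
        // Nothing to switch to, or the best match is the node we are already using.
        if (nodeWithHighestMatch == null || nodeWithHighestMatch.equals(currentNode)) {
            return false;
        }
        // Cancel only when the candidate is fully synced and the current node is not.
        return candidateHasSyncId && !currentHasSyncIdMatch;
    }

    public static void main(String[] args) {
        // Local storage: a fully synced candidate on another node triggers cancellation.
        System.out.println(shouldCancel(false, "node-1", false, "node-2", true));
        // Shared filesystem: the same situation does not.
        System.out.println(shouldCancel(true, "node-1", false, "node-2", true));
    }
}
```

The shared-filesystem check comes first so that the sync id comparison is never consulted for such indices, which is the behaviour this issue requests.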

@dakrone
Member Author

dakrone commented Feb 1, 2016

@bleskes I spoke with @brwe about this and I think we agreed it was worth doing, but I'm curious about your input on this as well.

dakrone added a commit to dakrone/elasticsearch that referenced this issue Mar 8, 2016
…found

Currently the message stays in the `UnassignedInfo` for the shard;
however, it would be very useful to know the exact point (time-wise)
at which the cancellation happened when diagnosing an issue.

Relates to debugging elastic#16357
@dakrone
Member Author

dakrone commented May 26, 2017

Shadow replicas have been removed, so this is no longer applicable.

@dakrone dakrone closed this as completed May 26, 2017
@lcawl added the :Distributed Indexing/Distributed label and removed the :Allocation label Feb 13, 2018
@clintongormley added the :Distributed Coordination/Allocation label and removed the :Distributed Indexing/Distributed label Feb 14, 2018