Cannot force allocate primary to a node where the shard already exists #22031

Merged — 3 commits merged into elastic:master, Dec 8, 2016

Conversation


@abeyad abeyad commented Dec 7, 2016

Before, it was possible for the SameShardAllocationDecider to allow
force allocation of an unassigned primary to the same node on which an
active replica is assigned. This could only happen with shadow replica
indices: when a shadow replica primary fails, the replica gets promoted
to primary, but in the INITIALIZING state rather than the STARTED state
(because the engine must perform specific reinitialization in the case
of shadow replicas). Therefore, if the now-promoted, still-initializing
primary also fails, the primary ends up unassigned, because
replica-to-primary promotion only happens when the failed shard was in
the STARTED state. The now unassigned primary shard then goes through
the allocation deciders, where the SameShardAllocationDecider would
return a NO decision, yet force allocation of the primary would still
be permitted if all deciders returned NO.

This commit implements canForceAllocatePrimary on the
SameShardAllocationDecider, which ensures that a primary cannot be
force-allocated to a node on which an active replica of the same shard
already exists.

Relates #22021
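
For context, here is a minimal sketch of the shape of this fix, assuming the 5.x AllocationDecider API; the canAllocate check is simplified, and this is not the verbatim diff from the PR:

import org.elasticsearch.cluster.routing.RoutingNode;
import org.elasticsearch.cluster.routing.ShardRouting;
import org.elasticsearch.cluster.routing.allocation.RoutingAllocation;
import org.elasticsearch.cluster.routing.allocation.decider.AllocationDecider;
import org.elasticsearch.cluster.routing.allocation.decider.Decision;
import org.elasticsearch.common.settings.Settings;

public class SameShardAllocationDecider extends AllocationDecider {

    public SameShardAllocationDecider(Settings settings) {
        super(settings);
    }

    @Override
    public Decision canAllocate(ShardRouting shardRouting, RoutingNode node, RoutingAllocation allocation) {
        // Existing behavior (simplified): refuse to put a copy of a shard on a
        // node that already holds a copy of the same shard.
        for (ShardRouting assigned : node) {
            if (assigned.shardId().equals(shardRouting.shardId())) {
                return allocation.decision(Decision.NO, "same_shard",
                        "a copy of this shard is already allocated to this node");
            }
        }
        return allocation.decision(Decision.YES, "same_shard",
                "no copy of this shard is allocated to this node");
    }

    @Override
    public Decision canForceAllocatePrimary(ShardRouting shardRouting, RoutingNode node, RoutingAllocation allocation) {
        assert shardRouting.primary() : "must only be called on primary shards";
        // The new override: a NO from the same-shard check must also veto forced
        // allocation, because two copies of one shard on a single node is never
        // valid. Reusing canAllocate() applies the same check on the forced path
        // instead of falling back to the permissive default.
        return canAllocate(shardRouting, node, allocation);
    }
}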

@ywelsch ywelsch left a comment


Can you add a test in SameShardRoutingTests?


abeyad commented Dec 7, 2016

@ywelsch I pushed a test in 82c8137


@ywelsch ywelsch left a comment


Left one suggestion. I think it's also ok if this doesn't go to 5.0.3

// can't force allocate same shard copy to the same node
ShardRouting newPrimary = ShardRouting.newUnassigned(primaryShard.shardId(), true,

TestShardRouting automatically provides some of the randomness that you explicitly do here.
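
For readers unfamiliar with the helper: TestShardRouting is a test utility whose factory methods fill in the routing internals a test usually does not care about (allocation id, unassigned info, recovery source) with randomized values. A hedged sketch of how the excerpt above could be rewritten with it, assuming the newShardRouting(ShardId, String, boolean, ShardRoutingState) overload; this is not necessarily the exact code later pushed in 9603046:

import org.elasticsearch.cluster.routing.ShardRouting;
import org.elasticsearch.cluster.routing.ShardRoutingState;
import org.elasticsearch.cluster.routing.TestShardRouting;

// Drop-in replacement for the hand-built ShardRouting.newUnassigned(...) call;
// primaryShard comes from the surrounding test, as in the excerpt above.
ShardRouting newPrimary = TestShardRouting.newShardRouting(
        primaryShard.shardId(), // same shard id as the assigned active replica
        null,                   // no current node: the new primary is unassigned
        true,                   // primary = true
        ShardRoutingState.UNASSIGNED);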

@abeyad abeyad removed the v5.0.3 label Dec 8, 2016

abeyad commented Dec 8, 2016

I pushed 9603046 which uses TestShardRouting. Thanks for the review @ywelsch

@abeyad abeyad merged commit 3da0429 into elastic:master Dec 8, 2016
@abeyad abeyad deleted the same_shard_alloc_cant_force branch December 8, 2016 17:21
abeyad pushed a commit that referenced this pull request Dec 8, 2016
Cannot force allocate primary to a node where the shard already exists (#22031)


abeyad commented Dec 8, 2016

5.x commit: 6d3e6f2

@abeyad abeyad removed the v5.1.2 label Dec 8, 2016
jasontedor added a commit to jasontedor/elasticsearch that referenced this pull request Dec 8, 2016
* master:
  Skip IP range query REST test prior to 5.1.2
  Bump version to 5.1.2
  Don't allow yaml tests with `warnings` that don't skip `warnings` (elastic#21989)
  Cannot force allocate primary to a node where the shard already exists (elastic#22031)
  Fix REST test for ip range aggregations.
  Build: NORELEASE is the same as norelease (elastic#22006)
  S3/Azure snapshot repo documentation wrong for "read_only"
@lcawl lcawl added the :Distributed Indexing/Distributed label and removed the :Allocation label Feb 13, 2018
Labels: >bug, :Distributed Indexing/Distributed, v5.2.0, v6.0.0-alpha1