-
Notifications
You must be signed in to change notification settings - Fork 25.8k
Undo not-preferred in allocation explain #140195
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Undo not-preferred in allocation explain #140195
Conversation
4947525 to
974cb98
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Verified:
- All the (non-test) code changes are putting the code back the way it was before #137228
Not checked:
- Test code
- That all the changes from #137228 were reverted. I don't understand the changes, so I assume the changes not reverted were omitted deliberately.
Remarks:
- I don't see a new Transport Version yet, nor the corresponding checks.
- We're not changing the
AllocationDecisionIDs back here; I believe that's deliberate.(FWIW, if we're changing them anyway, I'd consider spacing them out (say, counting by fives) so there's space to add new ones inbetween.)
| // rebalance to the node, only will get overwritten if the decision here is to | ||
| // THROTTLE and we get a decision with YES on another node | ||
| bestRebalanceCanAllocateDecisionType = canAllocate.type(); | ||
| if (canAllocate.type().higherThan(Type.NOT_PREFERRED)) { | ||
| // Movement is only allowed to THROTTLE/YES nodes. NOT_PREFERRED is the same as no for rebalancing, since | ||
| // rebalancing aims to distribute resource usage and NOT_PREFERRED means the move could cause hot-spots. | ||
| targetNode = node; | ||
| } | ||
| targetNode = node; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From here
| // NOT_PREFERRED, as opposed to NO. | ||
| bestDecision = Type.NOT_PREFERRED; | ||
| // Relocating a shard from one NOT_PREFERRED node to another NOT_PREFERRED node would not improve the situation. | ||
| // Relocating a shard from one NOT_PREFERRED node to another would not improve the situation. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From here
| } else if (bestDecision == Type.THROTTLE) { | ||
| assert allocation.isSimulating() == false; | ||
| // THROTTLE is better than NOT_PREFERRED, we just need to wait for a YES. | ||
| targetNode = null; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From here
| // but NOT_PREFERRED and THROTTLED are generally not comparable. | ||
| // NOT_PREFERRED and THROTTLED should not be compared. | ||
| THROTTLE, | ||
| NOT_PREFERRED, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do these get serialized over Transport? If so, are they sent by name or by ordinal?
I ask because if it's by ordinal, then reordering these would be a breaking change.
(From here)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure about those specifics, but I do think the original change had a backwards compatibility bug: I should have changed the writeTo / readFrom methods, needing a transport version gate.
| */ | ||
| public boolean assignmentAllowed() { | ||
| return this == NOT_PREFERRED || this == YES; | ||
| return this.compareTo(NOT_PREFERRED) >= 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From here
| case YES -> YES; | ||
| case NOT_PREFERRED -> NOT_PREFERRED; | ||
| // TODO: this | ||
| case YES, NOT_PREFERRED -> YES; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From here.
The TODO comment isn't right, but presumably this is temporary.
| // the final decision is NO (no node to move the shard to) and we are not in explain mode, return a cached version | ||
| return CACHED_CANNOT_MOVE_DECISION; | ||
| } else { | ||
| assert ((targetNode == null) == (moveDecision != AllocationDecision.YES)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From here
| : Explanations.Rebalance.CANNOT_REBALANCE_CANNOT_ALLOCATE; | ||
| case THROTTLE -> Explanations.Rebalance.CLUSTER_THROTTLE; | ||
| case YES -> { | ||
| case YES, NOT_PREFERRED -> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From here
| case NO -> canRemainNotPreferred() ? Explanations.Move.NOT_PREFERRED_TO_NO : Explanations.Move.NO; | ||
| case YES, NOT_PREFERRED -> Explanations.Move.YES; | ||
| case THROTTLED -> Explanations.Move.THROTTLED; | ||
| case NO -> Explanations.Move.NO; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From here
|
Pinging @elastic/es-distributed-coordination (Team:Distributed Coordination) |
|
We decided to fix the problem instead of rolling it back / undoing it. Closing this PR. |
The work adding not-preferred to the allocation explain output
(#137228) resulted in shards not evacuating a shutting down node
during release testing. There is some unidentified bug in play. Since
a transport version was added with that code change, we wish to
retain the enum changes necessitating the transport version, but
undo / revert all logic changes associated with the commit.
Relates ES-13903