Continue to push assignment updates to nodes that were removed from the list #419

Merged 2 commits on Dec 29, 2023

merlimat (Collaborator)

When we shrink a cluster and remove nodes from the coordinator config files, we stop the NodeController for each removed node so that we don't react to errors coming from a failed node.

The problem is that some of these nodes might still be online and used by clients. For example, they might still be marked as "ready" by K8S and still be serving assignment dispatches to clients.

If the coordinator's node controller stops, the node no longer receives updates about new leader elections, so a client connected to an old (removed) node keeps operating on the stale leader assignment.

Modifications

When a node is removed, we leave its NodeController running but change its state to Draining. Once the node stops responding to health checks, the node controller does not retry; it cleans up the removed node completely.

This ensures that nodes removed from the coordinator stay up to date with the current assignments until they are finally shut down.
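The lifecycle described above can be sketched as a small state machine. This is only an illustration, not the code from this PR: the `NodeState` values other than Draining, the method names (`Remove`, `PushAssignments`, `OnHealthCheckFailed`), and the example address are all hypothetical, and the real NodeController does network I/O that is left out here.

```go
package main

import "fmt"

// NodeState is a hypothetical lifecycle state for a per-node controller.
type NodeState int

const (
	Running  NodeState = iota // node is listed in the coordinator config
	Draining                  // node was removed from the config but may still serve clients
	Stopped                   // controller has been cleaned up
)

func (s NodeState) String() string {
	switch s {
	case Running:
		return "Running"
	case Draining:
		return "Draining"
	default:
		return "Stopped"
	}
}

// NodeController is a simplified stand-in for the coordinator's per-node
// controller. Pushed records assignment updates instead of sending them.
type NodeController struct {
	Addr   string
	State  NodeState
	Pushed []string
}

// Remove is called when the node disappears from the coordinator config.
// The controller is kept alive and only moves to Draining.
func (c *NodeController) Remove() {
	if c.State == Running {
		c.State = Draining
	}
}

// PushAssignments delivers an update to the node unless the controller has
// been cleaned up. It reports whether the update was delivered.
func (c *NodeController) PushAssignments(update string) bool {
	if c.State == Stopped {
		return false
	}
	c.Pushed = append(c.Pushed, update)
	return true
}

// OnHealthCheckFailed reports whether the controller should retry.
// A Draining node is not retried: the controller is cleaned up instead.
func (c *NodeController) OnHealthCheckFailed() bool {
	if c.State == Draining {
		c.State = Stopped
		return false
	}
	return c.State == Running
}

func main() {
	nc := &NodeController{Addr: "node-2.example:6649"} // hypothetical address
	nc.Remove()
	// A removed node still receives assignment updates while draining.
	fmt.Println(nc.State, nc.PushAssignments("shard-0 leader=node-0"))
	// The first failed health check after removal triggers the final cleanup.
	fmt.Println(nc.OnHealthCheckFailed(), nc.State)
	fmt.Println(nc.PushAssignments("shard-0 leader=node-1"))
}
```

The point of the sketch is that removal and shutdown are separate events: removal only changes how failures are handled, while assignment updates keep flowing until the node actually goes away.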

merlimat merged commit 53c3f4a into streamnative:main on Dec 29, 2023
4 checks passed