K8SSAND-1475 ⁃ Handle replacement of nodes stuck in "Starting" mode #327

Closed
adejanovski opened this issue Apr 26, 2022 · 1 comment · Fixed by #383

Comments

@adejanovski
Contributor

adejanovski commented Apr 26, 2022

My cluster ended up in a state where one Cassandra pod wouldn't start because the data volume was corrupted (or something like that). In this case, the management API keeps trying to restart Cassandra, and the pod stays stuck with the cass-operator "Starting" label.
One "logical" thing to do then is to use the replaceNodes setting of cass-operator to replace the faulty pod with a new one (including a new PV), and bootstrap it safely as a replacement for the previous instance of that node. Sadly, cass-operator prevents that from happening and the node never gets replaced.
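
For context, here is a minimal sketch of what requesting such a replacement could look like. It assumes `replaceNodes` is a list of pod names under the CassandraDatacenter `spec`; the helper name and the pod name are hypothetical and only there for illustration.

```python
import json


def build_replace_nodes_patch(pod_names):
    """Return a JSON merge patch body that sets spec.replaceNodes.

    Assumption: replaceNodes takes the names of the pods to replace.
    """
    if not pod_names:
        raise ValueError("at least one pod name is required")
    return {"spec": {"replaceNodes": list(pod_names)}}


if __name__ == "__main__":
    # Hypothetical pod name, not taken from the cluster in this issue.
    patch = build_replace_nodes_patch(["cluster1-dc1-rack1-sts-0"])
    print(json.dumps(patch))
```

The resulting JSON could then be passed to something like `kubectl patch cassandradatacenter <dc-name> --type merge -p '<json>'`. In the situation described here, that request is exactly what does not take effect while the pod is stuck in "Starting".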

The manual fix wasn't really easy, and involved:

  • removing the node from the cluster through nodetool removenode
  • deleting the PV and PVC
  • then deleting the pod
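
Roughly, the steps above translate into the commands built below. This is a sketch rather than a tested runbook: the function and every name passed to it are placeholders, the `cassandra` container name and the use of `--wait=false` are assumptions, and `nodetool` may need credentials depending on how the cluster is set up.

```python
import shlex


def manual_replacement_commands(namespace, live_pod, host_id,
                                pvc_name, pv_name, stuck_pod):
    """Build the commands for the manual fix, in the order listed above."""
    kubectl = ["kubectl", "-n", namespace]
    return [
        # 1. removenode runs on a healthy node and takes the host ID
        #    of the node being removed.
        kubectl + ["exec", live_pod, "-c", "cassandra", "--",
                   "nodetool", "removenode", host_id],
        # 2. A PVC/PV that is still in use is only removed once the pod
        #    is gone, so do not block waiting on these deletions.
        kubectl + ["delete", "pvc", pvc_name, "--wait=false"],
        # PVs are cluster-scoped, hence no namespace flag here.
        ["kubectl", "delete", "pv", pv_name, "--wait=false"],
        # 3. Delete the stuck pod so it is recreated with a fresh volume.
        kubectl + ["delete", "pod", stuck_pod],
    ]


if __name__ == "__main__":
    # All arguments are hypothetical placeholders.
    for cmd in manual_replacement_commands(
            "my-namespace", "healthy-pod-name", "<host-id>",
            "pvc-name", "pv-name", "stuck-pod-name"):
        print(shlex.join(cmd))
```

Depending on the storage class reclaim policy, the PV may already be removed along with the PVC, in which case the explicit PV deletion is unnecessary.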

If cass-operator allowed such replacements, we could avoid the additional streaming sessions and token movements triggered by the node removal, as well as the follow-up cleanup operation.

┆Issue is synchronized with this Jira Task by Unito
┆friendlyId: K8SSAND-1475
┆priority: Medium

@sync-by-unito sync-by-unito bot changed the title Handle replacement of nodes stuck in "Starting" mode K8SSAND-1475 ⁃ Handle replacement of nodes stuck in "Starting" mode Apr 26, 2022
@burmanm burmanm mentioned this issue Jul 29, 2022
5 tasks
@burmanm
Contributor

burmanm commented Jul 29, 2022

PR 383 should take care of this, as it does not care about the state of the node that's being replaced.
