My cluster ended up in a state where one Cassandra pod wouldn't start because the data volume was corrupted (or something like that). In this case, the management API tries over and over again to restart Cassandra and the pod gets stuck with the "Starting" cass-operator label.
One "logical" thing to do then is to use the `replaceNodes` setting of cass-operator to replace the faulty pod with a new one (including a new PV), and bootstrap it safely by replacing the previous instance of that node. Sadly, cass-operator prevents that from happening and the node never gets replaced.
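For reference, such a replacement attempt would look roughly like this, assuming a CassandraDatacenter named `dc1` in namespace `cass-operator` and a stuck pod named `cluster1-dc1-default-sts-2` (all names here are placeholders for illustration):

```sh
# Ask cass-operator to replace the stuck pod with a freshly bootstrapped node.
# spec.replaceNodes is a list of pod names the operator should replace.
kubectl patch cassandradatacenter dc1 -n cass-operator --type merge \
  -p '{"spec":{"replaceNodes":["cluster1-dc1-default-sts-2"]}}'
```

Today this has no effect for a pod that never left "Starting": the replacement is simply never carried out.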
The manual fix wasn't really easy, and involved the following (a rough shell sketch follows the list):
- removing the node from the cluster through `nodetool removenode`
- deleting the PV and PVC
- then deleting the pod
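For completeness, the manual workaround amounted to something like this (namespace, pod, PVC, and host ID below are placeholders, and the exact commands may vary per cluster):

```sh
# 1. Drop the dead node from the ring, using its Host ID as reported by
#    `nodetool status` run from a healthy pod.
kubectl exec -n cass-operator cluster1-dc1-default-sts-0 -c cassandra -- \
  nodetool removenode <host-id-of-stuck-node>

# 2. Delete the PVC backing the corrupted data volume (and its PV, if the
#    reclaim policy does not remove it automatically).
kubectl delete pvc -n cass-operator server-data-cluster1-dc1-default-sts-2

# 3. Delete the stuck pod so the StatefulSet recreates it with a fresh volume.
kubectl delete pod -n cass-operator cluster1-dc1-default-sts-2
```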
The additional streaming session and token movements triggered by the node removal, as well as the follow-up cleanup operation, could be avoided entirely if cass-operator allowed such replacements.
┆Issue is synchronized with this Jira Task by Unito
┆friendlyId: K8SSAND-1475
┆priority: Medium