I encountered an unexpected failure during node replacement in my Kubernetes cluster that led to a critical issue with a MySQL StatefulSet. The disk backing the MySQL replica with index 0 was lost, so that replica could not start. Although the other two replicas had up-to-date data, they could not start either, because the StatefulSet's ordered startup was stuck waiting on the first replica, the one that had lost its data.
To address such issues, I propose leveraging the `.spec.updateStrategy.rollingUpdate.maxUnavailable` field introduced in Kubernetes v1.24. Setting it equal to the number of replicas in the StatefulSet (for instance, `maxUnavailable: 3` with three replicas) might allow the remaining replicas with valid data to start successfully instead of being blocked by the failed one.
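A minimal sketch of what I have in mind, assuming a hypothetical three-replica StatefulSet named `mysql` (names and labels here are illustrative, not from my actual cluster):

```yaml
# Sketch: StatefulSet with maxUnavailable set equal to the replica count.
# Requires the MaxUnavailableStatefulSet feature gate (alpha in v1.24)
# to be enabled; field names below follow the apps/v1 API.
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: mysql
spec:
  replicas: 3
  selector:
    matchLabels:
      app: mysql
  serviceName: mysql
  updateStrategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 3   # allow all replicas to be cycled at once
  template:
    metadata:
      labels:
        app: mysql
    spec:
      containers:
        - name: mysql
          image: mysql:8.0
```

Note that `maxUnavailable` governs rolling updates; whether it also unblocks this startup scenario is exactly what I'd like to discuss.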
The current situation left me with no apparent method to utilize the data from the other replicas to recover from the failure. Consequently, I had to resort to restoring from a backup, causing additional downtime and administrative efforts.
I believe adopting the suggested feature could significantly enhance the reliability and fault-tolerance of StatefulSets in similar scenarios, preventing potential data loss and cluster failures.
Feature State: Kubernetes v1.24 [alpha]
Thank you for considering this proposal.
Best regards