Data Loss and Cluster Failure in Kubernetes StatefulSet Due to Missing Disk for MySQL Replica-0 #898

Open
tebaly opened this issue Jul 19, 2023 · 0 comments

During a node replacement in my Kubernetes cluster, I hit an unexpected failure that took down the MySQL StatefulSet. The disk backing the MySQL replica with index 0 was lost, so that replica could not start. The other two replicas still had up-to-date data, but they could not start either, because the StatefulSet's startup process hung waiting on replica 0.
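To make the blocking behavior concrete, here is a toy Python model (not Kubernetes code) of the default `OrderedReady` pod management policy. Under that policy the StatefulSet controller creates pods in ordinal order and waits for each one to become Running and Ready before creating the next. `Replica`, `has_disk`, and `ordered_ready_startup` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Replica:
    ordinal: int
    has_disk: bool  # hypothetical flag: False models the lost volume of replica 0


def ordered_ready_startup(replicas: list[Replica]) -> list[int]:
    """Return the ordinals that become Ready under a simplified OrderedReady model.

    Pods are handled one at a time in ordinal order; the controller does not
    create pod N+1 until pod N is Ready.
    """
    ready = []
    for r in sorted(replicas, key=lambda r: r.ordinal):
        if not r.has_disk:
            # This pod never becomes Ready, so higher ordinals are never created,
            # even though their own data is intact.
            break
        ready.append(r.ordinal)
    return ready


if __name__ == "__main__":
    cluster = [Replica(0, False), Replica(1, True), Replica(2, True)]
    print(ordered_ready_startup(cluster))  # prints []
```

In this model, replicas 1 and 2 never start even though nothing is wrong with them, which matches the symptom described above.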

To handle cases like this, I propose using the `.spec.updateStrategy.rollingUpdate.maxUnavailable` field introduced in Kubernetes v1.24. It could be set equal to the number of replicas in the StatefulSet; for example, with three replicas, `maxUnavailable: 3`. That way, the remaining replicas with valid data might be able to start successfully.
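A minimal sketch of the proposed setting, written as a Python script that emits a StatefulSet manifest as JSON (which `kubectl apply -f -` accepts). The name `mysql`, the `app: mysql` labels, and the `mysql:8.0` image are hypothetical placeholders, and the manifest is deliberately trimmed: a real MySQL StatefulSet would also need credentials, `volumeClaimTemplates`, probes, and so on. As far as I know, the field is only honored when the alpha `MaxUnavailableStatefulSet` feature gate is enabled on the cluster.

```python
import json


def mysql_statefulset(replicas: int = 3) -> dict:
    """Build a trimmed, hypothetical MySQL StatefulSet manifest as a plain dict."""
    return {
        "apiVersion": "apps/v1",
        "kind": "StatefulSet",
        "metadata": {"name": "mysql"},  # hypothetical name
        "spec": {
            "serviceName": "mysql",
            "replicas": replicas,
            "selector": {"matchLabels": {"app": "mysql"}},
            "updateStrategy": {
                "type": "RollingUpdate",
                "rollingUpdate": {
                    # Alpha in v1.24; requires the MaxUnavailableStatefulSet feature gate.
                    # Set equal to the replica count, as proposed in this issue.
                    "maxUnavailable": replicas,
                },
            },
            "template": {
                "metadata": {"labels": {"app": "mysql"}},
                "spec": {
                    # Placeholder container; env, volumes, and probes omitted.
                    "containers": [{"name": "mysql", "image": "mysql:8.0"}],
                },
            },
        },
    }


if __name__ == "__main__":
    # Example (hypothetical file name): python gen_sts.py | kubectl apply -f -
    print(json.dumps(mysql_statefulset(3), indent=2))
```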

With the current behavior, I found no apparent way to use the data on the other replicas to recover from the failure. I had to restore from a backup instead, which caused additional downtime and administrative effort.

I believe adopting the suggested feature could significantly improve the reliability and fault tolerance of StatefulSets in similar scenarios and help prevent data loss and cluster failures.

Feature state of `maxUnavailable`: Kubernetes v1.24 [alpha]

Thank you for considering this proposal.
Best regards
