
Hypothesis: Data loss of one node will not cause cluster wide issues #4

Closed
ChrisKujawa opened this issue Mar 4, 2020 · 4 comments

Labels: Contribution: Availability · Contribution: Reliability · Hypothesis · Impact: High · Likelihood: Low

Comments

@ChrisKujawa (Member)

Hypothesis

We believe that data loss on a single node will not cause cluster-wide issues. For example, we could delete a PVC.
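
For illustration, a minimal sketch of what "delete a PVC" could look like for a Zeebe cluster running on Kubernetes is shown below. The namespace, pod, and PVC names are assumptions and would have to be adjusted to the cluster under test.

```sh
# Sketch: simulate data loss on a single broker by deleting its PVC and pod.
# Namespace, pod and PVC names are assumptions, not taken from this issue.
NAMESPACE=zeebe-chaos
BROKER_POD=zeebe-0          # one broker of the StatefulSet
PVC=data-zeebe-0            # the PVC bound to that broker

# Mark the PVC for deletion (it is only removed once the pod releases it),
# then delete the pod so the StatefulSet recreates it with a fresh volume.
kubectl -n "$NAMESPACE" delete pvc "$PVC" --wait=false
kubectl -n "$NAMESPACE" delete pod "$BROKER_POD"

# Watch the broker come back and re-replicate its data from the other nodes.
kubectl -n "$NAMESPACE" get pods -w
```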

ChrisKujawa added the Hypothesis, Impact: Low, Likelihood: Low, Contribution: Availability, Contribution: Reliability, and Impact: High labels and removed the Impact: Low label on Mar 4, 2020
ChrisKujawa changed the title from "Hypothesis: Data loss will not cause cluster wide issues" to "Hypothesis: Data loss of one node will not cause cluster wide issues" on Feb 1, 2022
@deepthidevaki (Contributor)

With the newly added command `zbchaos dataloss`, it would be possible to verify this scenario. I will add one experiment each for Production - S, M, and L.
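
A minimal sketch of such an experiment with the zbchaos CLI could look as follows. Apart from `zbchaos dataloss prepare`, which is mentioned later in this thread, the exact subcommand and flag names are assumptions.

```sh
# Sketch only: subcommand and flag names below are assumptions.

# Cause data loss on broker 1 by deleting its volume and pod.
zbchaos dataloss delete --nodeId 1

# Bring broker 1 back and wait until it has rejoined and re-replicated.
zbchaos dataloss recover --nodeId 1

# The cluster as a whole should have stayed healthy throughout, e.g. it
# should still be ready and accept new process instances.
zbchaos verify readiness
zbchaos verify instance-creation
```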

@deepthidevaki deepthidevaki self-assigned this Dec 5, 2022
@ChrisKujawa (Member, Author)

Thanks @deepthidevaki 👍

deepthidevaki added a commit that referenced this issue on Dec 7, 2022

After a broker has recovered from disk loss, the cluster should be able to
survive another broker's disk loss. After a series of disk losses, one
broker at a time, the cluster should not suffer data loss. We verify this
by creating instances of a process that was deployed before the disk loss.

In this case we don't have to call `zbchaos dataloss prepare`, because there
is no need to add init containers. Since we are only deleting one broker
at a time, the pod can be restarted immediately.

related to #4
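
As an illustration, the verification flow described in the commit above could look roughly like the following; the BPMN file name, process id, and pod/PVC naming are assumptions and not taken from the actual experiment.

```sh
# Sketch of the verification loop: deploy once, then wipe one broker's disk
# at a time and check that instances of the earlier deployment can still be
# created. Resource names and node ids are assumptions.

# 1. Deploy a process before any disk loss happens.
zbctl deploy resource benchmark.bpmn --insecure

for NODE in 0 1 2; do
  # 2. Wipe the disk of one broker (PVC + pod) and let the StatefulSet
  #    restart it so it re-replicates from the remaining brokers.
  kubectl delete pvc "data-zeebe-$NODE" --wait=false
  kubectl delete pod "zeebe-$NODE"
  kubectl wait --for=condition=Ready "pod/zeebe-$NODE" --timeout=10m

  # 3. The process deployed before the disk loss must still be usable,
  #    i.e. no cluster-wide data loss occurred.
  zbctl create instance benchmark --insecure
done
```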
@ChrisKujawa (Member, Author)

Closed by #275 🚀

@deepthidevaki (Contributor)

We only added an experiment for Production - S, but I think that is enough.
