update-etcd-rewrite #202
Conversation
k8s does compaction on its own (kubernetes/kubernetes#24079), so why do we need an additional one? Copy-pasting here from the other ticket: 1 - etcd-io/etcd#8098. WDYT @calvix @teemow? IMO we should be careful with such changes. We saw the problem only once, and on a very specific environment (one that had survived multiple updates and had unoptimized cronjobs producing thousands of resources). And finally, we are not sure that the lack of compaction was the root cause. In my opinion we should add monitoring for that first.
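To make the "monitor first" suggestion concrete, here is a minimal sketch of what such a check could look like, using the etcd v3 Go client to poll an endpoint's reported database size. The endpoint URL and the 800 MB alert threshold are assumptions for illustration only, not values from this PR:

```go
package main

import (
	"context"
	"fmt"
	"log"
	"time"

	clientv3 "go.etcd.io/etcd/client/v3"
)

func main() {
	// Hypothetical endpoint; in a real cluster this would come from config.
	endpoint := "http://127.0.0.1:2379"

	cli, err := clientv3.New(clientv3.Config{
		Endpoints:   []string{endpoint},
		DialTimeout: 5 * time.Second,
	})
	if err != nil {
		log.Fatalf("connect: %v", err)
	}
	defer cli.Close()

	ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
	defer cancel()

	// Status reports, among other things, the on-disk database size in bytes.
	status, err := cli.Status(ctx, endpoint)
	if err != nil {
		log.Fatalf("status: %v", err)
	}

	const threshold = 800 * 1024 * 1024 // assumed alert threshold, 800 MB
	fmt.Printf("db size: %d bytes\n", status.DbSize)
	if status.DbSize > threshold {
		fmt.Println("WARNING: etcd database size above threshold; investigate compaction")
	}
}
```

etcd also exposes its database size through its Prometheus metrics endpoint, which would likely be a better integration point for alerting than custom polling like the above.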
I feel this is not the right way to go; please see details in the comment above.
Compaction + defrag was the thing I needed to do in order to restore. I can easily see a customer doing the same thing on a k8s cluster as we did on …
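For reference, the manual recovery described above (compaction followed by defragmentation) can be expressed with the same Go client. This is a sketch under the assumption of a single reachable endpoint, roughly equivalent to running `etcdctl compact <rev>` followed by `etcdctl defrag`:

```go
package main

import (
	"context"
	"log"
	"time"

	clientv3 "go.etcd.io/etcd/client/v3"
)

func main() {
	endpoint := "http://127.0.0.1:2379" // assumed endpoint

	cli, err := clientv3.New(clientv3.Config{
		Endpoints:   []string{endpoint},
		DialTimeout: 5 * time.Second,
	})
	if err != nil {
		log.Fatalf("connect: %v", err)
	}
	defer cli.Close()

	ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
	defer cancel()

	// Any read returns the current revision in its response header.
	resp, err := cli.Get(ctx, "compaction-probe")
	if err != nil {
		log.Fatalf("get: %v", err)
	}
	rev := resp.Header.Revision

	// Compact the key-value history up to the current revision.
	if _, err := cli.Compact(ctx, rev); err != nil {
		log.Fatalf("compact: %v", err)
	}

	// Defragmentation releases the freed space back to the filesystem.
	// It is per-member and blocks the member while it runs.
	if _, err := cli.Defragment(ctx, endpoint); err != nil {
		log.Fatalf("defrag: %v", err)
	}

	log.Printf("compacted to rev %d and defragmented %s", rev, endpoint)
}
```

Note that compaction only marks superseded revisions as reclaimable; the defragmentation step is what actually shrinks the on-disk file, which is why both steps were needed.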
So this happened again on a Vodafone guest cluster, so it's not something random or specific to …
Okay, but I'm still not 100% confident :D TL;DR: even though they had autocompaction enabled, they still hit this issue, and it's related to a BoltDB bug; the fix will only land in 3.3. For me it's still not clear what the root cause was in either case.
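For context on what "autocompaction enabled" means here: etcd's automatic compaction is controlled by server flags (`--auto-compaction-retention`, plus `--auto-compaction-mode` in later releases). A minimal sketch using the embedded server from current etcd releases, assuming a one-hour periodic retention purely for illustration:

```go
package main

import (
	"log"

	"go.etcd.io/etcd/server/v3/embed"
)

func main() {
	cfg := embed.NewConfig()
	cfg.Dir = "default.etcd" // assumed data directory

	// Equivalent to --auto-compaction-mode=periodic --auto-compaction-retention=1h:
	// keep one hour of history and compact older revisions automatically.
	cfg.AutoCompactionMode = "periodic"
	cfg.AutoCompactionRetention = "1h"

	e, err := embed.StartEtcd(cfg)
	if err != nil {
		log.Fatalf("start etcd: %v", err)
	}
	defer e.Close()

	<-e.Server.ReadyNotify()
	log.Println("etcd is ready with periodic auto-compaction")
}
```

As the comment above notes, auto-compaction alone did not prevent the issue in the reported case, since compaction does not defragment the BoltDB file.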
I'm OK if the quorum decides to merge this, but my opinion is still that autocompaction is not the proper fix.
IMHO, a single appearance of that issue looks like a corner case, which doesn't give us enough information to choose a particular resolution strategy.
It happened on Lycan and Viking (in a guest cluster). I think this can always happen, and we at least need to monitor it. For now, we're starting with the host clusters.
rewrite of #193