etcd defrag + backup: Avoid too many leader changes #384
Comments
To point 1: It should be noted that the backup and fstrim operations cannot be done centrally but would need a local job on the non-leader control-plane nodes, possibly triggered via ssh.
Point 2 is easy to do.
Points 3+4: If we decide to simply skip defrag on the etcd leader (indefinitely), we'd cover both of these points with a single change.
Your feedback is welcome, especially @matofeder, @chess-knight, @batistein, @janiskemper, @ajfriesen, @flyersa, @curx, @fynluk, @mxmxchere.
@guettli this is what we talked about. Maybe we could contribute here.
I think the best solution would be an official one from etcd.io. Maybe it is enough to update the docs; I created an issue for that: etcd-io/etcd#15477.

Alternatively, it may be enough to skip defragmentation if the current instance is the leader. After N hours the cron job would try again. AFAIK defragmentation is not that pressing, so this simple solution might be OK.

There is a K8s cronjob, etcd-defrag-cronjob, but it depends on Prometheus. I personally would prefer a solution that does not depend on a third-party tool like Prometheus.
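The "skip if this instance is the leader" idea could look roughly like the sketch below. It is not the project's script: it assumes the default (simple) output of `etcdctl endpoint status`, in which the fifth comma-separated field is the is-leader flag, and the function names `is_leader_line` and `maybe_defrag` are made up for illustration.

```shell
# Sketch only: decide whether the local etcd member should be defragmented.
# Assumption: the status line comes from `etcdctl endpoint status` in its
# default "simple" format, where the 5th comma-separated field is the
# is-leader flag (true/false).

# is_leader_line LINE -> exit status 0 if the status line reports a leader
is_leader_line() {
    flag=$(printf '%s\n' "$1" | awk -F', *' '{print $5}')
    [ "$flag" = "true" ]
}

# maybe_defrag LINE -> print the action the periodic job would take
maybe_defrag() {
    if is_leader_line "$1"; then
        echo "skip"      # leader: do nothing, the next scheduled run retries
    else
        echo "defrag"    # non-leader: defragmenting the local member is fine
    fi
}
```

A periodic job could pass the local status line, for example `maybe_defrag "$(etcdctl endpoint status)"`, and only call `etcdctl defrag` when the answer is `defrag`. Endpoint and TLS options are omitted here and would depend on the deployment.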
I agree. But the script from the etcd-defrag-cronjob project could be adopted as a solid base for our implementation. I would propose the following (with respect to points 1..3 in the description): Write a short script that will be deployed to each control plane node and executed periodically (at the same time), e.g. the etcd-defrag.service could be adjusted here and instead of
The next periodic execution does the same.
I would say YES as this may cause some unwanted issues on the client side. Optionally, a "special" flag could be introduced to allow this, e.g.
I like the approach described by @matofeder. I have not yet looked at the implementation of defrag.sh and how it compares to what we are currently doing. In addition to defragmenting, we also create a local backup (for manual disaster recovery) and do a discard (fstrim) on the filesystem to combat FTL fragmentation. I think both are useful. Both are also fine to only happen on the previous leader, as we will change the leader at least daily (in the absence of other leader-changing events). We would need to ensure that the leader changes don't ping-pong between just two etcd nodes but are either arbitrary or round-robin.
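One way to avoid the ping-pong concern is a round-robin choice over the sorted member IDs. The following is a minimal sketch under that assumption; the function name `next_leader` and the member IDs in the example are hypothetical, and the thread itself does not prescribe this method.

```shell
# Sketch only: pick the next leader round-robin, so that leadership does not
# bounce between just two members.

# next_leader CURRENT_LEADER MEMBER... -> prints the member that follows the
# current leader in sorted order, wrapping around at the end of the list.
# Returns non-zero if the current leader is not among the members.
next_leader() {
    current=$1
    shift
    printf '%s\n' "$@" | sort | awk -v cur="$current" '
        { ids[NR] = $0; if ($0 == cur) pos = NR }
        END { if (!pos) exit 1; print ids[pos % NR + 1] }'
}
```

With three members `a`, `b` and `c`, repeated calls walk the leadership through a, b, c and back to a, so every member eventually becomes the ex-leader that gets backed up and trimmed. The printed ID could then be handed to `etcdctl move-leader`.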
PS: We could just document that single control-plane nodes have the risk of slowly degrading due to fragmentation and that they are not meant for long-term operation. Workarounds would be manual intervention (not recommended) or periodic temporary upgrades to a three control-plane-node scenario overnight.
This is the quick-fix: We just don't do defragmentation on the etcd leader. This avoids etcd service interruption on single-node etcd clusters and spurious leader changes on multi-node ones. Note that this is the intermediate step until we have a more complete solution as depicted in #384 (comment) Signed-off-by: Kurt Garloff <kurt@garloff.de>
As we have had the etcd defragmentation unconditionally enabled since merging #355 a few days ago, I want to ensure we don't cause trouble for users of the main branch and introduce a quick fix.
I agree
Let's merge #387 as a hotfix.
I will take care of that.
This commit updates the etcd defrag and backup script as follows:
- Script exits without any defrag/backup/trim action if:
  - It is executed on a non-leader etcd member
  - It is executed on a single-member etcd cluster
- Script defragments the etcd cluster as follows:
  - Defrag the non-leader etcd members first
  - Change the leadership to a randomly selected etcd member whose defragmentation has completed
  - Defrag the local (ex-leader) etcd member
- Script then backs up & trims the local (ex-leader) etcd member

This script executes etcdctl commands like `etcdctl move-leader` or `etcdctl endpoint status --cluster` which were introduced in etcdctl version 3.3.0. The previous etcdctl client was installed as an `apt` package. The latest etcdctl version available in Ubuntu 20.04 repositories is v3.2.26, hence this commit also introduces the `etcdctl_version` variable that contains the desired version of the etcdctl client. The etcdctl client is then used for etcd DB maintenance tasks.

Issue #384

Signed-off-by: Matej Feder <matej.feder@dnation.cloud>
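The ordering described in this commit message can be illustrated with a dry-run sketch that only prints the commands instead of running them. It is not the actual `etcd-defrag.sh`: the function name `defrag_plan`, the stdin format (one `ENDPOINT ID IS_LEADER` line per member), the backup path and the mount point are all assumptions made for illustration, and TLS options are left out.

```shell
# Sketch only: print (do not execute) the maintenance steps in the order the
# commit message describes. Meant to be run on the current leader.
# Assumed input on stdin: one "ENDPOINT ID IS_LEADER" line per etcd member.
defrag_plan() {
    members=$(cat)
    leader=$(printf '%s\n' "$members" | awk 'NF && $3 == "true"')
    others=$(printf '%s\n' "$members" | awk 'NF && $3 != "true"')

    # Exit without any action on single-member clusters or without a leader.
    [ -n "$leader" ] && [ -n "$others" ] || return 1
    leader_ep=${leader%% *}

    # 1. Defragment the non-leader members first.
    printf '%s\n' "$others" | awk '{print "etcdctl --endpoints=" $1 " defrag"}'

    # 2. Move leadership to a randomly selected, already defragmented member.
    target=$(printf '%s\n' "$others" |
        awk 'BEGIN {srand()} {id[NR] = $2} END {print id[int(rand() * NR) + 1]}')
    echo "etcdctl --endpoints=$leader_ep move-leader $target"

    # 3. Defragment the local (ex-leader) member, then back it up and trim.
    #    The backup file and the mount point are placeholder paths.
    echo "etcdctl --endpoints=$leader_ep defrag"
    echo "etcdctl --endpoints=$leader_ep snapshot save /var/backups/etcd.db"
    echo "fstrim /var/lib/etcd"
}
```

Printing the plan first makes it easy to review the sequence before anything touches the cluster. Note that the leadership transfer is requested from the current leader's endpoint, and only after every other member has been defragmented.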
BTW, Gardener has a related repository, and they plan this feature: Defragmentor of backup-restore should also consider the etcd db size along with scheduled defrag
* Update etcd defrag and backup

  This commit updates etcd defrag and backup script as follows:
  - Script exits without any defrag/backup/trim action if:
    - It is executed on non leader etcd member
    - It is executed on single member etcd cluster
  - Script defragment the etcd cluster as follows:
    - Defrag the non leader etcd members first
    - Change the leadership to the randomly selected and defragmentation completed etcd member
    - Defrag the local (ex-leader) etcd member
  - Script then backup & trim local (ex-leader) etcd member

  This script executes etcdctl commands like `etcdctl move-leader` or `etcdctl endpoint status --cluster` which were introduced in etcdctl version 3.3.0. The previous etcdctl client was installed as an `apt` package. The latest etcdctl version available in Ubuntu 20.04 repositories is v3.2.26, hence this commit also introduces `etcdctl_version` variable that contains the desired version of etcdctl client. Etcdctl client is then used for etcd DB maintenance tasks.

  Issue #384

* Fix etcd defrag script

  Cloud-init doesn't like '{#' - jinja template comment
  Fix also installation of etcdctl tool

* Add check to avoid defragmentation on unhealthy etcd cluster

* Add `force-*` optional arguments to the etcd defragmentation script

  This commit adds optional arguments to the `etcd-defrag.sh` script that allow skipping script checks. Optional arguments are:
  - `--force-single` (allows to execute defragmentation on single member etcd cluster)
  - `--force-unhealthy` (allows to execute defragmentation on unhealthy etcd member)
  - `--force-nonleader` (allows to execute defragmentation on non leader etcd member)

* Add etcd maintenance section into Maintenance_and_Troubleshooting.md

  This commit adds etcd maintenance section into the Maintenance_and_Troubleshooting docs. Section, for now, describes the etcd defragmentation and backup script `etcd-defrag.sh`.

* fixup! Add etcd maintenance section into Maintenance_and_Troubleshooting.md

Signed-off-by: Matej Feder <matej.feder@dnation.cloud>
Signed-off-by: Roman Hros <roman.hros@dnation.cloud>
Signed-off-by: Kurt Garloff <kurt@garloff.de>
Co-authored-by: Roman Hros <roman.hros@dnation.cloud>
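How the three `--force-*` arguments could gate the script's checks is sketched below. Only the flag names come from the commit message; the function names `parse_force_args` and `should_run`, the variable names, and the way the checks are combined are assumptions and may differ from the real `etcd-defrag.sh`.

```shell
# Sketch only: parse the optional --force-* arguments and apply them to the
# three safety checks (cluster size, health, leadership).
parse_force_args() {
    force_single=0
    force_unhealthy=0
    force_nonleader=0
    while [ $# -gt 0 ]; do
        case $1 in
            --force-single)    force_single=1 ;;
            --force-unhealthy) force_unhealthy=1 ;;
            --force-nonleader) force_nonleader=1 ;;
            *) echo "unknown argument: $1" >&2; return 2 ;;
        esac
        shift
    done
}

# should_run MEMBER_COUNT IS_HEALTHY IS_LEADER
# Exit status 0 if maintenance may proceed, 1 if a check fails and the
# matching force flag was not given.
should_run() {
    [ "$1" -gt 1 ]    || [ "$force_single" -eq 1 ]    || return 1
    [ "$2" = "true" ] || [ "$force_unhealthy" -eq 1 ] || return 1
    [ "$3" = "true" ] || [ "$force_nonleader" -eq 1 ] || return 1
}
```

Each check passes either because the condition holds or because the corresponding flag overrides it, so the default behaviour stays conservative and an operator has to opt in explicitly, for example with `--force-single` on a single-member cluster.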
As a k8s cluster user, I want the k8s control plane to always be responsive, stable and safe.
We have a nightly job to defragment etcd and back it up on all control plane nodes, randomized a bit, so the defragmentation does not happen all at the same time.
This has been in existence for many months already, but due to a missing `--now` in `systemctl enable`, it has not really been active before.

As @matofeder points out, the defragmentation may block access to etcd for a while (seconds on typically sized etcd DBs), causing etcd leader changes (on multi-node etcd clusters) or temporary kube-api failures (on single-node etcd clusters).
Things to consider: