randomize locksmithd reboot window #2610

dibi-codes · 2019-08-30T14:02:31Z

Feature Request

I'm looking for a way to tell locksmithd to apply the "reboot_strategy" at a random time. For example, I have a set of nodes in my environment that use the "reboot" strategy with a reboot window configured via cloud-init. In some cases those nodes reboot at exactly the same time and my applications go down. What I'm looking for is a way to say something like: this is the reboot window, but you should reboot at a random time within it, so that the chance of concurrent reboots is minimal.
Is this already possible, or do you have any recommendations on this?

I know about the etcd-lock option, but my nodes don't have etcd set up, so etcd-lock is not an option for me.
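
To make the request concrete, here is a minimal Python sketch of what "reboot at a random time in this window" could mean. It is not a locksmithd feature or API; the function name `pick_reboot_time` and the one-hour window starting at 04:00 are assumptions made purely for illustration.

```python
import random
from datetime import datetime, timedelta


def pick_reboot_time(window_start: datetime,
                     window_length: timedelta,
                     rng: random.Random) -> datetime:
    """Return a uniformly random instant inside the reboot window."""
    # Draw an offset in seconds between 0 and the window length.
    offset_seconds = rng.uniform(0, window_length.total_seconds())
    return window_start + timedelta(seconds=offset_seconds)


# Hypothetical window: one hour, starting at 04:00.
start = datetime(2019, 8, 30, 4, 0)
length = timedelta(hours=1)
chosen = pick_reboot_time(start, length, random.Random())
```

Each node would draw its own offset independently, so two nodes can still land close together; randomization only lowers the odds of a simultaneous reboot.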

Environment

OpenStack

lucab · 2019-08-30T15:03:38Z

@dabeck thanks for the interesting feedback!

> This is the reboot window but you should reboot at a random time in this window so the possibility of a concurrently reboot is minimal.

This same discussion recently came up in an offline architectural conversation around Zincati, and we reached the conclusion that we don't plan to implement this.
The rationale is that it would try to tackle a hybrid case between "reboots are independent" and "reboots are not independent (cluster-wise)". That comes with its own development, testing and maintenance costs. However, the main point is that the "is-independent" property of node reboots is a boolean, so the hybrid case should be properly folded into one of the two options.

Applying this to your specific case: your reboots are indeed NOT independent, and they need to be coordinated somehow based on information that is known by the cluster, but not by every single node.
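
As a rough illustration of why randomization alone does not make reboots independent, the following Python sketch estimates how often at least two nodes would be rebooting at the same time if each picked a uniformly random start inside a shared window. All numbers (5 nodes, a 3600-second window, a 120-second reboot) and the function name `overlap_probability` are hypothetical and not taken from this thread.

```python
import random


def overlap_probability(nodes: int, window_s: float, reboot_s: float,
                        trials: int = 20000, seed: int = 0) -> float:
    """Monte Carlo estimate of P(at least two reboots overlap in time)."""
    rng = random.Random(seed)
    overlaps = 0
    for _ in range(trials):
        # Each node independently picks a start time inside the window.
        starts = sorted(rng.uniform(0, window_s) for _ in range(nodes))
        # After sorting, some pair overlaps exactly when two consecutive
        # start times are closer together than one reboot duration.
        if any(b - a < reboot_s for a, b in zip(starts, starts[1:])):
            overlaps += 1
    return overlaps / trials


# Hypothetical cluster: 5 nodes, 1-hour window, 2-minute reboots.
estimate = overlap_probability(nodes=5, window_s=3600, reboot_s=120)
```

Under these assumed numbers the estimate comes out at roughly even odds of an overlap, which is why coordination has to rely on cluster-wide knowledge rather than on per-node randomness.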

> Is this already possible or do you have any recommendations on this?
> I know about the etcd-lock option but my nodes don't have etcd setup, so etcd-lock is not an option for me.

My recommendation would be to acknowledge that your reboots need to be orchestrated somehow, cluster-wide.
Locksmith only supports etcd2 for that, so you need to either provide an etcd2 cluster or come up with a similar solution.
You don't need to have etcd running on each node, as locksmith should allow you to specify which etcd endpoint to use (it could also be somewhere remote).
Alternatively, if you are using Kubernetes, you may have a look at https://github.com/coreos/container-linux-update-operator.

lucab added area/usability component/locksmith team/os labels Aug 30, 2019
