Unattended update enablement should be an explicit choice that users make #4594
Comments
👋 Thanks for opening your first issue here! If you're reporting a 🐞 bug, please make sure you include steps to reproduce it.
@dumbnose, what do you think about advising folks to use something like https://github.com/jackfrancis/kamino to do the progressive rollout? Agreed about defaulting to no unattended updates. @mboersma and @jackfrancis please give this your attention.
In Cluster API, unattended upgrades are off by default in Debian-based images: https://github.com/kubernetes-sigs/image-builder/blob/cbac385d016ccfbc9d5b8a0695b6ffb0260413b2/images/capi/ansible/roles/sysprep/tasks/debian.yml#L82.
Related, but only on initial boot: #4231
It sounds to me like users currently get an automated level of protection, and this request is specifically to change that to an opt-in model. Some customers, presumably unaware of the protection UU currently affords, will then not opt in and will end up in a less secure position than they were in previously.
I'd be interested to see numbers on how many security patches are applied by UU, on average, between major failure/outage events. I'd wager it'd be in the tens if not hundreds of thousands of patches for each downtime event. I'd also be interested to see the incidence of breaches in systems without UU enabled vs. systems with it enabled. Using those two metrics you could then make a more informed choice between the risk of downtime from leaving UU enabled and the risk of breach from leaving it disabled. Having been the sole ops person at multiple companies, I was willing to sacrifice uptime in exchange for improved security every single time. Explaining to customers that you went down due to proactively applying security updates is a much different conversation than explaining that their data was exfiltrated in a breach because security was sacrificed at the altar of uptime. There has to be a better way to mitigate downtime than simply disabling the automated application of security patches and forcing customers to "opt in" to a level of security they currently get by default.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
Thanks for the feedback. As the AKS Engine project is deprecated, we're not going to make this change; I suggest looking at Kubernetes Cluster API Provider Azure as a replacement for self-managed clusters (or switching to AKS if that's an option).
Describe the request
Unattended update enablement should be an explicit choice that users make, rather than defaulting to being enabled.
Explain why AKS Engine needs it
Unattended updates can cause widespread outages for services, as we saw with the Ubuntu update on 7/21/2021. Many users of AKS Engine had major outages due to this update.
Using unattended updates requires work from the team managing the cluster to ensure that upcoming updates won't cause an outage. Not using unattended updates requires the team to push updates to stay secure. So, neither option is a reasonable default.
Describe the solution you'd like
Enablement of unattended updates should not have a default and should require the user to make an explicit choice. The docs should also be updated explaining the risks and work of either choice.
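To illustrate what "no default" could mean in practice, here is a minimal sketch of a validation step that rejects a cluster definition in which the setting is absent. The key name `unattended_upgrades` and the function `require_explicit_choice` are hypothetical, chosen for this example only; they are not AKS Engine's actual API model or code.

```python
from typing import Any, Mapping


def require_explicit_choice(
    profile: Mapping[str, Any],
    key: str = "unattended_upgrades",  # hypothetical setting name
) -> bool:
    """Return the user's choice; raise if the setting is missing or not a boolean."""
    if key not in profile:
        # No silent fallback: the user has to decide either way.
        raise ValueError(f"'{key}' has no default; set it to true or false explicitly")
    value = profile[key]
    if not isinstance(value, bool):
        raise ValueError(f"'{key}' must be a boolean, got {type(value).__name__}")
    return value
```

With a check like this, omitting the setting fails early with a message that points the user at the decision, instead of quietly enabling or disabling updates.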
Describe alternatives you've considered
AKS Engine could add a feature to orchestrate the updates in a staged manner and stop if the updated nodes are unhealthy. This is much more work, although it would add more value. Since hosting your own cluster is a large undertaking, it seems reasonable to put the onus on the user to make a choice and do the work to ensure they don't have a major outage.
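The staged alternative could look roughly like the sketch below: update one batch of nodes at a time and halt as soon as a batch contains an unhealthy node. This is only an illustration of the idea, not an existing AKS Engine feature; `staged_rollout`, `apply_update`, and `is_healthy` are hypothetical names, and the two callbacks stand in for whatever mechanism actually patches a node and checks its health.

```python
from typing import Callable, List, Sequence


def staged_rollout(
    batches: Sequence[Sequence[str]],
    apply_update: Callable[[str], object],  # hypothetical: patches one node
    is_healthy: Callable[[str], bool],      # hypothetical: health probe for one node
) -> List[str]:
    """Update nodes batch by batch; halt after the first batch with an unhealthy node.

    Returns the nodes that were updated before the rollout finished or halted.
    """
    updated: List[str] = []
    for batch in batches:
        for node in batch:
            apply_update(node)
            updated.append(node)
        # Check the whole batch before moving on to the next one.
        if not all(is_healthy(node) for node in batch):
            break
    return updated
```

Keeping the first batch small limits the blast radius of a bad update to that batch, at the cost of a slower rollout.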
Additional context
Several service teams within Microsoft just accepted the default and didn't realize the implications. They didn't do the work necessary to ensure they wouldn't have an outage, and their customers were impacted. This shows that the tradeoffs are not obvious, even to professional service organizations.
Hosting your own clusters is a large undertaking. Defaults like this can hide the true costs of running a production cluster and put the user in a dangerous situation. Users should need to make the choice and the docs should explain the implications of either option.