Unattended update enablement should be an explicit choice that users make #4594

dumbnose · 2021-07-21T18:38:54Z

Describe the request

Unattended update enablement should be an explicit choice that users make, rather than defaulting to being enabled.

Explain why AKS Engine needs it

Unattended updates can cause widespread outages for services, as we have seen for the Ubuntu update on 7/21/2021. Many users of AKS Engine had major outages due to this upgrade.

Using unattended updates requires work from the team managing the cluster to ensure that upcoming updates won't cause an outage. Not using unattended updates requires the team to push updates to stay secure. So, neither option is a reasonable default.

Describe the solution you'd like

Enablement of unattended updates should not have a default and should require the user to make an explicit choice. The docs should also be updated to explain the risks and the work involved in either choice.

Describe alternatives you've considered

AKS Engine could add a feature to orchestrate the updates in a staged manner and stop if the updated nodes are unhealthy. This is much more work, although it would add more value. Since hosting your own cluster is a large undertaking, it seems reasonable to put the onus on the user to make a choice and do the work to ensure they don't have a major outage.
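The staged alternative can be sketched roughly as below. This is a minimal illustration, not AKS Engine code: `staged_rollout`, `apply_update`, and `is_healthy` are hypothetical names. In a real cluster the two callbacks would wrap whatever mechanism patches a node and checks that it is healthy afterwards.

```python
from typing import Callable, List, Sequence, Tuple


def staged_rollout(
    nodes: Sequence[str],
    batch_size: int,
    apply_update: Callable[[str], None],  # hypothetical: patches one node
    is_healthy: Callable[[str], bool],    # hypothetical: health check for one node
) -> Tuple[List[str], List[str]]:
    """Update nodes in batches and halt before the next batch if any updated node is unhealthy.

    Returns (updated_nodes, unhealthy_nodes). A non-empty unhealthy list means the
    rollout stopped early and the remaining nodes were left untouched.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    updated: List[str] = []
    for start in range(0, len(nodes), batch_size):
        batch = nodes[start:start + batch_size]
        for node in batch:
            apply_update(node)
            updated.append(node)
        # Check the batch we just touched before moving on to the next one.
        unhealthy = [n for n in batch if not is_healthy(n)]
        if unhealthy:
            return updated, unhealthy
    return updated, []


if __name__ == "__main__":
    broken = {"node-3"}  # simulate one node that a bad update breaks
    updated, bad = staged_rollout(
        [f"node-{i}" for i in range(6)],
        batch_size=2,
        apply_update=lambda n: None,
        is_healthy=lambda n: n not in broken,
    )
    print(updated, bad)
```

With a batch size of 2, the bad update is caught after the second batch, so only four of the six nodes are exposed to it and the last two are never patched.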

Additional context

Several service teams within Microsoft just accepted the default and didn't realize the implications. They didn't do the work necessary to ensure they wouldn't have an outage, and their customers were impacted. This shows that the tradeoffs are not obvious, even to professional service organizations.

Hosting your own clusters is a large undertaking. Defaults like this can hide the true costs of running a production cluster and put the user in a dangerous situation. Users should need to make the choice and the docs should explain the implications of either option.

welcome · 2021-07-21T18:38:56Z

👋 Thanks for opening your first issue here! If you're reporting a 🐞 bug, please make sure you include steps to reproduce it.

devigned · 2021-07-21T18:46:54Z

@dumbnose, what do you think about advising folks to use something like https://github.com/jackfrancis/kamino to do the progressive rollout?

Agreed about defaulting to no unattended updates.

@mboersma and @jackfrancis please give this your attention.

devigned · 2021-07-21T18:50:36Z

In Cluster API, unattended upgrades are off by default in Debian based images: https://github.com/kubernetes-sigs/image-builder/blob/cbac385d016ccfbc9d5b8a0695b6ffb0260413b2/images/capi/ansible/roles/sysprep/tasks/debian.yml#L82.

devigned · 2021-07-21T21:01:32Z

Related, but only on initial boot: #4231

ascendantlogic · 2021-08-20T15:09:23Z

> Unattended update enablement should be an explicit choice that users make, rather than defaulting to being enabled.

It sounds to me like users currently get an automated level of protection, and this request is specifically to change that to an opt-in model. Some customers, presumably unaware of the protection that unattended upgrades (UU) currently afford them, will not opt in and will end up in a less secure position than they were in previously.

> Unattended updates can cause widespread outages for services, as we have seen for the Ubuntu update on 7/21/2021. Many users of AKS Engine had major outages due to this upgrade.

> Using unattended updates requires work from the team managing the cluster to ensure that upcoming updates won't cause an outage.

I'd be interested to see numbers on how many security patches are applied by UU, on average, between major failure/outage events. I'd wager it's in the tens if not hundreds of thousands of patches per downtime event. I'd also be interested to see the incidence of breaches in systems without UU enabled versus systems with it enabled. Using those two metrics, you could make a more informed choice between the risk of downtime from leaving UU enabled and the risk of a breach from leaving it disabled.

Having been the sole ops person at multiple companies, I was always willing to sacrifice uptime in exchange for improved security, every single time. Explaining to customers that you went down because you proactively applied security updates is a very different conversation than explaining to them that their data was exfiltrated in a breach because security was sacrificed at the altar of uptime.

There has to be a better way to mitigate downtime without simply disabling the automated application of security patches and forcing customers to "opt-in" to a level of security they're getting by default currently.

stale · 2022-04-17T04:28:59Z

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

bridgetkromhout · 2022-09-13T22:24:39Z

Thanks for the feedback. As the AKS Engine project is deprecated, we're not going to make this change; I suggest looking at Kubernetes Cluster API Provider Azure as a replacement for self-managed clusters (or switching to AKS if that's an option).

dumbnose added the enhancement label Jul 21, 2021

mboersma mentioned this issue Jul 22, 2021

Clusters in bridge mode may fail from unattended upgrades #4595 (Closed)

stale bot added the stale label Apr 17, 2022

bridgetkromhout closed this as not planned Sep 13, 2022
