This repository has been archived by the owner on Oct 24, 2023. It is now read-only.

Unattended update enablement should be an explicit choice that users make #4594

Closed
dumbnose opened this issue Jul 21, 2021 · 7 comments
Labels
enhancement New feature or request stale

Comments

@dumbnose

Describe the request

Unattended update enablement should be an explicit choice that users make, rather than defaulting to being enabled.

Explain why AKS Engine needs it

Unattended updates can cause widespread outages for services, as we have seen for the Ubuntu update on 7/21/2021. Many users of AKS Engine had major outages due to this upgrade.

Using unattended updates requires work from the team managing the cluster to ensure that upcoming updates won't cause an outage. Not using unattended updates requires the team to push updates to stay secure. So, neither option is a reasonable default.

Describe the solution you'd like

Enablement of unattended updates should not have a default; users should be required to make an explicit choice. The docs should also be updated to explain the risks of each choice and the ongoing work it entails.
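
A minimal sketch of what "no default" could look like in a Go API model (AKS Engine itself is written in Go). The `LinuxProfile` type, the `EnableUnattendedUpgrades` field, and `validateUnattendedUpgrades` are hypothetical names used for illustration, not AKS Engine's actual API. The idea is a tri-state `*bool`: `nil` means the user never chose, and validation rejects it instead of silently filling in a default.

```go
package main

import (
	"errors"
	"fmt"
)

// LinuxProfile is a HYPOTHETICAL, simplified stand-in for a cluster API model.
// A *bool distinguishes "not set" (nil) from an explicit true or false.
type LinuxProfile struct {
	EnableUnattendedUpgrades *bool
}

// errUnattendedUpgradesUnset is returned when the user has not made a choice.
var errUnattendedUpgradesUnset = errors.New(
	"enableUnattendedUpgrades must be set explicitly to true or false")

// validateUnattendedUpgrades rejects a profile that leaves the setting unset,
// rather than applying a default on the user's behalf.
func validateUnattendedUpgrades(p LinuxProfile) error {
	if p.EnableUnattendedUpgrades == nil {
		return errUnattendedUpgradesUnset
	}
	return nil
}

func main() {
	on := true
	// Unset: validation fails and prints the error message.
	fmt.Println(validateUnattendedUpgrades(LinuxProfile{}))
	// Explicit choice: validation passes and prints <nil>.
	fmt.Println(validateUnattendedUpgrades(LinuxProfile{EnableUnattendedUpgrades: &on}))
}
```

Because an explicit `false` is accepted just like an explicit `true`, the check only enforces that a decision was made; it does not favor either option.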

Describe alternatives you've considered

AKS Engine could add a feature to orchestrate the updates in a staged manner and stop if the updated nodes are unhealthy. This is much more work, although it would add more value. Since hosting your own cluster is a large undertaking, it seems reasonable to put the onus on the user to make a choice and do the work to ensure they don't have a major outage.
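
The staged alternative described above can be sketched as a batched rollout with a health gate between batches. This is a hypothetical illustration under simplifying assumptions, not an AKS Engine feature: `Node`, `rolloutInBatches`, and the caller-supplied `update` and `healthy` hooks are invented names, and a real implementation would have much more to handle (draining workloads, waiting for readiness, retries).

```go
package main

import (
	"errors"
	"fmt"
)

// Node is a HYPOTHETICAL placeholder for a cluster node.
type Node struct {
	Name string
}

// rolloutInBatches applies update to nodes in batches of batchSize and halts
// as soon as any node in a batch fails its health check, so later batches are
// never touched. It returns the names of the nodes that were updated.
// update and healthy are HYPOTHETICAL hooks supplied by the caller.
func rolloutInBatches(nodes []Node, batchSize int,
	update func(Node) error, healthy func(Node) bool) ([]string, error) {
	if batchSize < 1 {
		return nil, errors.New("batchSize must be at least 1")
	}
	var done []string
	for start := 0; start < len(nodes); start += batchSize {
		end := start + batchSize
		if end > len(nodes) {
			end = len(nodes)
		}
		batch := nodes[start:end]
		for _, n := range batch {
			if err := update(n); err != nil {
				return done, fmt.Errorf("updating %s: %w", n.Name, err)
			}
			done = append(done, n.Name)
		}
		// Gate: every node in this batch must be healthy before continuing.
		for _, n := range batch {
			if !healthy(n) {
				return done, fmt.Errorf("node %s unhealthy after update; halting rollout", n.Name)
			}
		}
	}
	return done, nil
}

func main() {
	nodes := []Node{{"node-0"}, {"node-1"}, {"node-2"}, {"node-3"}, {"node-4"}}
	update := func(Node) error { return nil }
	// Simulate node-2 becoming unhealthy after its update.
	healthy := func(n Node) bool { return n.Name != "node-2" }
	done, err := rolloutInBatches(nodes, 2, update, healthy)
	fmt.Println(done, err)
}
```

In the example run, node-2 fails its health check, so the rollout stops after the second batch and node-4 is never updated. Smaller batches limit the blast radius of a bad update at the cost of a slower rollout.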

Additional context

Several service teams within Microsoft simply accepted the default and didn't realize the implications. They didn't do the work necessary to prevent an outage, and their customers were impacted. This shows that the tradeoffs are not obvious, even to professional service organizations.

Hosting your own clusters is a large undertaking. Defaults like this can hide the true costs of running a production cluster and put the user in a dangerous situation. Users should need to make the choice and the docs should explain the implications of either option.

@dumbnose dumbnose added the enhancement New feature or request label Jul 21, 2021
@welcome

welcome bot commented Jul 21, 2021

👋 Thanks for opening your first issue here! If you're reporting a 🐞 bug, please make sure you include steps to reproduce it.

@devigned
Member

@dumbnose, what do you think about advising folks to use something like https://github.com/jackfrancis/kamino to do the progressive rollout?

Agreed about defaulting to no unattended updates.

@mboersma and @jackfrancis please give this your attention.

@devigned
Member

devigned commented Jul 21, 2021

Related, but only on initial boot: #4231

@ascendantlogic

> Unattended update enablement should be an explicit choice that users make, rather than defaulting to being enabled.

It sounds to me like users currently get an automated level of protection, and this request would change that to an opt-in model. Some customers, presumably unaware of the protection that unattended updates (UU) currently afford them, will then not opt in and will end up in a less secure position than they were in before.

> Unattended updates can cause widespread outages for services, as we have seen for the Ubuntu update on 7/21/2021. Many users of AKS Engine had major outages due to this upgrade.

> Using unattended updates requires work from the team managing the cluster to ensure that upcoming updates won't cause an outage.

I'd be interested to see numbers on how many security patches are applied by UU, on average, between major failure/outage events. I'd wager it'd be in the tens if not hundreds of thousands of patches to each downtime event. I'd also be interested to see the incidence of breaches in systems without UU enabled vs the number of breaches in systems with it enabled. Using those two metrics you could then make a more informed choice on the risks of downtime from leaving UU enabled vs the risk of breach from leaving it disabled.

Having been the sole ops person at multiple companies, I was always willing to sacrifice uptime for improved security, every single time, because explaining to customers that you went down due to proactively applying security updates is a very different conversation than explaining to them that their data was exfiltrated in a breach because security was sacrificed at the altar of uptime.

There has to be a better way to mitigate downtime without simply disabling the automated application of security patches and forcing customers to "opt-in" to a level of security they're getting by default currently.

@stale

stale bot commented Apr 17, 2022

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the stale label Apr 17, 2022
@bridgetkromhout
Contributor

Thanks for the feedback. As the AKS Engine project is deprecated, we're not going to make this change; I suggest looking at Kubernetes Cluster API Provider Azure as a replacement for self-managed clusters (or switching to AKS if that's an option).

@bridgetkromhout bridgetkromhout closed this as not planned Sep 13, 2022

4 participants