KOTS installer 5 minute timeout is too low #14444

Closed
adrienthebo opened this issue Nov 4, 2022 · 3 comments
Labels
component: install (Terraform installation scripts, helm charts, installer images); team: delivery (Issue belongs to the self-hosted team); type: bug (Something isn't working)

Comments

@adrienthebo
Contributor

adrienthebo commented Nov 4, 2022

Bug description

While observing a customer call, we saw an upgrade of a real-world environment take over 4 minutes, which is precariously close to the 5 minute helm upgrade timeout.

Additional debugging pointed the finger at shiftfs-module-loader. This init container compiles the shiftfs kernel module, which is slow, and the version of Kubernetes on some distributions updates DaemonSets one node at a time. Because the slow runtime of shiftfs-module-loader is paid once per node and the DaemonSet rollout scales linearly with node count, the installation process can be very lengthy on larger clusters.
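To make the linear scaling concrete, here is a rough back-of-the-envelope sketch. The function names and the per-node figure of 60 seconds are illustrative assumptions, not measurements from the customer environment; the only number taken from this issue is the 5 minute (300 second) timeout.

```python
import math


def estimated_rollout_seconds(node_count, per_node_seconds, max_unavailable=1):
    """Estimate how long a DaemonSet rolling update takes.

    Assumes nodes are updated in batches of `max_unavailable` and that every
    batch takes `per_node_seconds` (hypothetical: dominated by the
    shiftfs-module-loader init container compiling the kernel module).
    """
    if node_count < 0 or per_node_seconds < 0 or max_unavailable < 1:
        raise ValueError("invalid rollout parameters")
    batches = math.ceil(node_count / max_unavailable)
    return batches * per_node_seconds


def exceeds_timeout(node_count, per_node_seconds, timeout_seconds=300,
                    max_unavailable=1):
    """Return True if the estimated rollout would outlast the helm timeout."""
    estimate = estimated_rollout_seconds(
        node_count, per_node_seconds, max_unavailable
    )
    return estimate > timeout_seconds


if __name__ == "__main__":
    # Hypothetical: 60 seconds per node, one node at a time.
    for nodes in (3, 5, 6, 10):
        print(nodes, estimated_rollout_seconds(nodes, 60), exceeds_timeout(nodes, 60))
```

Under these assumed numbers, a cluster only needs a sixth node before the rollout alone outlasts the 5 minute timeout, which is why a fixed timeout is risky for larger clusters.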

Steps to reproduce

n/a

Workspace affected

No response

Expected behavior

The helm upgrade process should have a timeout long enough that it can't be reached accidentally. A value of 10 to 15 minutes would be less risky.

Example repository

No response

Anything else?

No response

@adrienthebo added the type: bug, component: install, and team: delivery labels on Nov 4, 2022
@mrsimonemms
Contributor

Cornelius and I settled on a 5m timeout because there is no way to stop a deployment from within the KOTS dashboard. If we set it much longer than that, users who have started a deployment from the dashboard will have to stop the job manually, which is both undocumented and a pretty awful experience.

A better, more rounded experience might be to add a config option to the KOTS config (I suggest the "advanced" section). By default, the timeout would be 5m; otherwise it would be whatever the config value is. We probably want to keep the 1h setting for the case where there is no https-certificates secret, as DNS propagation and LetsEncrypt can take their sweet time.

That way, we can be quite granular for our customers rather than using a sledgehammer to crack a nut.
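As a minimal sketch of the selection logic proposed above (not the change that was eventually merged), something like the following would do. The function name, its parameters, and the choice to let the 1h case take precedence over a configured value are assumptions for illustration; only the 5m default, the 1h setting, and the https-certificates secret come from this thread.

```python
DEFAULT_TIMEOUT = "5m"     # default agreed on in this thread
CERT_WAIT_TIMEOUT = "1h"   # existing setting when no https-certificates secret exists


def resolve_helm_timeout(configured_timeout, has_https_certificates_secret):
    """Pick the timeout string to pass to the helm upgrade.

    Hypothetical helper: `configured_timeout` stands in for the value of the
    proposed "advanced" KOTS config option (None or empty when unset).
    """
    # Assumption: without the secret, DNS and LetsEncrypt may need the full
    # hour, so this case wins over any configured value.
    if not has_https_certificates_secret:
        return CERT_WAIT_TIMEOUT

    # Use the operator-supplied value when one was provided.
    if configured_timeout and configured_timeout.strip():
        return configured_timeout.strip()

    # Otherwise fall back to the short default, so dashboard-triggered
    # deployments do not hang for long.
    return DEFAULT_TIMEOUT


if __name__ == "__main__":
    print(resolve_helm_timeout(None, True))     # unset option, secret present
    print(resolve_helm_timeout("15m", True))    # operator override
    print(resolve_helm_timeout("15m", False))   # no secret yet
```

Keeping 5m as the default preserves the current dashboard behaviour, while customers with large clusters can opt in to a longer value.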

@corneliusludmann
Contributor

@adrienthebo and @mrsimonemms: What's the state of this issue? Is more work needed, or can this be closed?

@mrsimonemms
Contributor

Fixed by #14500

Repository owner moved this from ⚒In Progress to ✨Done in 🚚 Security, Infrastructure, and Delivery Team (SID) Nov 23, 2022
3 participants