Set prometheus server's deployment strategy to recreate #735
Conversation
During deployment of 2i2c-org#720, CI would sometimes fail because the prometheus pod got stuck in 'ContainerCreating': the old pod was still holding on to the persistent disk the new pod needs in order to start. This was temporarily worked around by deleting the prometheus pod by hand; setting the deployment strategy to Recreate instead tells kubernetes to properly delete the old pod before starting the new one. Ref 2i2c-org#720 (comment)
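The change itself is small. A minimal sketch of the relevant part of the Deployment spec (the metadata name and surrounding fields are assumptions for illustration; `strategy.type` is the field this PR changes):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: prometheus-server  # hypothetical name, for illustration
spec:
  replicas: 1
  strategy:
    # Recreate terminates the old pod (releasing its persistent disk)
    # before the new pod is scheduled. The default, RollingUpdate,
    # starts the new pod first, which deadlocks here because the disk
    # can only be mounted by one pod at a time.
    type: Recreate
```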
# We have a persistent disk attached, so the default (RollingUpdate)
# can sometimes get 'stuck' and require pods to be manually deleted.
I rewrote this a bit because I felt "can sometimes" was a bit too vague.
# As we have a specific persistent disk attached that can only be mounted
# by one pod, we must not end up in a situation where a new pod is awaited
# to start successfully before an old pod is terminated. By using Recreate
# instead of RollingUpdate (default) we avoid such deadlock.
Note that @damianavila caught something that could quite easily be tested in a CI system: the validity of the rendered k8s resources.
One caveat is that this doesn't work if you first need to install CRDs: the CRDs must be loaded before validation, as the k8s api-server would otherwise declare the installation invalid because it doesn't recognize the custom resources. I've already opened an issue about this in the past, see #279.
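A CI check like the manual validation described above could be sketched as a GitHub Actions step (the step name and chart path are hypothetical; `kubectl apply --dry-run=server` asks the api-server to validate the manifests without persisting them, which is why CRDs would need to be installed first):

```yaml
# Hypothetical CI step: validate rendered k8s resources.
- name: Validate rendered manifests
  run: |
    # Chart path is an assumption; adjust to the repo's layout.
    helm template ./helm-charts/support > rendered.yaml
    # Requires any CRDs to already be installed in the target cluster,
    # per the caveat above (see #279).
    kubectl apply --dry-run=server -f rendered.yaml
```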
LGTM after incorporating @consideRatio's commit suggestion.
@consideRatio, I would love to see the CI system doing what I did 🤖 😜
Thanks, @damianavila and @consideRatio!