-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[ws-manager] fix workspace status flipping pending to deleted #9438
Conversation
/werft run with-clean-slate-deployment 👍 started the job as gitpod-build-pavel-8703.2 |
@@ -200,6 +200,9 @@ func (m *Manager) StartWorkspace(ctx context.Context, req *api.StartWorkspaceReq | |||
} | |||
span.LogKV("event", "pod description created") | |||
|
|||
// add an annotation to the pod to signal that ws-manager will now attempt to create this pod | |||
pod.Annotations[attemptingToCreatePodAnnotation] = "true" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this should be part of the createDefiniteWorkspacePod
and corresponding tests (cd pkg/manager && go test -update -force .
)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I disagree. 😅
Adding it into createDefiniteWorkspacePod
will lose some context as to why it is added, and also this whole retry logic is a temporary workaround for OutOfMemory error. Once we have 1.23.6 k8s and confirm it is working, we might consider removing retry logic all together.
Another reason not to add this into createDefiniteWorkspacePod
is that it is not obvious that someone has to remove that annotation when pod is created. So if that function will be used later somewhere else, it will have unexpected side effects for someone who is using it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I understand the intent to keep the code together - it is however a break in current style.
Today, all pod creation is covered by the very same createDefiniteWorkspacePod (CDWP) function and tests. I.e. no modifications happen outside of the initial pod struct production prior to creating it in Kubernetes. Adding this annotation outside of the regular pod creation path means we have no coverage that it's added, and we can no longer be sure that what our fixtures say will be used is what's actually used.
createDefiniteWorkspacePod
only makes sense in the path of creating a pod, and is very unlikely to be used elsewhere.
Also, I don't think this workaround will go away any time soon. When 1.23.6
comes around, we might be able to upgrade in SaaS. But self-hosted users might face the very same problem still. IMHO this code is here to stay for a good while longer.
I have no hard feelings on this (would be ok either way), but a clear preference towards making this part of CDWP for the reasons outlined above.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@csweichel I see your point. Fair enough, code and tests updated.
@csweichel PTAL 🙏 |
🎉 🚀 🙏 |
Is there an ETA for this? I guess sometime next week? |
Description
Fixes an issue where workspace status can get flipped from PENDING to TERMINATED back and forth if workspace pod is stuck in pending state (can happen if we are scaling up). When this happens, ws-manager will delete the pod after 5s and try again. Deletion of the pod would trigger workspace status update to get flipped into TERMINATED state causing webapp to delete workspace token from db.
This PR fixes this issue.
@geropl there is a small UX issue when pod is stuck in pending: it doesn't seem like there is any way to signal to stop the workspace. It will be stuck in pending with no way to stop it unless ws-manager will be able to schedule it.
Related Issue(s)
Fixes #8703
How to test
Spin up workspace preview env
Start workspace and observe it starts normally
Stop workspace
Cordon the node
Start workspace again
Observe that ws-manager keeps re-creating the pod every 5s as it is stuck in pending state but dashboard will remain now on Preparing workspace phase. Also observe that ws-manager-bridge is now only getting status updates that indicate phase is pending.
Release Notes
Documentation