
[ws-manager] fix workspace status flipping pending to deleted #9438

Merged: 1 commit merged into main from pavel/8703 on Apr 21, 2022

Conversation

@sagor999 (Contributor) commented Apr 20, 2022

Description

Fixes an issue where the workspace status could flip back and forth between PENDING and TERMINATED when the workspace pod was stuck in a pending state (which can happen while we are scaling up). In that situation ws-manager deletes the pod after 5s and tries again; each deletion triggered a workspace status update that flipped the phase to TERMINATED, causing webapp to delete the workspace token from the db.
This PR fixes this issue (see the sketch below for the idea behind the fix).
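For illustration, here is a minimal Go sketch of the guard this PR introduces. Only the constant name attemptingToCreatePodAnnotation comes from the actual diff (further down in this conversation); the annotation's string value and the helper function itself are assumptions, not the real ws-manager code.

```go
package manager

import (
	corev1 "k8s.io/api/core/v1"

	"github.com/gitpod-io/gitpod/ws-manager/api"
)

// The constant name is taken from the PR diff; its string value here is an
// assumption for illustration.
const attemptingToCreatePodAnnotation = "gitpod.io/attemptingToCreate"

// phaseForDeletedPod is a hypothetical helper showing the idea: while
// ws-manager is still attempting to create the pod, a deletion event is just
// part of the create-retry loop and must keep the phase at PENDING instead
// of flipping it to STOPPED (which webapp would treat as termination and use
// as the trigger to drop the workspace token).
func phaseForDeletedPod(pod *corev1.Pod) api.WorkspacePhase {
	if pod.Annotations[attemptingToCreatePodAnnotation] == "true" {
		return api.WorkspacePhase_PENDING
	}
	return api.WorkspacePhase_STOPPED
}
```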

@geropl there is a small UX issue when a pod is stuck in pending: there does not seem to be any way to signal that the workspace should be stopped. It will remain stuck in pending, with no way to stop it, unless ws-manager manages to schedule it.

Related Issue(s)

Fixes #8703

How to test

1. Spin up a workspace preview environment.
2. Start a workspace and observe that it starts normally.
3. Stop the workspace.
4. Cordon the node (e.g. `kubectl cordon <node-name>`).
5. Start the workspace again.
6. Observe that ws-manager keeps re-creating the pod every 5s while it is stuck in the pending state, but the dashboard now stays on the "Preparing workspace" phase. Also observe that ws-manager-bridge now only receives status updates whose phase is pending.

Release Notes

[ws-manager] Fix a bug where, when opening a workspace, you would be signed out of git and unable to run git commands

Documentation

@sagor999 sagor999 changed the title [ws-manager] while creating workspace pod make sure workspace status … [ws-manager] fix workspace status flipping pending to deleted Apr 20, 2022
@sagor999 (Contributor, Author) commented Apr 20, 2022

/werft run with-clean-slate-deployment

👍 started the job as gitpod-build-pavel-8703.2

@sagor999 sagor999 marked this pull request as ready for review April 20, 2022 21:59
@sagor999 sagor999 requested a review from a team April 20, 2022 21:59
@github-actions github-actions bot added the team: workspace label (Issue belongs to the Workspace team) Apr 20, 2022
@@ -200,6 +200,9 @@ func (m *Manager) StartWorkspace(ctx context.Context, req *api.StartWorkspaceReq
 	}
 	span.LogKV("event", "pod description created")
 
+	// add an annotation to the pod to signal that ws-manager will now attempt to create this pod
+	pod.Annotations[attemptingToCreatePodAnnotation] = "true"
+
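For context, before the review discussion below, here is a hedged Go sketch of how the create-retry loop described in the PR description could interact with this annotation, written against plain client-go. The function name, the scheduling check, and the retry handling are assumptions paraphrasing the description, not the actual ws-manager implementation.

```go
package manager

import (
	"context"
	"time"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// createPodWithRetry is an illustrative sketch: the pod arrives here with
// attemptingToCreatePodAnnotation already set (see the diff above), so status
// updates triggered by the Delete call below keep reporting PENDING instead
// of flipping to TERMINATED.
func createPodWithRetry(ctx context.Context, c kubernetes.Interface, pod *corev1.Pod) error {
	for {
		if _, err := c.CoreV1().Pods(pod.Namespace).Create(ctx, pod, metav1.CreateOptions{}); err != nil {
			return err
		}

		// Give the scheduler 5s, as described in the PR description.
		select {
		case <-ctx.Done():
			return ctx.Err()
		case <-time.After(5 * time.Second):
		}

		created, err := c.CoreV1().Pods(pod.Namespace).Get(ctx, pod.Name, metav1.GetOptions{})
		if err != nil {
			return err
		}
		if created.Spec.NodeName != "" {
			// Scheduled: the caller can now remove the annotation so that
			// future deletions surface as a real workspace stop.
			return nil
		}

		// Still unscheduled: delete the stuck pod and try again. (A real
		// implementation would also wait for the deletion to complete before
		// re-creating; omitted for brevity.)
		if err := c.CoreV1().Pods(pod.Namespace).Delete(ctx, pod.Name, metav1.DeleteOptions{}); err != nil {
			return err
		}
	}
}
```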
@csweichel (Contributor) commented:
This should be part of createDefiniteWorkspacePod and the corresponding tests (`cd pkg/manager && go test -update -force .`).

@sagor999 (Contributor, Author) replied:
I disagree. 😅
Adding it into createDefiniteWorkspacePod would lose some context as to why it is added, and this whole retry logic is a temporary workaround for the OutOfMemory error. Once we have k8s 1.23.6 and confirm it is working, we might consider removing the retry logic altogether.
Another reason not to add this into createDefiniteWorkspacePod is that it is not obvious that someone has to remove that annotation once the pod is created. So if that function is later used somewhere else, it will have unexpected side effects for whoever uses it.

@csweichel (Contributor) replied:
I understand the intent to keep the code together; it is, however, a break from the current style.

Today, all pod creation is covered by the very same createDefiniteWorkspacePod (CDWP) function and its tests. That is, no modifications happen outside of the initial pod struct production prior to creating it in Kubernetes. Adding this annotation outside of the regular pod creation path means we have no test coverage that it is added, and we can no longer be sure that what our fixtures say will be used is what is actually used.

createDefiniteWorkspacePod only makes sense in the path of creating a pod, and is very unlikely to be used elsewhere.

Also, I don't think this workaround will go away any time soon. When 1.23.6 comes around, we might be able to upgrade in SaaS. But self-hosted users might face the very same problem still. IMHO this code is here to stay for a good while longer.

I have no hard feelings on this (would be ok either way), but a clear preference towards making this part of CDWP for the reasons outlined above.
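For readers unfamiliar with the fixture tests referenced here, the following is a generic sketch of the golden-file pattern; it is not the actual pkg/manager harness, and the test name, fixture path, and annotation key are illustrative.

```go
package manager_test

import (
	"bytes"
	"encoding/json"
	"flag"
	"os"
	"path/filepath"
	"testing"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

var update = flag.Bool("update", false, "update golden files")

// TestWorkspacePodFixture illustrates the golden-file idea: serialize the
// fully produced pod and diff it against a checked-in fixture, so any field
// set outside the covered creation path (such as a late-added annotation)
// shows up as a fixture mismatch. Run `go test -update` to regenerate.
func TestWorkspacePodFixture(t *testing.T) {
	// Stand-in for the pod that createDefiniteWorkspacePod would produce.
	pod := &corev1.Pod{
		ObjectMeta: metav1.ObjectMeta{
			Name: "ws-example",
			// The annotation key is an assumption for illustration.
			Annotations: map[string]string{"gitpod.io/attemptingToCreate": "true"},
		},
	}

	got, err := json.MarshalIndent(pod, "", "  ")
	if err != nil {
		t.Fatal(err)
	}

	golden := filepath.Join("testdata", "ws-example.golden")
	if *update {
		if err := os.WriteFile(golden, got, 0o644); err != nil {
			t.Fatal(err)
		}
	}

	want, err := os.ReadFile(golden)
	if err != nil {
		t.Fatal(err)
	}
	if !bytes.Equal(got, want) {
		t.Errorf("pod differs from fixture %s; run `go test -update` if intended", golden)
	}
}
```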

@sagor999 (Contributor, Author) replied:
@csweichel I see your point. Fair enough, code and tests updated.

components/ws-manager/pkg/manager/manager.go (outdated review thread, resolved)
@sagor999 (Contributor, Author) commented:
@csweichel PTAL 🙏

@sagor999 sagor999 requested a review from csweichel April 21, 2022 00:18
@geropl (Member) commented Apr 21, 2022

> @geropl there is a small UX issue when a pod is stuck in pending: there does not seem to be any way to signal that the workspace should be stopped. It will remain stuck in pending, with no way to stop it, unless ws-manager manages to schedule it.

@sagor999 Agreed, thanks for the hint: I will add a note to #8274

@roboquat roboquat merged commit 0c66eb2 into main Apr 21, 2022
@roboquat roboquat deleted the pavel/8703 branch April 21, 2022 15:14
@geropl (Member) commented Apr 22, 2022

🎉 🚀 🙏

@geropl (Member) commented Apr 22, 2022

Is there an ETA for this? I guess sometime next week?

@roboquat roboquat added the deployed: workspace label (Workspace team change is running in production) Apr 23, 2022
@roboquat roboquat added the deployed label (Change is completely running in production) Apr 23, 2022
Labels
deployed: workspace (Workspace team change is running in production)
deployed (Change is completely running in production)
release-note
size/M
team: workspace (Issue belongs to the Workspace team)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Accessing server API at gitpod.io from within workspaces might fail sometimes
4 participants