[content-init] workspace start is slow, waits between awaiting seccomp fd and supervisor: workspace content available #13256

Closed
kylos101 opened this issue Sep 23, 2022 · 7 comments
Labels
component: ws-daemon type: bug Something isn't working

Comments

@kylos101
Contributor

kylos101 commented Sep 23, 2022

Bug description

👋 Sometimes workspace starts can take many minutes, and the delay (in some cases) falls in between here and here.

Here is a link for the entire set of logs. You'll see the delay the customer experienced, which was approximately 40 minutes (confirmed in the webapp logs, where we see the browser connection here).

Definition of done

  1. Add logging and tracing that sits in between these two events (i.e. workspacekit, ws-daemon, maybe runInitializer), so we can understand why we might be stalling for so long. The logging should likely use exponential backoff, so that if it's running repeatedly for 40 minutes, it isn't too verbose.
  2. Create a follow-up issue whenever we have enough data from the logs.

Steps to reproduce

Unclear

Workspace affected

instance id 5740c71b-c6ea-4735-ae15-93daacc7982e

Expected behavior

The delay should be way less than ~40 minutes.

Example repository

n/a

Anything else?

Run this query to view all instances for this workspace. Some have a null minutes to start, others have a really long time, and the bulk and balance are short (under 2 minutes).

Node performance at the time of the event seemed okay:

[4 images]


@kylos101 kylos101 added type: bug Something isn't working component: ws-daemon labels Sep 23, 2022
@kylos101 kylos101 moved this to Breakdown in 🌌 Workspace Team Sep 23, 2022
@atduarte atduarte moved this from Breakdown to Scheduled in 🌌 Workspace Team Sep 27, 2022
@sagor999
Contributor

I think this is the same issue as #12345
It is waiting on the prebuild/content initializer to do its thing.

@utam0k
Contributor

utam0k commented Oct 5, 2022

Apparently, not only is it slow, but it can also hang for up to an hour and then fail to boot. Perhaps a different cause, but the symptoms are similar.
https://cloudlogging.app.goo.gl/QX4BZ6HK4wBueEpv6

We can see

log.Infof("%s does not exist, going to wait for %s", fn, fnReady)

in this log, but before reaching

log.WithField("source", m.Source).Info("supervisor: workspace content available")

the supervisor got SIGTERM:

log.Info("received SIGTERM (or shutdown) - tearing down")

I'm not sure who sent SIGTERM to the supervisor 🤔

However, the fact that the supervisor logged anything at all means that ring2 must have succeeded in starting the supervisor itself.

@kylos101
Contributor Author

I am going to remove this from groundwork for now.

We experienced a similar issue on Friday, and @iQQBot added a change in #13828

It should ship in gen72. We can consider adding to groundwork in the future if the issue continues.

@kylos101 kylos101 removed the status in 🌌 Workspace Team Oct 17, 2022
@utam0k
Contributor

utam0k commented Oct 18, 2022

@kylos101 Should we mark the issue blocked?

@kylos101
Contributor Author

👋 @utam0k generally, blocked is used when an issue is actively being worked on (in progress) but we cannot proceed. In this case, I think leaving it in the inbox is okay (because we'll see recent comments to get context).

@utam0k
Contributor

utam0k commented Jan 13, 2023

@kylos101 Can we close this issue? I think this issue was resolved with #14821

@kylos101
Contributor Author

@utam0k I defer to you, you are the expert. 😄 Worst case, we can always reopen. 😸

@utam0k utam0k closed this as completed Jan 15, 2023
@github-project-automation github-project-automation bot moved this to Awaiting Deployment in 🌌 Workspace Team Jan 15, 2023
@utam0k utam0k moved this from Awaiting Deployment to Done in 🌌 Workspace Team Jan 15, 2023