Description
Bug description
The workspace can't be started. This began as an image build failure, but we allowed the user to try restarting the workspace (or restarted it ourselves), which created new instances that could not start because there was no backup.
When an image build fails, we give the user the option to start the workspace using the default image. It is possible that happened here. If so, image-builder is not the culprit for this particular issue: the user actually started the workspace, but then had trouble with backups "at some point". Please refer to the logs and instance IDs to paint a complete picture.
Steps to reproduce
Try using a custom image with just this:
FROM hashicorp/terraform:1.2.9
The build should fail but still let you try restarting the workspace, at which point you see a message indicating that the workspace backup doesn't exist.
Workspace affected
Refer to logs
Expected behavior
- If an image build fails, show the reason for the failure rather than a generic "headless image build failed" message. It is presently difficult to know why image builds fail, so it is important to share the actual error with users.
- If an image build fails and the user does not run with our default image instead, the final status of the workspace instance should not allow restarts. Otherwise, the user may be unable to start the workspace (due to the image build failure) and never see why on subsequent restarts.
Example repository
No response
Anything else?
Logs for the workspace in question:
Internal slack conversation: https://gitpod.slack.com/archives/C02SF4A050W/p1663161853069319
workspace logs: https://cloudlogging.app.goo.gl/A2UvJMTtQd2eaFk38
webapp logs: https://cloudlogging.app.goo.gl/YpNtszBdZRGx2VXi8
Here is the failure we saw for the most recent instance the customer tried to start:
cannot initialize workspace:
github.com/gitpod-io/gitpod/content-service/pkg/initializer.InitializeWorkspace
github.com/gitpod-io/gitpod/content-service@v0.0.0-00010101000000-000000000000/pkg/initializer/initializer.go:439
- no backup found:
github.com/gitpod-io/gitpod/content-service/pkg/initializer.(*fromBackupInitializer).Run
github.com/gitpod-io/gitpod/content-service@v0.0.0-00010101000000-000000000000/pkg/initializer/initializer.go:199
We're calling Run on the initializer here. hasBackup is false, but given this error, the stack trace suggests the initializer we're working with is a fromBackupInitializer.
Looking at the entire history for this workspace, it has five instances in the logs.
The initial workspace had an error (root cause) of:
[adduser -h /home/gitpod -s /bin/sh -D -G 33333 -u 33333 gitpod]: exit status 1: adduser: unknown group 33333:
github.com/gitpod-io/gitpod/supervisor/pkg/supervisor.addUser
github.com/gitpod-io/gitpod/supervisor/pkg/supervisor/user.go:172
- exit status 1
The related message is "cannot ensure Gitpod user exists".
The related instance IDs are:
8e70e9a5-b464-4dab-8826-0ed331b88ca9
0aa0fb2b-0cd9-4d12-9330-552797c6a2a0
349a594f-f55a-47ff-ac64-ccca827c2ffa
e61ebcc4-c300-47f8-a729-2b4d45c19365
56adcae2-66ba-463f-9555-7dace043ff90
Lastly, this is related: #8908
Definition of done
When an image build fails, for example because the Gitpod user does not exist, the end user sees that specific problem instead of "headless task failed".
Also, update the docs on the Gitpod website so users know how to add the Gitpod user to a custom image that is not based on a Gitpod base image.
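For the docs update, a possible workaround could look like the sketch below. It relies on several assumptions that are not confirmed in this issue: that hashicorp/terraform:1.2.9 is an Alpine/BusyBox image (the -h/-s/-D/-G flags in the failing adduser command suggest BusyBox), that the failure occurs because no group exists for -G to resolve, and that supervisor skips user creation when a gitpod user with UID 33333 is already present. The group name gitpod is our choice for illustration; only the 33333 IDs, the home directory, and the shell come from the error above.

```dockerfile
FROM hashicorp/terraform:1.2.9

# Assumption: BusyBox addgroup/adduser are available in this base image.
# Create the group first so that adduser -G can resolve it, then create
# the gitpod user with the UID, home directory, and shell that the
# failing supervisor command used.
RUN addgroup -g 33333 gitpod \
    && adduser -h /home/gitpod -s /bin/sh -D -G gitpod -u 33333 gitpod
```

This has not been verified against a running workspace. It only addresses the adduser error; it does not change how image build failures are reported or whether failed instances can be restarted.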