[dashboard, supervisor, ws-proxy] Break redirect loop on failing workspaces starts #8125
Codecov Report
@@ Coverage Diff @@
## main #8125 +/- ##
==========================================
- Coverage 11.98% 10.82% -1.16%
==========================================
Files 20 18 -2
Lines 1193 1025 -168
==========================================
- Hits 143 111 -32
+ Misses 1046 912 -134
+ Partials 4 2 -2
|
/hold
|
If we no longer wish to make the (inframe, IDE url) auto-start an instance, it seems like in this step we should call something like However, I guess in both cases, we break the behavior of "refresh stopped workspace tab to restart it" (which is probably okay though -- this eager restarting is what caused the problem, and we could instead show the "stopped" screen which has a restart button if you do want to do that). |
👍 That's what we do with
No, because we need a "safe place" for |
Ah, I don't understand why a redirect is needed (instead of just staying on the ideURL, which shows the "stopped" page with buttons). I'll read the code to understand better. |
What helped me tremendously was to a) have the workspaces auto-fail and b) |
Thanks! 👍 My current suspicion is that we could simply delete the redirection from supervisor (because it already has an iframe that knows how to show a "Stopped" or error page, right?), however for ws-proxy I don't currently see a good alternative. I agree that allowing to disable the "start" behavior of |
How can I test other flows besides with failed workspace? |
IDE is an owner of URL, I don't think we should mess with it. |
Thanks a lot for diving deeper into this problem. 🙏
I wonder if we could make this fix a bit simpler. Currently, there is:
- trigger (with a complicated type)
- dontAutostart
- dontRedirect
- showRestartModal
But:
- Do we really need to know where the redirect came from? I think it could be simpler to have just `?autostart=false` as the only (optional) query parameter, regardless of where we came from (that's not really important and introduces too much complexity)
- `dontAutostart` could be replaced by just the `autostart` prop
- is `dontRedirect` really needed? (if we never call `server.startWorkspace()`, do we ever get a `workspaceInstance.ideUrl`?)
- we don't need an extra `showRestartModal`, because the existing UI already knows what to show when the instance is stopped / doesn't exist (i.e. it has a convenient `Run Workspace` button)
Wouldn't it be enough to just have a `/start/?autostart=false#wsID` query parameter, which "simply" prevents the call to `this.startWorkspace()`? 💭 (Maybe I'm missing something)
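To make the suggestion concrete, here is a minimal sketch of how a `/start/?autostart=false#wsID` URL could be read. The helper name `parseStartUrl` and the `StartParams` shape are assumptions made for illustration; they are not taken from the PR.

```typescript
// Hypothetical helper: the names and the return shape are illustrative assumptions.
interface StartParams {
    workspaceId: string;
    autostart: boolean;
}

function parseStartUrl(href: string): StartParams {
    const url = new URL(href);
    // Autostart stays the default; only an explicit "false" disables it.
    const autostart = url.searchParams.get("autostart") !== "false";
    // The workspace ID travels in the fragment, e.g. "#wsID".
    const workspaceId = url.hash.replace(/^#/, "");
    return { workspaceId, autostart };
}
```

With this shape the caller would only invoke `this.startWorkspace()` when `autostart` is true, and would otherwise fall through to the existing "stopped" UI.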
@akosyakov You could create a branch from this one, workspaces start by default.
Agreed, so I'll stay with this solution. 👍 |
@jankeromnes Sorry if I was unclear, IMO we're mixing a few things:
No, but IMO
Yes it is: We get instance updates (incl. some with |
1b0bd82
to
147b0fc
Compare
Hm. We needed that indirection because we notice a re-starting workspace, and at the moment dashboard is the only place which is allowed to refresh authentication credentials. (In theory we could extend that to IDE-URLs, but IMO we have refused to do that because we wanted to avoid the need to properly parse IDEUrls - excluding every path/subdomain that's potentially under user control 🙃 ) With the new behavior it could be that it's superfluous... but I'm not sure, and would like to avoid including that optimization here, as it would mean a lot of manual test effort, again. 🙂 |
147b0fc
to
85b9c83
Compare
@akosyakov @jankeromnes The PR is rebased, and cleaned up. Now ws-proxy/supervisor redirect to Test (always fails): https://gpl-8043-break-redirect-loop.staging.gitpod-dev.com/workspaces |
57006dc
to
9520984
Compare
Going through the changes from yesterday, it seems that the non-modal approach is the better one. 🥊
Here are some points to consider improving, although these could be out of scope for this PR.
1️⃣ Remove the last part of the error message as it does not contain any clear action forward.
2️⃣ Remove the No Changes element as it's probably not relevant or useful here.
3️⃣ Use the red dot indicator as the workspace failed to open.
4️⃣ Keep the same Open action verb for the button label.
BEFORE | AFTER |
---|---|
(screenshot) | (screenshot) |
Thanks for the ping, @geropl! 🏓
@gtsiolis I agree with most (all?) of your suggestions, thx! 🙏 |
@geropl 🙏 |
4c96ef6
to
3228e92
Compare
I really like the approach - nice, clean, simple change.
Once the comment is mitigated I'll happily approve :)
…ling autostart When triggered: a) inFrame or b) when redirect from IDE url (by ws-proxy)
3228e92
to
5d02f7a
Compare
instance.status.phase === 'pending' ||
instance.status.phase === 'creating' ||
instance.status.phase === 'initializing')) {
// reload the page if the workspace was restarted to ensure:
@geropl, how relevant is this removal?
owner token etc. will be fine?
@AlexTugarev see discussions here: #8125 (comment)
I think it is the proper way to go.
> owner token etc. will be fine?

Yes, as Anton pointed out:
- the mechanism got pushed into `StartWorkspace` (where it belonged in the first place)
- `fetchWorkspaceInfo` handles owner token renewal (`ensureWorkspaceAuth`)
We do it on purpose to remove dependencies from supervisor and loading screen. Now a user will need to reload explicitly. We add a new button for that on the loading screen. I think it is fine.
Thanks, I missed that, as it was resolved.
Nice and clean!
please address in the follow-up PR: #8125 (comment)
@akosyakov this is relevant, and should also be easy to fix (we need to make sure IDEOptions are fetched on the |
const startedInstanceId = this.state?.startedInstanceId;
if (startedInstanceId !== workspaceInstance.id) {
// do we want to switch to "new" instance we just received an update for?
const switchToNewInstance = this.state.workspaceInstance?.status.phase === "stopped" && workspaceInstance.status.phase !== "stopped";
If the previous instance is in `stopping` or never reaches `stopped`, this reads like a dead end.
Assuming that we always have a single instance running per workspace, let's just focus on the status/phase of the new one.
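The concern can be illustrated by putting the quoted condition next to the suggested simplification. This is a sketch only: the function names are hypothetical, and the `Phase` union lists the phases mentioned in this thread plus `running`, which is an assumption.

```typescript
// Phases named in this thread, plus "running" (assumed) for completeness.
type Phase = "pending" | "creating" | "initializing" | "running" | "stopping" | "stopped";

// Condition as quoted in the diff: switch only once the previous instance is "stopped".
function switchRequiresPreviousStopped(previous: Phase | undefined, next: Phase): boolean {
    return previous === "stopped" && next !== "stopped";
}

// Suggested alternative (hypothetical): decide from the new instance's phase alone.
function switchOnNewPhaseOnly(next: Phase): boolean {
    return next !== "stopped";
}
```

If the previous instance is still `stopping` when the first `pending` update of the new instance arrives, the first predicate returns false, while the second one switches to the new instance.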
I agree in general, but I think it's not super important, and it has been like this for years.
But in light of this change we should watch this more closely, sure! 🙏
Awesome!
/unhold |
Description
This PR breaks the restart cycles we recently observed for workspaces that fail quickly enough.
The basic cycle was:
1. `StartWorkspace` (dashboard, /start) -> onload: `server.startWorkspace`
2. `server.startWorkspace` returns 👍 incl. `ideUrl` (1. instance)
3. `StartWorkspace` redirects to `ideUrl`
4. `StartWorkspace` (inframe, IDE url) -> onload: `server.startWorkspace`:
   a. expecting `server` to return the already running instance from step 1)
   b. but that was already stopped with an error, so started a fresh one (2. instance)
   c. `supervisor` frontend suspected a re-start (because it sees `stopped` updates from 1. instance, followed by `pending`/`creating` from 2.), and to properly re-connect the frontend, re-directs to `StartWorkspace` 🔁

This PR changes three things:
- `ws-proxy` now re-directs to `/start/?not_found=true`. `StartWorkspace` reads that parameter and sets `dontAutostart`, which in turn ensures `startWorkspace(id)` is not called on load
- when `StartWorkspace` is loaded inside an iframe (when already on the ideUrl), `dontAutostart: true` is set as well
- the re-direct in `supervisor` frontend got removed, and replaced with a similar mechanism in `StartWorkspace`
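A rough sketch of the on-load decision described above might look as follows. `LoadContext`, `shouldSkipAutostart` and `onLoad` are hypothetical names chosen for illustration; only the `not_found=true` parameter and the "inside an iframe" condition come from the description.

```typescript
// Sketch only: the names below are assumptions, not the PR's actual code.
interface LoadContext {
    search: string;   // query string of the /start page, e.g. "?not_found=true"
    inFrame: boolean; // true when the page is rendered inside an iframe on the ideUrl
}

function shouldSkipAutostart(ctx: LoadContext): boolean {
    // ws-proxy signals a missing workspace via the not_found query parameter.
    const notFound = new URLSearchParams(ctx.search).get("not_found") === "true";
    return notFound || ctx.inFrame;
}

function onLoad(ctx: LoadContext, workspaceId: string, startWorkspace: (id: string) => void): void {
    if (shouldSkipAutostart(ctx)) {
        return; // do not start; leave it to an explicit user action
    }
    startWorkspace(workspaceId);
}
```

Because neither redirect target triggers a start on load, a workspace that fails immediately can no longer feed the loop from the cycle above.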
Effects on the "start workspace" flow
- we no longer call `startWorkspace` that often, which should help with not confusing the rate-limiter (and potentially speed up the startup process 🤞)
UI
Currently the cycle breaks with this screen:

Alternatively we could use a modal but I found it to be bad because:
Update: removed that code
Alternatives
Alternatively, and to further strengthen the process, we could try to find a way to pass the `instanceId` from `dashboard` to `supervisor` frontend. This would avoid the whole issue altogether (and make code in supervisor frontend easier), because we know upfront which instance we expect to see here.
The only way I can think of is to make it part of the url; `<workspaceId>.ws-<cluster>.gitpod.io/#<instanceId>`, for example. But I'm unsure if and how this affects IDE, and if yes, how to avoid that (/cc @akosyakov)
Dismissed here.
Note: There sometimes seem to be CORS issues with redirects from `ws-proxy`, but sometimes they work. I assumed them to be out-of-scope here, because this PR does not touch CORS or any domains used.
Related Issue(s)
Fixes #8043
How to test
Negative
Positive
Release Notes
Documentation