
[dashboard, server] Prevent redirect loops that trigger "startWorkspace" in a loop #8043

Closed
geropl opened this issue Feb 4, 2022 · 7 comments · Fixed by #8125

@geropl
Member

geropl commented Feb 4, 2022

Ways to solve this:

    Level 1:
    • fix the broken assumption in the dashboard (error handling)
    Level 2:
    • have preventive measures in place in the dashboard (e.g., exponential backoff for startWorkspace)
    • have preventive measures in place in the server: rate-limit startWorkspace more strictly
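The exponential backoff suggested for the dashboard could look roughly like this TypeScript sketch. It assumes the startWorkspace call is wrapped in a zero-argument async function that rejects on failure; `startWithBackoff`, `backoffDelay`, and the option names are hypothetical and not Gitpod's actual dashboard code. Each retry doubles the delay up to a cap, and the loop gives up after a fixed number of attempts instead of retrying forever.

```typescript
// Hypothetical wrapper around the dashboard's startWorkspace call.
type StartFn = () => Promise<void>;

interface BackoffOptions {
  maxAttempts: number; // give up after this many calls
  baseDelayMs: number; // delay after the first failure
  maxDelayMs: number;  // upper bound for any single delay
}

// Delay after failed attempt number `attempt` (0-based): base * 2^attempt, capped.
function backoffDelay(attempt: number, opts: BackoffOptions): number {
  return Math.min(opts.maxDelayMs, opts.baseDelayMs * 2 ** attempt);
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Calls `start` until it succeeds or maxAttempts is reached.
// Resolves with the number of calls made; rejects with the last error otherwise.
async function startWithBackoff(start: StartFn, opts: BackoffOptions): Promise<number> {
  if (opts.maxAttempts < 1) throw new Error("maxAttempts must be >= 1");
  let lastError: unknown;
  for (let attempt = 0; attempt < opts.maxAttempts; attempt++) {
    try {
      await start();
      return attempt + 1;
    } catch (err) {
      lastError = err;
      // Only wait if another attempt will follow.
      if (attempt < opts.maxAttempts - 1) {
        await sleep(backoffDelay(attempt, opts));
      }
    }
  }
  throw lastError;
}
```

Capping both the delay and the number of attempts is what turns an unbounded loop into a bounded one; real clients often also add random jitter so that many browsers do not retry in lockstep.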

Evidence that this issue amplified the incident: https://ui.honeycomb.io/gitpod/datasets/gitpod-production/result/xnbGHzzXiP7?tab=traces

@sagor999
Contributor

sagor999 commented Feb 4, 2022

Just want to add that we should try to fix it ASAP, as it makes us DDoS ourselves.

@csweichel
Contributor

> Just want to add that we should try to fix it ASAP, as it makes us DDoS ourselves.

💯 agreed.

@jldec can you make sure this gets high up on the priority list?

jldec moved this to Scheduled in 🍎 WebApp Team Feb 4, 2022
geropl self-assigned this Feb 7, 2022
@jankeromnes
Contributor

FYI, a few links to more context about this problem, in case anyone finds this useful:

@geropl
Member Author

geropl commented Feb 7, 2022

Here are the startWorkspace traces from Friday: https://ui.honeycomb.io/gitpod/datasets/gitpod-production/result/8xyAV6hi4Jj
They show that, for a single user, the same workspace is restarted over and over again.
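One way to break such a loop on the dashboard side is to count automatic start attempts per workspace in storage that survives the redirect (for example `window.sessionStorage`) and stop auto-starting once a threshold is crossed. This is a hypothetical sketch: `StartLoopGuard`, its storage key format, and the thresholds are assumptions, not part of the Gitpod code base.

```typescript
// Minimal subset of the Web Storage API; window.sessionStorage satisfies it in a browser.
interface KeyValueStore {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
}

// Hypothetical guard: records automatic start attempts per workspace in a store that
// survives page reloads, and refuses further automatic starts once the limit is hit.
class StartLoopGuard {
  constructor(
    private readonly store: KeyValueStore,
    private readonly maxStarts: number, // automatic starts allowed per window
    private readonly windowMs: number,  // sliding window length in ms
    private readonly now: () => number = Date.now,
  ) {}

  private key(workspaceId: string): string {
    return `start-attempts/${workspaceId}`; // hypothetical key format
  }

  // Returns true and records the attempt if another automatic start is allowed.
  tryStart(workspaceId: string): boolean {
    const t = this.now();
    const raw = this.store.getItem(this.key(workspaceId));
    const previous: number[] = raw ? JSON.parse(raw) : [];
    // Keep only the attempts that are still inside the sliding window.
    const recent = previous.filter((ts) => t - ts < this.windowMs);
    const allowed = recent.length < this.maxStarts;
    if (allowed) recent.push(t);
    this.store.setItem(this.key(workspaceId), JSON.stringify(recent));
    return allowed;
  }
}
```

In a browser this might be constructed as `new StartLoopGuard(window.sessionStorage, 3, 60000)`; when `tryStart` returns false, the dashboard would show an error instead of calling startWorkspace yet again. An in-memory map would not work here, because each redirect reloads the page and discards it.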

@geropl
Member Author

geropl commented Feb 7, 2022

Reproducible in devstaging by:

  • copying the relevant ws-daemon label from the node, e.g. gitpod.io/ws-daemon_ready_ns_staging-gpl-8043-redirect-loop: "true"
  • scaling down ws-daemon (e.g., by requiring a non-existent node label)
  • manually re-adding said ws-daemon_ready... node label 👍

@geropl
Member Author

geropl commented Feb 7, 2022

The plan is to:

  • [server, dashboard] Do basic rate limiting on startWorkspace #8073: enable basic rate limiting (per server instance) and handle retries in the dashboard
  • fix the underlying communication issue between the page served by ws-proxy / workspace cluster and the dashboard / app cluster by adding a parameter (not found) that helps the dashboard distinguish …
  • (optional) find a persistent way to enforce rate limiting (looking at recent instance starts in the DB OR connecting the rate limiter to the DB)
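The per-instance rate limiting from the first bullet can be sketched as an in-memory token bucket keyed by user ID. The token-bucket technique is an assumption here (#8073 may use a different algorithm), and `PerUserRateLimiter` and its parameters are hypothetical. Because the state lives in a single server process, it is neither shared across server instances nor kept across restarts, which is the gap the optional third bullet addresses.

```typescript
// Hypothetical in-memory limiter: one token bucket per user, held by a single
// server instance. State is lost on restart and not shared between instances.
class PerUserRateLimiter {
  private readonly buckets = new Map<string, { tokens: number; updatedAt: number }>();

  constructor(
    private readonly capacity: number,        // burst size: max starts in quick succession
    private readonly refillPerSecond: number, // sustained rate of allowed starts
    private readonly now: () => number = Date.now,
  ) {}

  // Consumes one token for `userId`; returns false if the call should be rejected.
  tryConsume(userId: string): boolean {
    const t = this.now();
    const bucket = this.buckets.get(userId) ?? { tokens: this.capacity, updatedAt: t };
    // Refill in proportion to the time elapsed since the last update, up to capacity.
    const elapsedSeconds = (t - bucket.updatedAt) / 1000;
    bucket.tokens = Math.min(this.capacity, bucket.tokens + elapsedSeconds * this.refillPerSecond);
    bucket.updatedAt = t;
    this.buckets.set(userId, bucket);
    if (bucket.tokens < 1) return false;
    bucket.tokens -= 1;
    return true;
  }
}
```

A startWorkspace handler would call `tryConsume(userId)` first and reject the request when it returns false, so a client caught in a redirect loop is throttled to `refillPerSecond` starts per second after an initial burst of `capacity`, and only that user is affected.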

JanKoehnlein moved this from Scheduled to In Progress in 🍎 WebApp Team Feb 8, 2022
jldec removed the priority: highest (user impact) label Feb 8, 2022
@jldec
Contributor

jldec commented Feb 8, 2022

Removed the priority: highest (user impact) label, given that the basic rate limiter should now limit the impact to a single user.


5 participants