[server, dashboard] Do basic rate limiting on startWorkspace #8073
Codecov Report
@@ Coverage Diff @@
## main #8073 +/- ##
==========================================
- Coverage 12.01% 10.86% -1.16%
==========================================
Files 20 18 -2
Lines 1190 1022 -168
==========================================
- Hits 143 111 -32
+ Misses 1043 909 -134
+ Partials 4 2 -2
Wow, so cool to see the rate limiter code actually working! 👀 (Last time we tried it with Team Plan methods, I think it didn't behave as expected at all and we had to revert it again.)
I guess this makes our "infinite-redirect on workspace fast crash" failure mode a little better (i.e. we make the self-DDOS 10x less severe), and the code looks good to me. 👍
I'm just curious if this can have any impact in normal situations.
- For one, I guess it will impact the use case "have 5 stopped workspaces and restart them all by clicking all the tabs" (I guess it will just make their restart [0-4] * 10s slower, but maybe that's okay)
- Other impact can probably be tested on staging once merged. I expect no blockers, but if starting workspaces on staging becomes unreliable, we might have to revert again before deploying (that's what happened with our Team Plan calls rate-limiting attempt last time)
In any case, this looks good for merging & further testing on staging! 🚀
startWorkspace: {
    points: 1, // 1 workspace start per user per 10s
    durationsSec: 10
},
👍
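To make the "[0-4] * 10s" estimate from the review concrete, here is a minimal sketch of a per-user fixed-window limiter and a simulation of five tabs calling startWorkspace at once. `FixedWindowLimiter`, `simulateStarts`, and the `durationSec` field are hypothetical names for illustration, not the server's actual rate limiter; the simulation also assumes every rejected call retries exactly when the window reopens.

```typescript
// Hypothetical stand-in for a rate limit of `points` calls per `durationSec`
// window, tracked per user. Not Gitpod's actual implementation.
interface Limit {
    points: number;
    durationSec: number;
}

class FixedWindowLimiter {
    private windows = new Map<string, { start: number; used: number }>();
    private limit: Limit;

    constructor(limit: Limit) {
        this.limit = limit;
    }

    /** Returns 0 if the call is allowed, otherwise the ms until the window resets. */
    tryConsume(userId: string, nowMs: number): number {
        const windowMs = this.limit.durationSec * 1000;
        const w = this.windows.get(userId);
        if (!w || nowMs - w.start >= windowMs) {
            // No window yet, or the previous one expired: open a new one.
            this.windows.set(userId, { start: nowMs, used: 1 });
            return 0;
        }
        if (w.used < this.limit.points) {
            w.used++;
            return 0;
        }
        return w.start + windowMs - nowMs;
    }
}

/** Simulates `tabs` simultaneous start requests; returns start times in seconds. */
function simulateStarts(tabs: number, limit: Limit): number[] {
    const limiter = new FixedWindowLimiter(limit);
    const startedAt: number[] = [];
    let pending = tabs;
    let now = 0;
    while (pending > 0) {
        const wait = limiter.tryConsume("user-1", now);
        if (wait === 0) {
            startedAt.push(now / 1000);
            pending--;
        } else {
            now += wait; // idealized: retry exactly when the limiter allows it
        }
    }
    return startedAt;
}

console.log(simulateStarts(5, { points: 1, durationSec: 10 })); // [ 0, 10, 20, 30, 40 ]
```

Under these assumptions the fifth workspace starts 40s after the first, which matches the reviewer's estimate.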
@@ -356,7 +356,7 @@ class GitpodJsonRpcProxyFactory<T extends object> extends JsonRpcProxyFactory<T>
         throw rlRejected;
     }
     log.warn({ userId }, "Rate limiter prevents accessing method due to too many requests.", rlRejected, { method });
-    throw new ResponseError(ErrorCodes.TOO_MANY_REQUESTS, "too many requests", { "Retry-After": String(Math.round(rlRejected.msBeforeNext / 1000)) || 1 });
+    throw new ResponseError<RateLimiterError>(ErrorCodes.TOO_MANY_REQUESTS, "too many requests", { method, retryAfter: Math.round(rlRejected.msBeforeNext / 1000) || 1 });
Probably safer with a Math.ceil, but not a major issue (since the front-end retry code works well):
-    throw new ResponseError<RateLimiterError>(ErrorCodes.TOO_MANY_REQUESTS, "too many requests", { method, retryAfter: Math.round(rlRejected.msBeforeNext / 1000) || 1 });
+    throw new ResponseError<RateLimiterError>(ErrorCodes.TOO_MANY_REQUESTS, "too many requests", { method, retryAfter: Math.ceil(rlRejected.msBeforeNext / 1000) || 1 });
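A small sketch of why Math.ceil is the safer choice here. The three helper functions below are hypothetical wrappers around the expressions from the diff, written only to compare their results. It also shows that in the pre-PR expression the `|| 1` fallback could never apply, because String(...) always yields a non-empty string.

```typescript
// Hypothetical helpers wrapping the expressions from the diff, for comparison only.
function retryAfterRound(msBeforeNext: number): number {
    return Math.round(msBeforeNext / 1000) || 1;
}

function retryAfterCeil(msBeforeNext: number): number {
    return Math.ceil(msBeforeNext / 1000) || 1;
}

// Pre-PR form: String(0) is "0", which is truthy, so the fallback never applies.
function retryAfterOld(msBeforeNext: number): string | number {
    return String(Math.round(msBeforeNext / 1000)) || 1;
}

// With 1400 ms left in the window, rounding tells the client to wait 1 s,
// which is 400 ms too early, so the retry would be rejected once more.
console.log(retryAfterRound(1400)); // 1
console.log(retryAfterCeil(1400));  // 2

// Below 500 ms, rounding yields 0 and the `|| 1` fallback applies.
console.log(retryAfterRound(300));  // 1
console.log(retryAfterCeil(300));   // 1

// The old expression returns the string "0" instead of falling back to 1.
console.log(retryAfterOld(300));    // "0"
```

With Math.round, an early retry costs one extra rejected call, which the front-end retry absorbs; Math.ceil avoids that extra round trip.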
@jankeromnes Thx for the thorough review! 🙏
Indeed. This is first and foremost a safety measure to help with the next days. Next on the list:
Description
This configures a rate-limit for API calls to startWorkspace of 1/10s. The dashboard handles the error and waits until retryAfter to re-try the call (currently indefinitely).
Note that this is enforced per server instance at the moment, as the rate-limiter state is not shared between server instances. But it's a first step that already reduces the impact of future incidents.
Related Issue(s)
Context: #8043
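The dashboard behavior from the description (wait for retryAfter, then call again, indefinitely) could look roughly like the sketch below. This is not the dashboard's actual code: `callWithRetry`, `isRateLimited`, the numeric code 429, and the error shape are assumptions based on the fields visible in the diff.

```typescript
// Assumed numeric value for ErrorCodes.TOO_MANY_REQUESTS (hypothetical).
const TOO_MANY_REQUESTS = 429;

// Assumed shape of the rejected JSON-RPC error, based on the fields in the diff.
interface RateLimitedError {
    code: number;
    data: { method?: string; retryAfter: number };
}

function isRateLimited(err: unknown): err is RateLimitedError {
    if (typeof err !== "object" || err === null) {
        return false;
    }
    const e = err as { code?: unknown; data?: { retryAfter?: unknown } };
    return e.code === TOO_MANY_REQUESTS && typeof e.data?.retryAfter === "number";
}

// Calls `call` until it succeeds, sleeping `retryAfter` seconds after each
// rate-limit rejection. Other errors are rethrown immediately.
// `sleep` is injectable so the loop can be exercised without real timers.
async function callWithRetry<T>(
    call: () => Promise<T>,
    sleep: (ms: number) => Promise<void> = (ms) =>
        new Promise<void>((resolve) => setTimeout(resolve, ms)),
): Promise<T> {
    // No attempt cap: mirrors the "currently indefinitely" note in the description.
    for (;;) {
        try {
            return await call();
        } catch (err) {
            if (!isRateLimited(err)) {
                throw err;
            }
            await sleep(err.data.retryAfter * 1000);
        }
    }
}
```

A caller would wrap the API call, for example `callWithRetry(() => startWorkspace(id))`, where `startWorkspace` stands in for whatever client method the dashboard uses. Adding a maximum number of attempts would be a natural follow-up if the indefinite retry turns out to be a problem.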
How to test
Release Notes
Documentation