-
Notifications
You must be signed in to change notification settings - Fork 93
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(quotas): Add global throughput limit #2928
Conversation
relay-quotas/src/global.rs
Outdated
} | ||
|
||
fn default_request_size_based_on_limit(&self) -> usize { | ||
100 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need to actually implement this, use a percentage we grab from the config
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Implemented with a global constant for now.
Might in the future want to use some kind of moving average instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
General approach looks good to me. See comment about received
time.
|
||
let key = BudgetKeyRef::new(quota); | ||
let val = { | ||
let mut limits = self.limits.lock().unwrap_or_else(PoisonError::into_inner); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does it actually make sense for our business logic to use the value if another thread panicked? From the docs:
However if the Mutex contained, say, a BinaryHeap that does not actually have the heap property, it's unlikely that any code that uses it will do what the author intended. As such, the program should not proceed normally. Still, if you're double-plus-sure that you can do something with the value, the Mutex exposes a method to get the lock anyway. It is safe, after all. Just maybe nonsense.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah that is fine, we never actually panic in the critical section, but even if we do, there is nothing that is left in an unsafe or uncertain state, since all we do is either insert something into the map or clone an Arc.
relay-quotas/src/global.rs
Outdated
} | ||
|
||
struct RedisCounter { | ||
last_seen: u64, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: Might be just me but I associate last_seen
with a timestamp.
last_seen: u64, | |
latest: u64, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We prefer last_seen
because it makes it clear the value may be out of date and not in sync with Redis.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall looks good to me, see questions on the requesting budget and making too many calls to Redis.
relay-quotas/src/redis.rs
Outdated
let org = if self.quota.scope == QuotaScope::Global { | ||
0 | ||
} else { | ||
self.scoping.organization_id | ||
}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this the same as self.scoping.scope_id(self.quota.scope)
? I think setting the arbitrary 0
here and scope_id
makes introducing breaking changes too easy.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, this is purely a default for the Redis key, the values are unrelated.
/// Returns when the key should expire in Redis. | ||
/// | ||
/// Like [`Self::expiry()`] but adds an additional grace period for the key. | ||
pub fn key_expiry(&self) -> u64 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pub fn key_expiry(&self) -> u64 { | |
pub fn expiry_with_grace(&self) -> u64 { |
What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I prefer the key_expiry
, we're only interested to get an expiry for the Redis key which was created by the key
method, this makes it clear they go together. Also that there is a "grace" is not relevant for the caller, the caller is just interested to know when to expire the Redis key.
) -> Result<bool, RedisError> { | ||
let key = KeyRef::new(quota); | ||
let val = { | ||
let mut limits = self.limits.lock().unwrap_or_else(PoisonError::into_inner); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it worth it to add some observability to identify potential bugs when we see plenty of poisoning errors?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Effectively this lock will never be poisned, this only happens through a panic, which we would already notice. This is unfortunately just an API design error from the std lib we have to deal with here.
return Ok(0); | ||
} | ||
|
||
let budget_to_reserve = min_required_budget.max(self.default_request_size(quantity, quota)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we care about the minimum required budget, and not request a default amount directly? Requesting minimum amounts may result in more Redis checks (do we know the impact?) which I believe we're trying to partially avoid.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the minimum is higher than the default, we still need to fetch the minimum to not have a bug, in practice this probably never happens, but there might be an edge case where we have a huge quantity once.
return Ok(false); | ||
} | ||
|
||
let reserved = self.try_reserve(client, quantity, quota)?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we have mechanisms to avoid requesting from Redis too much? IIUC, we'll make a Redis request every tie there's no budget in the local cache.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is 'solved' by having a high default we reserve from Redis instead of just reserving the quantity. We currently reserve 0.01%
of the limit.
5f677b2
to
9ec555a
Compare
Implements global abuse limit for metric buckets. Relay: getsentry/relay#2928 Epic: getsentry/relay#2716
Similar to #2854 but with locks instead of atomics.
Adds a mechanism to limit the throughput of metric buckets.
It's done globally, meaning across different relays, using redis.
In order to not call redis every time, we use a budget system (the name quota already taken). We "take" a certain budget by incrementing a redis counter for the given global quota in the given slot, put it in a local counter, and count down to zero before asking for more from redis.