
feat(quotas): Add global throughput limit #2928

Merged: 81 commits from tor-dav1d/global-limit into master on Jan 15, 2024.

Conversation

@Dav1dde (Member) commented Jan 10, 2024

Similar to #2854 but with locks instead of atomics.

Adds a mechanism to limit the throughput of metric buckets.

The limit is enforced globally, i.e. across different Relays, using Redis.

To avoid calling Redis on every check, we use a budget system (the name "quota" was already taken). We "take" a certain budget by incrementing a Redis counter for the given global quota in the given slot, store it in a local counter, and count down to zero before asking Redis for more.
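
For illustration, a minimal sketch of the budget idea; the names here (LocalBudget, GlobalRateLimiter, reserve_from_redis, BUDGET_REQUEST) are hypothetical and not the actual implementation:

```rust
use std::collections::HashMap;

/// Hypothetical bookkeeping: how much budget this Relay has reserved from the
/// shared Redis counter for a (quota, slot) pair, and how much it has spent.
#[derive(Default)]
struct LocalBudget {
    reserved: u64,
    used: u64,
}

/// Placeholder chunk size; the real code derives this from the limit.
const BUDGET_REQUEST: u64 = 1_000;

struct GlobalRateLimiter {
    budgets: HashMap<(String, u64), LocalBudget>,
}

impl GlobalRateLimiter {
    /// Checks `quantity` against the global limit, only talking to Redis when
    /// the locally cached budget runs out.
    fn is_allowed(&mut self, quota_id: &str, slot: u64, quantity: u64, limit: u64) -> bool {
        let budget = self.budgets.entry((quota_id.to_owned(), slot)).or_default();

        if budget.used + quantity > budget.reserved {
            // Local budget exhausted: reserve a bigger chunk from the shared
            // Redis counter so subsequent checks are served locally.
            let want = quantity.max(BUDGET_REQUEST);
            budget.reserved += reserve_from_redis(quota_id, slot, want, limit);
        }

        if budget.used + quantity <= budget.reserved {
            budget.used += quantity;
            true
        } else {
            false // global limit exceeded
        }
    }
}

/// Stand-in for the Redis call: INCRBY the per-slot counter and return how
/// much of `want` still fits under `limit`.
fn reserve_from_redis(_quota_id: &str, _slot: u64, want: u64, _limit: u64) -> u64 {
    want // pretend the full request was granted
}
```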

@TBS1996 TBS1996 marked this pull request as ready for review January 12, 2024 07:54
@TBS1996 TBS1996 requested a review from a team as a code owner January 12, 2024 07:54
}

fn default_request_size_based_on_limit(&self) -> usize {
100
Member Author:

We need to actually implement this, use a percentage we grab from the config

Member Author:

Implemented with a global constant for now.

Might in the future want to use some kind of moving average instead.

@jjbayer (Member) left a comment:

General approach looks good to me. See comment about received time.


let key = BudgetKeyRef::new(quota);
let val = {
let mut limits = self.limits.lock().unwrap_or_else(PoisonError::into_inner);
Member:

Does it actually make sense for our business logic to use the value if another thread panicked? From the docs:

However if the Mutex contained, say, a BinaryHeap that does not actually have the heap property, it's unlikely that any code that uses it will do what the author intended. As such, the program should not proceed normally. Still, if you're double-plus-sure that you can do something with the value, the Mutex exposes a method to get the lock anyway. It is safe, after all. Just maybe nonsense.

Member Author:

Yeah, that is fine: we never actually panic in the critical section, and even if we did, nothing would be left in an unsafe or uncertain state, since all we do is either insert something into the map or clone an Arc.
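
As a minimal standalone sketch of the pattern in the snippet above (the counter example is made up):

```rust
use std::sync::{Mutex, PoisonError};

// If another thread panicked while holding the lock, `lock()` returns a
// `PoisonError`, but `into_inner` still hands back the guard. That is sound
// here because the critical section only inserts into a map or clones an
// `Arc`, so the protected data can never be left half-updated.
fn increment(counter: &Mutex<u64>) {
    let mut value = counter.lock().unwrap_or_else(PoisonError::into_inner);
    *value += 1;
}
```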

}

struct RedisCounter {
last_seen: u64,
Member:

nit: Might be just me but I associate last_seen with a timestamp.

Suggested change
last_seen: u64,
latest: u64,

Member Author:

We prefer last_seen because it makes it clear the value may be out of date and not in sync with Redis.

@iker-barriocanal (Contributor) left a comment:

Overall looks good to me; see my questions about requesting budget and about making too many calls to Redis.

Comment on lines 123 to 127
let org = if self.quota.scope == QuotaScope::Global {
0
} else {
self.scoping.organization_id
};
Contributor:

Is this the same as self.scoping.scope_id(self.quota.scope)? I think having the arbitrary 0 here alongside scope_id makes it too easy to introduce breaking changes.

Member Author:

No, this is purely a default for the Redis key; the values are unrelated.

/// Returns when the key should expire in Redis.
///
/// Like [`Self::expiry()`] but adds an additional grace period for the key.
pub fn key_expiry(&self) -> u64 {
Contributor:

Suggested change
pub fn key_expiry(&self) -> u64 {
pub fn expiry_with_grace(&self) -> u64 {

What do you think?

Member Author:

I prefer key_expiry: we're only interested in getting an expiry for the Redis key created by the key method, and this name makes it clear they go together. Also, the fact that there is a "grace" is not relevant for the caller; the caller just wants to know when to expire the Redis key.
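
For illustration, a sketch of how the two methods could relate; the field names and the grace constant are assumptions, not the actual implementation:

```rust
/// Extra seconds the Redis key is kept around after the window ends
/// (hypothetical value).
const GRACE: u64 = 60;

struct RedisQuota {
    timestamp: u64, // unix timestamp of the current check
    window: u64,    // length of the rate-limit window in seconds
}

impl RedisQuota {
    /// End of the current rate-limit window.
    fn expiry(&self) -> u64 {
        let slot = self.timestamp / self.window;
        (slot + 1) * self.window
    }

    /// When the Redis key created by the key method should expire: the window
    /// end plus a grace period, so in-flight checks never hit a missing key.
    fn key_expiry(&self) -> u64 {
        self.expiry() + GRACE
    }
}
```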

) -> Result<bool, RedisError> {
let key = KeyRef::new(quota);
let val = {
let mut limits = self.limits.lock().unwrap_or_else(PoisonError::into_inner);
Contributor:

Is it worth adding some observability to help identify potential bugs if we start seeing a lot of poisoning errors?

Member Author:

Effectively this lock will never be poisoned; that only happens through a panic, which we would already notice. This is unfortunately just an API design error in the standard library that we have to deal with here.

return Ok(0);
}

let budget_to_reserve = min_required_budget.max(self.default_request_size(quantity, quota));
Contributor:

Why do we care about the minimum required budget, and not request a default amount directly? Requesting minimum amounts may result in more Redis checks (do we know the impact?) which I believe we're trying to partially avoid.

Member Author:

If the minimum is higher than the default, we still need to fetch the minimum to avoid a bug. In practice this probably never happens, but there might be an edge case where we get a huge quantity once.
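
A tiny worked example of that edge case (the numbers are made up):

```rust
fn main() {
    let default_request_size: u64 = 1_000;

    // Typical case: the default chunk covers the request comfortably.
    assert_eq!(50u64.max(default_request_size), 1_000);

    // Edge case: a single huge quantity must still be reserved in full,
    // otherwise the check could never succeed.
    assert_eq!(25_000u64.max(default_request_size), 25_000);
}
```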

return Ok(false);
}

let reserved = self.try_reserve(client, quantity, quota)?;
Contributor:

Do we have mechanisms to avoid requesting from Redis too much? IIUC, we'll make a Redis request every time there's no budget in the local cache.

Member Author:

This is 'solved' by reserving a high default amount from Redis instead of just reserving the quantity. We currently reserve 0.01% of the limit.
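
A sketch of how such a default could be derived from the limit; the constant and function names are assumptions, only the 0.01% ratio comes from the comment above:

```rust
/// Fraction of the global limit reserved from Redis per refill (0.01%).
const DEFAULT_BUDGET_RATIO: f64 = 0.0001;

/// Derive the chunk size to reserve from Redis from the quota limit, never
/// reserving less than the quantity currently being checked.
fn default_request_size(quantity: u64, limit: u64) -> u64 {
    let chunk = (limit as f64 * DEFAULT_BUDGET_RATIO) as u64;
    chunk.max(quantity)
}
```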

@Dav1dde Dav1dde force-pushed the tor-dav1d/global-limit branch from 5f677b2 to 9ec555a on January 12, 2024 14:52
@Dav1dde Dav1dde merged commit d36b1fa into master Jan 15, 2024
20 checks passed
@Dav1dde Dav1dde deleted the tor-dav1d/global-limit branch January 15, 2024 14:33
TBS1996 added a commit to getsentry/sentry that referenced this pull request Feb 7, 2024