Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(quotas): Add global throughput limit #2928

Merged
merged 81 commits into from
Jan 15, 2024
Merged
Show file tree
Hide file tree
Changes from 68 commits
Commits
Show all changes
81 commits
Select commit Hold shift + click to select a range
26f1235
wip
TBS1996 Dec 15, 2023
9a36e1a
wip
TBS1996 Dec 20, 2023
4021f9b
wip
TBS1996 Dec 20, 2023
cc945ea
wip
TBS1996 Dec 20, 2023
cc8c4df
fix expiry
TBS1996 Dec 20, 2023
c9fe40a
wip
TBS1996 Dec 21, 2023
5ae80da
wip
TBS1996 Dec 21, 2023
dd216c5
enable tests
TBS1996 Dec 21, 2023
2737e48
wip
TBS1996 Jan 3, 2024
1cf531a
mrg
TBS1996 Jan 3, 2024
8c081db
wip
TBS1996 Jan 3, 2024
b9741b0
finished test
TBS1996 Jan 4, 2024
1358339
finished test
TBS1996 Jan 4, 2024
8b6a33a
wip
TBS1996 Jan 5, 2024
25f5dd0
wip
TBS1996 Jan 5, 2024
54c8e8e
wip
TBS1996 Jan 5, 2024
3a8aebf
wip
TBS1996 Jan 5, 2024
5e918d6
wip
TBS1996 Jan 5, 2024
97951c9
wip
TBS1996 Jan 5, 2024
baf3434
wip
TBS1996 Jan 5, 2024
81e6a0d
cl
TBS1996 Jan 5, 2024
e9b1533
Merge branch 'master' into tor/nslimit
TBS1996 Jan 5, 2024
bf07bda
fix lint
TBS1996 Jan 5, 2024
70dca21
Merge branch 'tor/nslimit' of https://github.com/getsentry/relay into…
TBS1996 Jan 5, 2024
a440778
fix lint
TBS1996 Jan 5, 2024
18f9a3a
fix lint
TBS1996 Jan 5, 2024
7de09e6
debug in ci
TBS1996 Jan 5, 2024
138360b
debug in ci
TBS1996 Jan 5, 2024
ba47eac
debug in ci
TBS1996 Jan 5, 2024
f9ab69b
debug in ci
TBS1996 Jan 5, 2024
36ab40d
debug in ci
TBS1996 Jan 5, 2024
77e3874
debug in ci
TBS1996 Jan 5, 2024
d4af90c
more effective redis sync
TBS1996 Jan 5, 2024
61e1ee4
rm decoutcome and refs
TBS1996 Jan 5, 2024
2ebbeab
delay bwetween sending buckets
TBS1996 Jan 6, 2024
1f753d5
refactor
TBS1996 Jan 6, 2024
afe425a
fix deadlock
TBS1996 Jan 6, 2024
c79d2bf
debug
TBS1996 Jan 6, 2024
511bf52
add unit test
TBS1996 Jan 7, 2024
bd60cbf
debug
TBS1996 Jan 7, 2024
412e0d4
nit
TBS1996 Jan 7, 2024
d7fbc6d
fix
TBS1996 Jan 7, 2024
b02f980
add loop safety
TBS1996 Jan 7, 2024
e5ec453
nit
TBS1996 Jan 7, 2024
f0e95f9
nit
TBS1996 Jan 8, 2024
00b57c6
nit
TBS1996 Jan 8, 2024
4b53b60
ref
TBS1996 Jan 8, 2024
ad01e38
debounce redis calls
TBS1996 Jan 8, 2024
030466b
atomic block for redis calls
TBS1996 Jan 9, 2024
626b47a
nits
TBS1996 Jan 9, 2024
b79ad58
ref
TBS1996 Jan 9, 2024
530bf09
block val on redis call
TBS1996 Jan 10, 2024
25b4275
wip
TBS1996 Jan 10, 2024
47f207c
wip
Dav1dde Jan 10, 2024
4629dd2
wip
Dav1dde Jan 10, 2024
0ccf7ac
wip
TBS1996 Jan 10, 2024
862b306
arced mutex
TBS1996 Jan 10, 2024
16d7b0d
move limit
TBS1996 Jan 10, 2024
6056d49
docs
TBS1996 Jan 11, 2024
9b89968
w
TBS1996 Jan 11, 2024
400b6f5
feature flag
TBS1996 Jan 11, 2024
a6ca9fe
fix bugs
TBS1996 Jan 11, 2024
56b57ad
test
TBS1996 Jan 11, 2024
b8ee59a
reset slot
TBS1996 Jan 11, 2024
8a8f751
Merge branch 'master' into tor-dav1d/global-limit
TBS1996 Jan 11, 2024
a882565
reset slot
TBS1996 Jan 11, 2024
dd5dbb1
wip
TBS1996 Jan 11, 2024
7fd70c7
more documentation
TBS1996 Jan 11, 2024
6d1e5cf
cl
TBS1996 Jan 12, 2024
cd54b18
remove SlottedCounter, rename Counter to GlobalRateLimit
Dav1dde Jan 12, 2024
5e7f307
fix docs
Dav1dde Jan 12, 2024
a76e936
swap RwLock with a Mutex
Dav1dde Jan 12, 2024
e0f4cfb
another changelog fix
Dav1dde Jan 12, 2024
72a1c10
add tests, removed abstractions, default logic
Dav1dde Jan 12, 2024
d9170cc
review changes, additional test for no limit
Dav1dde Jan 12, 2024
a72d7c8
use random quota id instead of window
Dav1dde Jan 12, 2024
7d59170
docs fix
Dav1dde Jan 12, 2024
9ec555a
RedisQuota shouldnt be public
Dav1dde Jan 12, 2024
dbf0f98
adjust budget ratio to 0.001
Dav1dde Jan 15, 2024
5ba7894
separate key for global quotas
Dav1dde Jan 15, 2024
cf4196f
another test, rework integration test
Dav1dde Jan 15, 2024
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
3 changes: 2 additions & 1 deletion CHANGELOG.md
Original file line number Diff line number Diff line change
@@ -1,9 +1,10 @@
# Changelog

## Unreleased
## 23.12.1

Check failure on line 3 in CHANGELOG.md

View workflow job for this annotation

GitHub Actions / Changelogs

Missing changelog entry.

Please consider adding a changelog entry for the next release.
Dav1dde marked this conversation as resolved.
Show resolved Hide resolved

**Features**:

- Add a global throughput rate limiter for metric buckets. ([#2854](https://github.com/getsentry/relay/pull/2854))
- Group db spans with repeating logical conditions together. ([#2929](https://github.com/getsentry/relay/pull/2929))

**Bug Fixes**:
Expand Down
1 change: 1 addition & 0 deletions Cargo.lock

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

1 change: 1 addition & 0 deletions relay-quotas/Cargo.toml
Original file line number Diff line number Diff line change
Expand Up @@ -14,6 +14,7 @@ default = []
redis = ["dep:thiserror", "dep:relay-log", "relay-redis/impl"]

[dependencies]
hashbrown = { workspace = true }
relay-base-schema = { path = "../relay-base-schema" }
relay-common = { path = "../relay-common" }
relay-log = { path = "../relay-log", optional = true }
Expand Down
256 changes: 256 additions & 0 deletions relay-quotas/src/global.rs
Original file line number Diff line number Diff line change
@@ -0,0 +1,256 @@
use std::cmp::Ordering;
use std::sync::{Arc, Mutex, OnceLock, RwLock};

use relay_common::time::UnixTimestamp;
use relay_redis::redis::Script;
use relay_redis::{PooledClient, RedisError};

use crate::RedisQuota;

fn load_global_lua_script() -> &'static Script {
static SCRIPT: OnceLock<Script> = OnceLock::new();
SCRIPT.get_or_init(|| Script::new(include_str!("global_quota.lua")))
}

fn current_slot(window: u64) -> usize {
UnixTimestamp::now()
.as_secs()
Dav1dde marked this conversation as resolved.
Show resolved Hide resolved
.checked_div(window)
.unwrap_or_default() as usize
Dav1dde marked this conversation as resolved.
Show resolved Hide resolved
}

/// Counters used as a cache for global quotas.
///
/// When we want to ratelimit across all relay-instances, we need to use redis to synchronize.
/// Calling Redis every time we want to check if an item should be ratelimited would be very expensive,
/// which is why we have this cache. It works by 'taking' a certain budget from redis, by pre-incrementing
/// a global counter. We Put the amount we pre-incremented into this local cache and count down until
/// we have no more budget, then we ask for more from redis. If we find the global counter is above
/// the quota limit, we will ratelimit the item.
#[derive(Default)]
pub struct GlobalRateLimits {
counters: RwLock<hashbrown::HashMap<BudgetKey, Arc<Mutex<SlottedCounter>>>>,
}

impl GlobalRateLimits {
/// Returns `true` if the global quota should be ratelimited.
Dav1dde marked this conversation as resolved.
Show resolved Hide resolved
///
/// Certain errors can be resolved by syncing to redis, so in those cases
/// we try again to decrement the budget after syncing.
pub fn is_rate_limited(
&self,
client: &mut PooledClient,
quota: &RedisQuota,
quantity: usize,
) -> Result<bool, RedisError> {
let Some(limit) = quota.limit else {
return Ok(false);
};

let key = BudgetKeyRef::new(quota);

let opt_val = self.counters.read().unwrap().get(&key).cloned();
Dav1dde marked this conversation as resolved.
Show resolved Hide resolved
let val = match opt_val {
Some(val) => val,
None => Arc::clone(self.counters.write().unwrap().entry_ref(&key).or_default()),
};

let mut lock = val.lock().unwrap();
Dav1dde marked this conversation as resolved.
Show resolved Hide resolved
Dav1dde marked this conversation as resolved.
Show resolved Hide resolved
lock.is_rate_limited(client, quota, quantity, limit)
}
}

/// Key for storing global quota-budgets locally.
///
/// Note: must not be used in redis. For that, use RedisQuota.key().
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
struct BudgetKey {
prefix: String,
window: u64,
}

impl From<&BudgetKeyRef<'_>> for BudgetKey {
fn from(value: &BudgetKeyRef<'_>) -> Self {
BudgetKey {
prefix: value.prefix.to_owned(),
window: value.window,
}
}
}

/// Used to look up a hashmap of [`BudgetKey`]-keys without a string allocation.
///
/// This works due to the 'Equivalent' trait in the hashbrown crate.
#[derive(Clone, Copy, Hash)]
struct BudgetKeyRef<'a> {
prefix: &'a str,
window: u64,
}

impl<'a> BudgetKeyRef<'a> {
fn new(quota: &'a RedisQuota) -> Self {
Self {
prefix: quota.prefix(),
window: quota.window(),
}
}
}

impl hashbrown::Equivalent<BudgetKey> for BudgetKeyRef<'_> {
fn equivalent(&self, key: &BudgetKey) -> bool {
self.prefix == key.prefix && self.window == key.window
}
}

struct SlottedCounter {
slot: usize,
counter: Counter,
}

impl SlottedCounter {
pub fn new() -> Self {
Self {
slot: 0,
counter: Counter::new(),
}
}

pub fn is_rate_limited(
&mut self,
client: &mut PooledClient,
quota: &RedisQuota,
quantity: usize,
limit: u64,
) -> Result<bool, RedisError> {
let current_slot = current_slot(quota.window());

match self.slot.cmp(&current_slot) {
Dav1dde marked this conversation as resolved.
Show resolved Hide resolved
Ordering::Greater => {
// TODO double check logic here
// in theory this should never happen only if someone messes with the system time
// be safe and dont rate limit
relay_log::error!("budget slot ahead of current slot");
Dav1dde marked this conversation as resolved.
Show resolved Hide resolved
Ok(false)
}
Ordering::Less => {
self.counter = Counter::new();
self.slot = current_slot;

self.counter.is_rate_limited(client, limit, quantity, quota)
}
Ordering::Equal => self.counter.is_rate_limited(client, limit, quantity, quota),
}
}
}

impl Default for SlottedCounter {
fn default() -> Self {
Self::new()
}
}

/// Represents the local budget taken from a global quota.
struct Counter {
local_counter: LocalCounter,
redis_counter: RedisCounter,
}

impl Counter {
pub fn new() -> Self {
Self {
local_counter: LocalCounter::new(),
redis_counter: RedisCounter::new(),
}
}

pub fn is_rate_limited(
&mut self,
client: &mut PooledClient,
limit: u64,
quantity: usize,
quota: &RedisQuota,
) -> Result<bool, RedisError> {
if self.local_counter.try_consume(quantity) {
return Ok(false);
}

if !self.redis_counter.can_satisfy(quantity, limit) {
return Ok(true);
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we flip this around and call the method exceeds_limit?


let budget_to_reserve = quantity.max(self.default_request_size_based_on_limit());
let reserved = self
.redis_counter
.try_reserve(client, budget_to_reserve, limit, quota)? as usize;

self.local_counter.increase_budget(reserved);
Ok(!self.local_counter.try_consume(quantity))
}

fn default_request_size_based_on_limit(&self) -> usize {
100
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to actually implement this, use a percentage we grab from the config

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Implemented with a global constant for now.

Might in the future want to use some kind of moving average instead.

}
}

struct LocalCounter {
budget: usize,
}

impl LocalCounter {
fn new() -> Self {
Self { budget: 0 }
}

fn try_consume(&mut self, quantity: usize) -> bool {
if self.budget >= quantity {
self.budget -= quantity;
true
} else {
false
}
}

fn increase_budget(&mut self, quantity: usize) {
self.budget = self.budget.saturating_add(quantity);
}
}

struct RedisCounter {
last_seen: u64,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Might be just me but I associate last_seen with a timestamp.

Suggested change
last_seen: u64,
latest: u64,

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We prefer last_seen because it makes it clear the value may be out of date and not in sync with Redis.

}

impl RedisCounter {
fn new() -> Self {
Self { last_seen: 0 }
}

fn can_satisfy(&self, quantity: usize, limit: u64) -> bool {
self.last_seen + quantity as u64 <= limit
}

fn try_reserve(
&mut self,
client: &mut PooledClient,
quantity: usize,
limit: u64,
quota: &RedisQuota,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: quota determines the redis key, which comes before redis args, so i would put the parameter between client and quantity.

) -> Result<u64, RedisError> {
let script = load_global_lua_script();

let redis_key = quota.key();
let expiry = UnixTimestamp::now().as_secs() + quota.window();
Dav1dde marked this conversation as resolved.
Show resolved Hide resolved

let (budget, redis_count): (u64, u64) = script
.prepare_invoke()
.key(redis_key.as_str())
.arg(limit)
.arg(expiry)
.arg(quantity)
.invoke(&mut client.connection()?)
.map_err(RedisError::Redis)?;

self.last_seen = redis_count;

Ok(budget)
}
}
66 changes: 66 additions & 0 deletions relay-quotas/src/global_quota.lua
Original file line number Diff line number Diff line change
@@ -0,0 +1,66 @@
-- Global quota system.
--
--
-- Input:
--
-- ``KEY``:
-- * [string] Key of the counter.
--
-- ``ARGV``:
-- * [number] Quota limit. will not go over this limit while taking budget.
-- * [number] Absolute Expiration time as Unix timestamp (secs since 1.1.1970 ) for the key.
-- * [number] Quantity we want to take. Limited by the Quota limit.
--
-- Output:
--
-- A two-element table containing the following:
--
-- 1. Reserved Budget (number):
-- - The amount of budget that has been successfully allocated based on the request.
-- - This value is 0 if the current quota limit has already been reached or exceeded.
-- - Otherwise, it reflects the actual budget allocated, which may be less than or equal to the requested budget,
-- depending on the available quota (headroom).
--
-- 2. Redis count (number):
-- - Represents the value of the counter after we have allocated our budget.
--
--
-- Made to work with a local cache of quota limit. The caller will "take" a certain budget
-- from the global redis counter, given that the limit have not been exceeded. This is to dramatically
-- reduce the amount of redis calls while at the same time not overshooting the quota.
-- We return both the size of the taken budget and the size of the redis count.
-- The reason we return the size of the redis count is to avoid asking for more budget when
-- the previously seen redis count is still bigger than the limit.
--
--
-- The redis keys are unique to their timeslot, which is why we let them expire in order to not
-- fill up redis with dead keys.


-- The key to the global quota.
local key = KEYS[1]
-- The max amount that we want to take within the given slot. We won't take a budget if
-- the count is higher than the limit.
local limit = tonumber(ARGV[1])
-- When the redis key/val should be deleted.
local expiry = tonumber(ARGV[2])
Dav1dde marked this conversation as resolved.
Show resolved Hide resolved
-- The budget that the caller intends to take. Will be capped if too close to the limit.
local requested_budget = tonumber(ARGV[3])

local redis_count = tonumber(redis.call('GET', key) or 0)

if redis_count >= limit then
return {0, redis_count}
else
-- Ensures the budget is not more than the quantity needed to hit the limit.
local headroom = limit - redis_count
local budget = math.min(headroom, requested_budget)
redis.call('INCRBY', key, budget)
if redis_count == 0 then
redis.call('EXPIREAT', key, expiry)
end

return {budget, redis_count + budget}
end


5 changes: 5 additions & 0 deletions relay-quotas/src/lib.rs
Original file line number Diff line number Diff line change
Expand Up @@ -11,9 +11,14 @@
/// typically happens for disabled keys, projects, or organizations.
const REJECT_ALL_SECS: u64 = 60;

#[cfg(feature = "redis")]
Dav1dde marked this conversation as resolved.
Show resolved Hide resolved
mod global;
mod quota;
mod rate_limit;

#[cfg(feature = "redis")]
pub use self::global::*;

pub use self::quota::*;
pub use self::rate_limit::*;

Expand Down
Loading
Loading