feat(sampling): Reservoir sampling #2550
Conversation
let sampling_base_value = match self.sampling_value {
    SamplingValue::SampleRate { value } => value,
    SamplingValue::Factor { value } => value,
    SamplingValue::Reservoir { limit } => {
Ugly: we return `SamplingValue::Reservoir { limit }` even though the limit no longer has any meaning past this point. It was the easiest solution; I will refactor it in the future.
}

impl SamplingValue {
    pub(crate) fn value(&self) -> f64 {
This function no longer makes sense, because the reservoir limit doesn't have a value analogous to these.
}

/// Returns the updated [`SamplingValue`] if it's valid.
pub fn evaluate(
Renamed the `sample_rate` function.
relay-sampling/src/evaluation.rs (outdated)
#[cfg(feature = "redis")]
redis_pool: Option<Arc<RedisPool>>,
#[cfg(feature = "redis")]
org_id: Option<u64>,
Ideally this would be a single `Option`, since we only care about having both or neither. But a new type is overkill, and a tuple might be messy.
We always have access to the org id, but I opted to put it behind an `Option` rather than initialize it with an invalid value, even though it wouldn't matter for code execution.
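For illustration, the two shapes being weighed might look roughly like this; the struct and field names are assumptions, and the combined variant is the hypothetical alternative, not what the PR does:

```rust
use std::sync::Arc;

// Placeholder so the sketch is self-contained.
struct RedisPool;

// What the PR does: two parallel options that are expected to be both Some or both None.
struct EvaluatorWithTwoOptions {
    redis_pool: Option<Arc<RedisPool>>,
    org_id: Option<u64>,
}

// The "one option" alternative: both-or-neither is encoded in the type itself.
struct EvaluatorWithOneOption {
    redis: Option<(Arc<RedisPool>, u64)>,
}
```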
@@ -93,6 +207,9 @@ impl SamplingEvaluator {
        self.rule_ids,
    ));
}
SamplingValue::Reservoir { .. } => {
    return ControlFlow::Break(SamplingMatch::new(1.0, seed, vec![rule.id]));
We only take the last rule id in the reservoir case because a reservoir match overrides all previously matched rules.
@@ -565,6 +573,8 @@ impl EnvelopeProcessorService {
});

let inner = InnerProcessor {
    #[cfg(feature = "processing")]
    redis_pool: _redis.clone().map(Arc::new),
    #[cfg(feature = "processing")]
    rate_limiter: _redis
        .map(|pool| RedisRateLimiter::new(pool).max_limit(config.max_rate_limit())),
Thoughts on passing the `RedisRateLimiter` the same `Arc` as in the `redis_pool` field, in a follow-up PR?
    return;
};

if let Ok(mut guard) = self.reservoir_counters.try_lock() {
Using `try_lock` comes with the downside of skipping cleanup in case the rules are in use right now. That is an unlikely case, so it's a fair assumption, but please leave a code comment explaining why this is so crucial, to prevent someone from changing this into a full `lock()` in the future.
The only way to block on the lock here would be a tokio Mutex, but before we do that we should explore other approaches.
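A sketch of the kind of comment being asked for around the `try_lock` call; the struct, field, and method names here are assumptions, not the actual code:

```rust
use std::collections::{BTreeMap, BTreeSet};
use std::sync::Mutex;

// Hypothetical stand-in for relay's rule identifier.
type RuleId = u32;

struct ProjectState {
    reservoir_counters: Mutex<BTreeMap<RuleId, i64>>,
}

impl ProjectState {
    /// Drops counters for reservoir rules that are no longer in the sampling config.
    fn remove_expired_reservoir_counters(&self, active_rules: &BTreeSet<RuleId>) {
        // NOTE: `try_lock` is intentional; we must not block here (the only way to
        // properly wait on this lock would be a tokio Mutex). If an evaluator
        // currently holds the lock we skip this round of cleanup - an unlikely case -
        // and stale entries are removed on a later config update. Do not change this
        // into a full `lock()`.
        if let Ok(mut counters) = self.reservoir_counters.try_lock() {
            counters.retain(|rule, _| active_rules.contains(rule));
        }
    }
}
```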
relay-sampling/src/evaluation.rs (outdated)
if redis_sampling::set_redis_expiry(&mut redis_connection, &key, rule_expiry).is_err() {
    relay_log::error!("failed to set redis reservoir rule expiry");
}
@jan-auer I wasn't sure what we would do if this returned an error, so I opted to just set the expiry every time, as we discussed; we can think about optimizing later.
We could log and ignore the error on EXPIRE like you do here, and handle the increment error gracefully. The more correct version, however, would be to encapsulate the two calls into a single function that returns an error, and if the error occurs, simply not match the rule.
That would mean that if Redis is unavailable, we skip all reservoir rules and apply the other matching rules: a graceful and predictable fallback behavior that does not cause excess indexing.
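One way to sketch that encapsulation, using the plain `redis` crate API for illustration; the function name and the `EXPIREAT` choice are assumptions, not the actual helper in the PR:

```rust
use chrono::{DateTime, Utc};
use redis::{Connection, RedisResult};

/// Increments the reservoir counter and refreshes its expiry as one fallible operation.
/// Any Redis error bubbles up so the caller can simply not match the reservoir rule.
fn increment_reservoir_count(
    connection: &mut Connection,
    key: &str,
    rule_expiry: &DateTime<Utc>,
) -> RedisResult<i64> {
    let count: i64 = redis::cmd("INCR").arg(key).query(connection)?;
    redis::cmd("EXPIREAT")
        .arg(key)
        .arg(rule_expiry.timestamp())
        .query::<()>(connection)?;
    Ok(count)
}
```

The caller would then treat any `Err` as "do not match this rule" and fall through to the remaining `SampleRate` and `Factor` rules.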
OK. There's an edge case if incrementing works but setting the Redis expiry doesn't, but that should be very unlikely. If it does happen, though, we'll count up without sampling.
relay-sampling/src/evaluation.rs (outdated)
}

/// Evaluates a reservoir rule, returning `true` if it should be sampled.
pub fn evaluate(&self, rule: RuleId, limit: i64, rule_expiry: Option<&DateTime<Utc>>) -> bool {
This function should satisfy all the performance optimizations mentioned (see the sketch below):
- It gets the local count and increments it if the limit hasn't been reached.
- It returns fast if the count is above the limit.
- If Redis isn't configured, we return early.
- If Redis is configured but the value received is less than the local value, we avoid locking again.
- Only if the value we receive from Redis is higher do we update the local count by locking the mutex again.
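A condensed sketch of that flow; the type, the field names, and the Redis helper are simplified assumptions rather than the actual implementation (which also increments the Redis counter on that round trip):

```rust
use std::collections::BTreeMap;
use std::sync::Mutex;

// Hypothetical stand-ins so the sketch is self-contained.
type RuleId = u32;

/// Pretend Redis round trip: returns the global count, or `None` if Redis isn't configured.
fn global_count_from_redis(_rule: RuleId) -> Option<i64> {
    None
}

struct ReservoirEvaluator {
    counters: Mutex<BTreeMap<RuleId, i64>>,
}

impl ReservoirEvaluator {
    /// Returns `true` if the reservoir rule should sample this event.
    fn evaluate(&self, rule: RuleId, limit: i64) -> bool {
        // 1. Get the local count and increment it only if the limit hasn't been reached.
        let (local, incremented) = {
            let Ok(mut guard) = self.counters.lock() else { return false };
            let entry = guard.entry(rule).or_insert(0);
            let incremented = *entry < limit;
            if incremented {
                *entry += 1;
            }
            (*entry, incremented)
        };

        // 2. Fast return if the limit was already reached locally.
        if !incremented {
            return false;
        }

        // 3. If Redis isn't configured, return early: the local count decides.
        let Some(global) = global_count_from_redis(rule) else { return true };

        // 4. Only when the value received from Redis is higher than the local one
        //    do we lock the mutex again to update the local count.
        if global > local {
            if let Ok(mut guard) = self.counters.lock() {
                guard.insert(rule, global);
            }
            return global <= limit;
        }

        true
    }
}
```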
relay-sampling/src/evaluation.rs (outdated)
}

/// Gets the local count of a reservoir rule. Increments the count if the limit isn't reached.
fn local_count(&self, rule: RuleId, limit: i64) -> Option<i64> {
Not sure what's worse: returning an `Option` when it should be a `Result`, or having `LockResult<MutexGuard<'_, BTreeMap<RuleId, i64>>>` in the signature.
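For illustration, the two signatures being weighed might look like this (simplified; the real `local_count` also handles incrementing the count):

```rust
use std::collections::BTreeMap;
use std::sync::{LockResult, Mutex, MutexGuard};

// Hypothetical stand-in for relay's rule identifier.
type RuleId = u32;

struct Counters {
    inner: Mutex<BTreeMap<RuleId, i64>>,
}

impl Counters {
    // Option A (what the PR does): hide the lock failure behind an `Option`.
    fn local_count(&self, rule: RuleId) -> Option<i64> {
        let guard = self.inner.lock().ok()?;
        Some(guard.get(&rule).copied().unwrap_or(0))
    }

    // Option B: put the `LockResult` in the signature and let the caller handle it.
    fn counts(&self) -> LockResult<MutexGuard<'_, BTreeMap<RuleId, i64>>> {
        self.inner.lock()
    }
}
```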
A follow-up to #2550. The main motivation for this PR is that the `SamplingRule::evaluate` method became pretty ugly because it relied on both variants of `SamplingValue` having a sample rate attached, which is no longer the case with the introduction of the `Reservoir` variant. The function has mainly been replaced by `SamplingEvaluator::try_compute_sample_rate`, whose job is to validate the rule and, if it's valid, return a sample rate. For the `SampleRate` and `Factor` variants, validation means the rule is not out of bounds of the given time range (depending on the decaying function); for the `Reservoir` variant it means the limit has not been exceeded. It returns an optional `ControlFlow`, where `None` means the rule is invalid and should be skipped, `Break` is analogous to `SamplingValue::SampleRate` (but includes reservoir), and `Continue` is analogous to `SamplingValue::Factor`.

Other adjustments made in this PR (this should be all the changes):

* Created the `try_compute_sample_rate` method as described.
* Moved the decaying-function logic to its own method.
* Moved the check of time-range constraints to the beginning of the loop, so we don't do the more expensive condition matching first.
* Updated tests.

Co-authored-by: Jan Michael Auer <mail@jauer.org>
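A sketch of the `Option<ControlFlow>` shape described in this follow-up; the enum and the arithmetic are simplified stand-ins for illustration, not the exact types in the PR:

```rust
use std::ops::ControlFlow;

// Hypothetical, simplified stand-in for `SamplingValue`.
enum Value {
    SampleRate { value: f64 },
    Factor { value: f64 },
    Reservoir { limit_reached: bool },
}

/// `None` -> the rule is invalid and is skipped.
/// `Some(Break(rate))` -> a final sample rate (SampleRate-like, including reservoir).
/// `Some(Continue(factor))` -> an accumulated factor carried to the next rule (Factor-like).
fn try_compute_sample_rate(accumulated_factor: f64, value: &Value) -> Option<ControlFlow<f64, f64>> {
    match value {
        Value::SampleRate { value } => Some(ControlFlow::Break(value * accumulated_factor)),
        Value::Factor { value } => Some(ControlFlow::Continue(value * accumulated_factor)),
        // A reservoir rule whose limit has been reached is invalid and gets skipped.
        Value::Reservoir { limit_reached: true } => None,
        // Otherwise the reservoir rule forces sampling at rate 1.0.
        Value::Reservoir { limit_reached: false } => Some(ControlFlow::Break(1.0)),
    }
}
```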
Relay implementation of the reservoir project: getsentry/sentry#54449
Reservoir bias uses a type of `SamplingRule` which samples all matches until a certain limit has been reached. This limit is tracked both locally on each Relay and, for processing Relays, with a globally synchronized counter in Redis. When available, the Redis counter updates the local counter.

The counters are stored on the `Project` struct as a `Mutex<BTreeMap<RuleId, i64>>`. When we send an envelope for processing, we pass its project's counters along in the `ProcessEnvelopeState` to the `EnvelopeProcessorService`.

There, in the `dynamic-sampling` crate, we introduce a `ReservoirEvaluator` which, when a reservoir rule matches, checks whether the limit has been reached, using either the local counters we sent or, if applicable, the global Redis count. The `ReservoirEvaluator` also takes care of updating both Redis and the local counter.

After the limit is reached, the rule is no longer valid and is ignored, so that the normal `SampleRate` and `Factor` variants of `SamplingValue` apply. Sentry is responsible for removing the reservoir rule from the `SamplingConfig` when it has reached its limit.

Whenever we receive a new `ProjectConfig`, we remove all of that project's reservoir counters that are no longer in the `DynamicSamplingConfig`.
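A minimal sketch of the shared counter storage and handoff described above; the type alias and struct fields are assumptions based on this description rather than the exact code:

```rust
use std::collections::BTreeMap;
use std::sync::{Arc, Mutex};

// Hypothetical stand-in for relay's rule identifier.
type RuleId = u32;

/// The counters live on the `Project` and are shared with the envelope processor.
type ReservoirCounters = Arc<Mutex<BTreeMap<RuleId, i64>>>;

struct Project {
    reservoir_counters: ReservoirCounters,
}

/// Simplified stand-in for `ProcessEnvelopeState`: the counters travel with the envelope.
struct ProcessEnvelopeState {
    reservoir_counters: ReservoirCounters,
}

impl Project {
    /// Cloning the `Arc` lets the `ReservoirEvaluator` in the processor read and update
    /// the same counters that the project later prunes on a new `ProjectConfig`.
    fn process_state(&self) -> ProcessEnvelopeState {
        ProcessEnvelopeState {
            reservoir_counters: Arc::clone(&self.reservoir_counters),
        }
    }
}
```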
Regarding the use of the mutex: we use `try_lock` to avoid getting blocked in case the mutex is already in use. There are two reasons it might be blocked.