feat(txprocessing): Apply indexing and processing quotas separately #1471

flub · 2022-09-14T13:55:53Z

change of mind

The billing teams have changed their minds, and we're probably going to go with #1507 instead.

description

This changes the rate limiting to treat indexing and processing quotas separately, respecting each one individually.

During request handling, CheckEnvelope is called, which checks whether any exhausted quotas are cached in the in-memory project config cache. Before this change, if such an exhausted quota was cached, the envelope would be rejected. Now transaction events and quotas are treated specially:

- DataCategory::Transaction is the indexing quota; this is treated as the primary quota.
- DataCategory::TransactionProcessed is the processing quota, which is used for extracting metrics.
- If the event is a transaction, the TransactionProcessed quota is also checked. It gets its own field in the Enforcement to communicate whether it is active. For non-transaction events this new field on the Enforcement will never be active.
- If only one of DataCategory::Transaction or DataCategory::TransactionProcessed exceeds its quota:
  - The transaction event payload is not removed from the envelope.
  - The RateLimit is removed so that the request endpoint does not communicate rate limits to the client using 429 responses: it needs to continue receiving transactions until both quotas are exhausted.

Finally, if not all transaction quotas were exhausted, the envelope is queued for processing. To communicate that only partial processing is required, the Enforcement is placed in the envelope headers.
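The early decision described above can be sketched roughly as follows. This is a simplified model and not the code from this PR: `Enforcement` is reduced to two booleans, and `EarlyDecision` and `early_decision` are hypothetical names introduced only for illustration.

```rust
/// Simplified stand-in for the PR's `Enforcement`: one flag per transaction quota.
/// (Assumption: the real type carries more information than two booleans.)
#[derive(Clone, Copy, Debug, Default, PartialEq, Eq)]
pub struct Enforcement {
    /// The indexing quota (`DataCategory::Transaction`) is exhausted.
    pub event: bool,
    /// The processing quota (`DataCategory::TransactionProcessed`) is exhausted.
    pub transaction_processed: bool,
}

/// Hypothetical result of the cached-quota check for a transaction envelope.
#[derive(Debug, PartialEq, Eq)]
pub enum EarlyDecision {
    /// Both quotas are exhausted: the event is dropped and the client gets a 429.
    Reject,
    /// At most one quota is exhausted: the payload stays in the envelope, no rate
    /// limit is reported to the client, and the enforcement travels in the headers.
    Queue(Enforcement),
}

/// Decide what to do with a transaction envelope based on the cached quotas.
pub fn early_decision(enforcement: Enforcement) -> EarlyDecision {
    if enforcement.event && enforcement.transaction_processed {
        EarlyDecision::Reject
    } else {
        EarlyDecision::Queue(enforcement)
    }
}
```

With only the indexing quota exhausted, `early_decision` returns `Queue` carrying that enforcement, so later steps can still extract metrics while skipping indexing work.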

During processing, the Enforcement is copied into the ProcessEnvelopeState. The individual processing steps can now check this in order to skip work that should not be performed. The second rate limiting check also uses it to avoid calling the rate limiter again for a quota that was already exhausted earlier on. This improves consistency, as the actual rate limiter may return slightly different results by now, and also reduces the overhead of calling the real rate limiter.
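A minimal sketch of the "do not ask the rate limiter again" idea, under simplifying assumptions: `RecordingLimiter` and `second_check` are hypothetical names, the limiter is a stub that only records which categories it was asked about, and `Enforcement` is again reduced to two booleans.

```rust
use std::cell::RefCell;

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum DataCategory {
    Transaction,
    TransactionProcessed,
}

/// Simplified early enforcement: which transaction quotas were already exhausted.
#[derive(Clone, Copy, Debug, Default, PartialEq, Eq)]
pub struct Enforcement {
    pub event: bool,
    pub transaction_processed: bool,
}

/// Hypothetical stand-in for the real rate limiter; it records every lookup.
#[derive(Default)]
pub struct RecordingLimiter {
    pub calls: RefCell<Vec<DataCategory>>,
}

impl RecordingLimiter {
    /// Pretends to consult the backing store and always reports "not exhausted".
    pub fn is_exhausted(&self, category: DataCategory) -> bool {
        self.calls.borrow_mut().push(category);
        false
    }
}

/// Second check during processing. `||` short-circuits, so a quota that the
/// early enforcement already marked as exhausted is never looked up again.
pub fn second_check(early: Enforcement, limiter: &RecordingLimiter) -> Enforcement {
    Enforcement {
        event: early.event || limiter.is_exhausted(DataCategory::Transaction),
        transaction_processed: early.transaction_processed
            || limiter.is_exhausted(DataCategory::TransactionProcessed),
    }
}
```

If the indexing quota was already exhausted early on, only `DataCategory::TransactionProcessed` is looked up, and the earlier decision for indexing cannot be contradicted by a later lookup.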

Dismissed ideas

An earlier version generalised this so that multiple DataCategories could apply to an event instead of a single DataCategory. That version ended up with the same number of special cases to handle the interactions between DataCategory::Transaction and DataCategory::TransactionProcessed. So while we only have a single instance of two DataCategories for an event, it was deemed simpler to not make this generalisation yet and fully special-case these two quotas.

mainly this adds a new field in enforcement. i may try a vec next INGEST-1354 INGEST-1587

github-actions · 2022-09-14T13:56:27Z

Fails: 🚫 Please consider adding a changelog entry for the next release.

Instructions and example for changelog

For changes exposed to the Python package, please add an entry to py/CHANGELOG.md. This includes, but is not limited to event normalization, PII scrubbing, and the protocol.

For changes to the Relay server, please add an entry to CHANGELOG.md under the following heading:

Features: For new user-visible functionality.
Bug Fixes: For user-visible bug fixes.
Internal: For features and bug fixes in internal operation, especially processing mode.

To the changelog entry, please add a link to this PR (consider a more descriptive message):

- Apply indexing and processing quotas separately. ([#1471](https://github.com/getsentry/relay/pull/1471))

If none of the above apply, you can opt out by adding #skip-changelog to the PR description.

Generated by 🚫 dangerJS against 4d6a903

This is an approach that generically allows multiple data categories for events. The resulting code is fairly reasonable.

relay-common/src/constants.rs

flub · 2022-09-19T12:14:44Z

relay-server/src/utils/rate_limits.rs

+    /// matches multiple data categories, e.g. the case for events of
+    /// [`ItemType::Transaction`] which have both [`DataCategory::Transaction`] and
+    /// [`DataCategory::TransactionProcessed`] quota associated with them.
+    pub event_categories: DataCategories,


EnvelopeSummary::compute needs to account for previously applied rate limits by looking at rate_limited_categories from the item headers.

flub · 2022-09-19T12:24:23Z

relay-server/src/actors/processor.rs

@@ -1995,6 +2031,7 @@ impl EnvelopeProcessorService {
        }

        if_processing!({
+            // TODO: should see it is already rate-limited and not use redis if so


enforce_quotas needs to remove the transaction event if the index quota is exhausted, so that serialize_event does not get to use it.

The envelope limiter needs to avoid calling Redis again and needs to remove the items that have to be removed. This change does that, but it is not hooked up properly: the rate_limited_categories which are passed through the processing are too simplistic, and I want to pass the enforcement through instead. That still needs to be hooked up.

iker-barriocanal

Basically a question on emitting outcomes for processed transactions.

relay-server/src/utils/rate_limits.rs

iker-barriocanal · 2022-09-27T10:38:51Z

relay-server/src/actors/processor.rs

+        if state.early_enforcement.event.is_active()
+            && state.early_enforcement.event.category() == DataCategory::Transaction
+        {
+            return Ok(());
+        }


If both processed and indexed transaction quotas are exhausted, the active enforcement here is for processed transactions and the rest of the method would run even if the transaction won't be indexed in the end. Is this correct? (It's an optimization we should not do for now, but I want to double-check I understand the code correctly.)

This function will not run if the indexed transaction quota is exhausted. The enforcement for processed transaction quota is never checked here.

IIRC this here is already an optimisation to only run this function if we are actually going to index the transaction. I admit I don't know why, but I was told this step isn't needed in that case.

If both processed and indexed transaction quotas are exhausted the event would never have been queued for processing and never make it here. I'm not sure if we should make that clearer here, and if so how?

IIRC this here is already an optimisation to only run this function if we are actually going to index the transaction. I admit I don't know why, but I was told this step isn't needed in that case.

The fact that this is hard to understand for all of us makes me wonder if we should remove this optimization. Especially because finalize_event runs before metrics extraction, which might rely on the clock drift correction happening in finalize_event? Also, we collect the EventTransactionSource statsd metric in here, which might get skewed if we skip it for non-indexed transactions.

If we keep it, I would definitely add a comment here explaining why we early return in this case.

iker-barriocanal · 2022-09-27T10:58:38Z

relay-server/src/utils/envelope_context.rs

            self.track_outcome(outcome.clone(), category, 1);
        }

+        if self.summary.transaction_processing {
+            self.track_outcome(outcome.clone(), DataCategory::TransactionProcessed, 1);
+        }


When reject is called, there are two possibilities to generate outcomes:

summary.transaction_processing is false, so a single outcome is generated for an indexed transaction.

summary.transaction_processing is true, so two outcomes are generated: one for processed and one for indexed transactions.

The first case is correct. The second case, however, is not always correct: if transaction_processing is true, indexed transactions may or may not be limited. This situation also results in not generating outcomes only for processed transactions.

I'm not sure if I fully understand the code in the PR, so I may not be correct.

I think you are right that this is not done correctly. This will need some investigation on how to do this right.
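One way to picture the concern, as a rough sketch rather than the PR's code: the first function models the quoted `reject` logic, where the `transaction_processing` flag alone decides whether a second outcome is tracked. The second function is a purely hypothetical alternative that derives the categories from which quotas were actually limited; it is not something this thread settled on.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum DataCategory {
    Transaction,
    TransactionProcessed,
}

/// Models the quoted logic: an outcome for the event category is always tracked,
/// and a second one is added whenever the summary flag is set.
pub fn rejected_outcome_categories(transaction_processing: bool) -> Vec<DataCategory> {
    let mut categories = vec![DataCategory::Transaction];
    if transaction_processing {
        categories.push(DataCategory::TransactionProcessed);
    }
    categories
}

/// Hypothetical alternative: only emit outcomes for quotas that were limited.
pub fn enforced_outcome_categories(
    indexing_limited: bool,
    processing_limited: bool,
) -> Vec<DataCategory> {
    let mut categories = Vec::new();
    if indexing_limited {
        categories.push(DataCategory::Transaction);
    }
    if processing_limited {
        categories.push(DataCategory::TransactionProcessed);
    }
    categories
}
```

The first function can never return only `TransactionProcessed`, which matches the observation that outcomes are never generated for processed transactions alone.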

Basically, I'm confused about how outcomes are generated, especially for processed transactions. This is related to this other comment on this PR.

jjbayer · 2022-09-26T15:33:21Z

relay-server/src/actors/processor.rs

+        if state.early_enforcement.event.is_active()
+            && state.early_enforcement.event.category() == DataCategory::Transaction
+        {
+            return Ok(());


Here I'm much more confident that the early return makes sense: store_process_event runs after metrics extraction, and we should definitely skip it for non-indexed events.

jjbayer · 2022-09-27T12:22:14Z

relay-server/src/actors/processor.rs

+        if state.early_enforcement.event.is_active()
+            && state.early_enforcement.event.category() == DataCategory::Transaction
+        {
+            return Ok(());
+        }


IIRC this here is already an optimisation to only run this function if we are actually going to index the transaction. I admit I don't know why, but I was told this step isn't needed in that case.

The fact that this is hard to understand for all of us makes me wonder if we should remove this optimization. Especially because finalize_event runs before metrics extraction, which might rely on the clock drift correction happening in finalize_event? Also, we collect the EventTransactionSource statsd metric in here, which might get skewed if we skip it for non-indexed transactions.

If we keep it, I would definitely add a comment here explaining why we early return in this case.

jjbayer · 2022-09-27T12:32:10Z

relay-server/src/envelope.rs

+    /// Returns the internal early enforcement header.
+    ///
+    /// See [`EnvelopeHeaders::early_enforcement`].
+    pub fn get_early_enforcement(&self) -> &Enforcement {


nit

Suggested change

- pub fn get_early_enforcement(&self) -> &Enforcement {
+ pub fn early_enforcement(&self) -> &Enforcement {

https://doc.rust-lang.org/1.0.0/style/style/naming/README.html#getter/setter-methods-[rfc-344]

jjbayer · 2022-09-27T12:37:29Z

relay-server/src/utils/rate_limits.rs

    pub event_category: Option<DataCategory>,

+    /// Whether the event is a transaction and thus can have metrics extracted from it.
+    pub transaction_processing: bool,


Could this be a method returning true if self.event_category is Transaction or TransactionProcessed?
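A minimal sketch of what such a method could look like, assuming a pared-down `EnvelopeSummary` with only the `event_category` field and a `DataCategory` enum reduced to three variants (the real types have more members):

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum DataCategory {
    Error,
    Transaction,
    TransactionProcessed,
}

/// Pared-down summary holding only the field relevant to this suggestion.
#[derive(Debug, Default)]
pub struct EnvelopeSummary {
    pub event_category: Option<DataCategory>,
}

impl EnvelopeSummary {
    /// Derived from `event_category` instead of being stored as a separate flag,
    /// so the two cannot get out of sync.
    pub fn transaction_processing(&self) -> bool {
        matches!(
            self.event_category,
            Some(DataCategory::Transaction | DataCategory::TransactionProcessed)
        )
    }
}
```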

jjbayer · 2022-09-27T12:46:38Z

relay-server/src/utils/rate_limits.rs

-        envelope.retain_items(|item| self.retain_item(item, &enforcement));
+        let (enforcement, rate_limits) = self.execute(&summary, scoping, early_enforcement)?;
+        envelope.retain_items(|item| self.should_retain_item(item, &enforcement));
+        envelope.set_early_enforcement(enforcement.clone());


This feels a bit circular: enforce gets an argument early_enforcement and then sets early_enforcement on the envelope unconditionally. Should we instead call set_early_enforcement conditionally, i.e. only when the input early_enforcement is None/default?
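To make the suggestion concrete, here is a rough sketch with stand-in types. `Envelope` and `Enforcement` are reduced to the bare minimum, and `store_enforcement` is a hypothetical helper name. It only writes the computed enforcement when the incoming one is still the default, so a later pass does not overwrite what an earlier pass recorded.

```rust
#[derive(Clone, Debug, Default, PartialEq, Eq)]
pub struct Enforcement {
    pub event: bool,
    pub transaction_processed: bool,
}

/// Minimal stand-in for the envelope, holding only the enforcement header.
#[derive(Debug, Default)]
pub struct Envelope {
    early_enforcement: Enforcement,
}

impl Envelope {
    pub fn early_enforcement(&self) -> &Enforcement {
        &self.early_enforcement
    }

    pub fn set_early_enforcement(&mut self, enforcement: Enforcement) {
        self.early_enforcement = enforcement;
    }
}

/// Store `computed` only if no early enforcement was passed in.
pub fn store_enforcement(envelope: &mut Envelope, input: &Enforcement, computed: Enforcement) {
    if *input == Enforcement::default() {
        envelope.set_early_enforcement(computed);
    }
}
```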

jjbayer · 2022-09-27T13:37:06Z

relay-server/src/utils/rate_limits.rs

-            rate_limits.merge(event_limits);
+            // Handle transactions specially, they have processing quota too.
+            if category == DataCategory::Transaction {
+                if early_enforcement.transaction_processed.is_active() {


Should we add a comment here explaining that there is no need to evaluate limits again in this case?

jjbayer · 2022-09-27T14:01:39Z

relay-server/src/actors/processor.rs

@@ -1772,7 +1783,9 @@ impl EnvelopeProcessorService {
        &self,
        state: &mut ProcessEnvelopeState,
    ) -> Result<(), ProcessingError> {
-        if state.transaction_metrics_extracted {
+        if state.transaction_metrics_extracted
+            || state.early_enforcement.transaction_processed.is_active()


Out of curiosity: If transaction_processed is active, doesn't it also mean that there is no event on the state, and we skip extracting on line 1809 anyway?

jjbayer · 2022-09-27T14:04:27Z

relay-server/src/utils/rate_limits.rs

+                    // If only one of the rate limits applied we omit both of them.  If there is
+                    // a rate limit the endpoint will return 429 but we need the client to keep
+                    // sending transactions unless both quotas limits were exceeded.
+                    if rate_limits.iter().count() == 1 {
+                        rate_limits = RateLimits::new();
+                    }


Is there a scenario where rate limits apply for indexed but not for processed transactions? So far I assumed that an active rate limit on processed transactions implies an active rate limit on indexed transactions.

jjbayer · 2022-09-27T14:05:42Z

relay-server/src/utils/rate_limits.rs

+        assert!(!envelope.is_empty());
+        mock.assert_not_called(DataCategory::Transaction);
+        mock.assert_call(DataCategory::TransactionProcessed, Some(1));
+    }


General question on tests: Do we need an integration test for any of this, or do we feel confident that the unit tests cover everything?

jjbayer · 2022-09-27T14:12:16Z

relay-server/src/actors/processor.rs

        }

-        state.rate_limits = limits;


Is this being removed because we already did not use it on master, or is the removal related to the PR?

flub · 2022-09-28T08:52:52Z

Hum, I think there's a bug: we'll never send a 429 if the project config does not yet have a DataCategory::TransactionProcessed quota at all.
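The suspected bug can be illustrated with a small model. `limits_sent_to_client` mirrors the `count() == 1` check quoted in the review above, using a plain slice of categories instead of `RateLimits`. `limits_sent_to_client_guarded` is a hypothetical guard shown only to make the failure mode visible; it is not what the PR does.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum DataCategory {
    Transaction,
    TransactionProcessed,
}

/// Models the quoted check: a lone active transaction limit is dropped so the
/// endpoint does not answer with 429.
pub fn limits_sent_to_client(active: &[DataCategory]) -> Vec<DataCategory> {
    if active.len() == 1 {
        Vec::new()
    } else {
        active.to_vec()
    }
}

/// Hypothetical guard: only drop a lone limit when the project config actually
/// defines quotas for both categories.
pub fn limits_sent_to_client_guarded(
    active: &[DataCategory],
    configured: &[DataCategory],
) -> Vec<DataCategory> {
    let both_configured = configured.contains(&DataCategory::Transaction)
        && configured.contains(&DataCategory::TransactionProcessed);
    if both_configured && active.len() == 1 {
        Vec::new()
    } else {
        active.to_vec()
    }
}
```

With a project config that only defines the indexing quota, an exhausted indexing quota produces exactly one active limit, which the first function drops, so the client would keep sending transactions. The guarded variant keeps the limit in that case.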

first wip

7fff3ea

mainly this adds a new field in enforcement. i may try a vec next INGEST-1354 INGEST-1587

flub force-pushed the flub/billing-free-processing branch from e2dfe74 to 7fff3ea Compare September 14, 2022 13:56

Floris Bruynooghe added 2 commits September 15, 2022 14:50

More generically allow multiple datacategories for an event

987f5b2

This is an approach that generically allows multiple data categories for events. The resulting code is fairly reasonable.

clean up random comments

aac3e70

flub commented Sep 15, 2022

relay-common/src/constants.rs (outdated)

Floris Bruynooghe added 3 commits September 15, 2022 17:37

Fixup some stuff I missed because missing --all-features

ad38afe

Plumb through to the processing pipeline

201e955

Merge branch 'master' into flub/billing-free-processing

f582a28

flub marked this pull request as ready for review September 19, 2022 09:05

flub requested a review from a team September 19, 2022 09:05

flub commented Sep 19, 2022

flub self-assigned this Sep 20, 2022

Floris Bruynooghe added 10 commits September 22, 2022 10:28

switch back to single datacategory for an event

0571c72

clippy

f0ffb61

more cleanup

d8ebaa9

more cleanup

49ee815

Merge branch 'master' into flub/billing-free-processing

eb829e3

Hook up early enforcement to be passed around the processing

a60b668

cleanup

53954f1

clippy

e3789f0

typo

1758670

flub changed the title ~~feat(txprocessing): Do not count indexing quota for processing~~ feat(txprocessing): Apply indexing and processing quotas separately Sep 27, 2022

flub assigned jjbayer and iker-barriocanal and unassigned flub Sep 27, 2022

fix some test, add some tests

cfd5bfb

iker-barriocanal reviewed Sep 27, 2022

iker-barriocanal assigned flub and unassigned iker-barriocanal Sep 27, 2022

Floris Bruynooghe and others added 3 commits September 27, 2022 13:35

Merge branch 'master' into flub/billing-free-processing

c349d79

Apply suggestions from code review

d9365ad

Co-authored-by: Iker Barriocanal <32816711+iker-barriocanal@users.noreply.github.com>

make clippy happy

4d6a903

jjbayer reviewed Sep 27, 2022

jjbayer removed their assignment Sep 27, 2022

flub marked this pull request as draft October 3, 2022 12:30

jan-auer closed this Oct 27, 2022

jan-auer deleted the flub/billing-free-processing branch October 27, 2022 12:39
