feat(txprocessing): Apply indexing and processing quotas separately #1471
Conversation
Mainly this adds a new field in enforcement. I may try a `Vec` next. INGEST-1354, INGEST-1587
Instructions and example for changelog: For changes exposed to the Python package, please add an entry to the Python changelog. For changes to the Relay server, please add an entry to the Relay changelog.
To the changelog entry, please add a link to this PR (consider a more descriptive message):
- Apply indexing and processing quotas separately. ([#1471](https://github.com/getsentry/relay/pull/1471))
If none of the above apply, you can opt out by adding #skip-changelog to the PR description.
e2dfe74 to 7fff3ea (Compare)
This is an approach that generically allows multiple data categories for events. The resulting code is fairly reasonable.
/// matches multiple data categories, e.g. the case for events of
/// [`ItemType::Transaction`] which have both [`DataCategory::Transaction`] and
/// [`DataCategory::TransactionProcessed`] quota associated with them.
pub event_categories: DataCategories,
`EnvelopeSummary::compute` needs to account for previously applied rate limits by looking at `rate_limited_categories` from the item headers.
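Roughly, a hedged sketch of what that could look like inside `compute`; the `rate_limited_categories()` accessor on items is an assumption here, not an existing API:

```rust
// Hedged sketch: when computing the summary, drop event categories that item
// headers already mark as rate limited so they are not enforced a second time.
// `rate_limited_categories()` is a hypothetical accessor for illustration.
for item in envelope.items() {
    for limited in item.rate_limited_categories() {
        summary.event_categories.retain(|category| *category != limited);
    }
}
```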
@@ -1995,6 +2031,7 @@ impl EnvelopeProcessorService {
         }

         if_processing!({
+            // TODO: should see it is already rate-limited and not use redis if so
`enforce_quotas` needs to remove the transaction event if the indexing quota is exhausted, so that `serialize_event` does not get to use it.
The envelope limiter needs to avoid calling Redis again and needs to remove the items that should be removed. This change does that, but it is not hooked up properly: the `rate_limited_categories` passed through processing are too simplistic, and I want to pass the enforcement through instead. That still needs to be hooked up.
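A minimal sketch of the `enforce_quotas` part above, assuming the computed enforcement is available there and that a helper to drop the event from the state exists (`remove_event` is hypothetical here):

```rust
// If the indexing quota is exhausted, drop the transaction event from the
// processing state so that serialize_event never gets to emit it.
if enforcement.event.is_active() {
    state.remove_event();
}
```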
Basically a question on emitting outcomes for processed transactions.
if state.early_enforcement.event.is_active()
    && state.early_enforcement.event.category() == DataCategory::Transaction
{
    return Ok(());
}
If both processed and indexed transaction quotas are exhausted, the active enforcement here is for processed transactions and the rest of the method would run even if the transaction won't be indexed in the end. Is this correct? (It's an optimization we should not do for now, but I want to double-check I understand the code correctly.)
This function will not run if the indexed transaction quota is exhausted. The enforcement for the processed transaction quota is never checked here.
IIRC this here is already an optimisation to only run this function if we are actually going to index the transaction. I admit I don't know why, but I was told this step isn't needed in that case.
If both processed and indexed transaction quotas are exhausted, the event would never have been queued for processing and would never make it here. I'm not sure if we should make that clearer here, and if so, how?
> IIRC this here is already an optimisation to only run this function if we are actually going to index the transaction. I admit I don't know why, but I was told this step isn't needed in that case.

The fact that this is hard to understand for all of us makes me wonder if we should remove this optimization. Especially because `finalize_event` runs before metrics extraction, which might rely on the clock drift correction happening in `finalize_event`? Also, we collect the `EventTransactionSource` statsd metric in here, which might get skewed if we skip it for non-indexed transactions.
If we keep it, I would definitely add a comment here explaining why we early return in this case.
    self.track_outcome(outcome.clone(), category, 1);
}

if self.summary.transaction_processing {
    self.track_outcome(outcome.clone(), DataCategory::TransactionProcessed, 1);
}
When `reject` is called, there are two possibilities to generate outcomes:

- `summary.transaction_processing` is false, so a single outcome is generated for an indexed transaction.
- `summary.transaction_processing` is true, so two outcomes are generated: one for processed and one for indexed transactions.

The first case is correct. The second case, however, is not always correct: if `transaction_processing` is true, indexed transactions may or may not be limited. This also means that outcomes are never generated only for processed transactions.
I'm not sure if I fully understand the code in the PR, so I may not be correct.
I think you are right that this is not done correctly. This will need some investigation on how to do this right.
Basically, I'm confused about how outcomes are generated, especially for processed transactions. This is related to this other comment on this PR.
Co-authored-by: Iker Barriocanal <32816711+iker-barriocanal@users.noreply.github.com>
if state.early_enforcement.event.is_active()
    && state.early_enforcement.event.category() == DataCategory::Transaction
{
    return Ok(());
Here I'm much more confident that the early return makes sense: `store_process_event` runs after metrics extraction, and we should definitely skip it for non-indexed events.
/// Returns the internal early enforcement header.
///
/// See [`EnvelopeHeaders::early_enforcement`].
pub fn get_early_enforcement(&self) -> &Enforcement {
nit (suggested rename):
- pub fn get_early_enforcement(&self) -> &Enforcement {
+ pub fn early_enforcement(&self) -> &Enforcement {
https://doc.rust-lang.org/1.0.0/style/style/naming/README.html#getter/setter-methods-[rfc-344]
TIL
pub event_category: Option<DataCategory>,

/// Whether the event is a transaction and thus can have metrics extracted from it.
pub transaction_processing: bool,
Could this be a method returning true if `self.event_category` is `Transaction` or `TransactionProcessed`?
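A minimal sketch of the suggested accessor, assuming it lives on the same struct as the quoted field:

```rust
/// Whether the event is a transaction and thus can have metrics extracted from it.
pub fn transaction_processing(&self) -> bool {
    matches!(
        self.event_category,
        Some(DataCategory::Transaction | DataCategory::TransactionProcessed)
    )
}
```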
envelope.retain_items(|item| self.retain_item(item, &enforcement));
let (enforcement, rate_limits) = self.execute(&summary, scoping, early_enforcement)?;
envelope.retain_items(|item| self.should_retain_item(item, &enforcement));
envelope.set_early_enforcement(enforcement.clone());
This feels a bit circular: `enforce` gets an argument `early_enforcement` and then sets `early_enforcement` on the envelope unconditionally. Should we instead call `set_early_enforcement` conditionally, i.e. only when the input `early_enforcement` is None/default?
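For example, sketched under the assumption that `Enforcement` implements `Default` and `PartialEq`:

```rust
// Only stamp the freshly computed enforcement onto the envelope when no early
// enforcement was handed in; otherwise keep what CheckEnvelope already set.
if early_enforcement == Enforcement::default() {
    envelope.set_early_enforcement(enforcement.clone());
}
```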
rate_limits.merge(event_limits);
// Handle transactions specially, they have processing quota too.
if category == DataCategory::Transaction {
    if early_enforcement.transaction_processed.is_active() {
Should we add a comment here explaining that there is no need to evaluate limits again in this case?
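Something along these lines right above the `is_active()` check (wording is only a suggestion):

```rust
// The processed-transaction quota was already enforced earlier (during
// CheckEnvelope), so there is no need to evaluate these limits against the
// rate limiter again for this category.
```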
@@ -1772,7 +1783,9 @@ impl EnvelopeProcessorService {
         &self,
         state: &mut ProcessEnvelopeState,
     ) -> Result<(), ProcessingError> {
-        if state.transaction_metrics_extracted {
+        if state.transaction_metrics_extracted
+            || state.early_enforcement.transaction_processed.is_active()
Out of curiosity: if `transaction_processed` is active, doesn't that also mean that there is no `event` on the `state`, and we skip extracting on line 1809 anyway?
// If only one of the rate limits applied, we omit both of them. If there is
// a rate limit the endpoint will return 429, but we need the client to keep
// sending transactions unless both quota limits were exceeded.
if rate_limits.iter().count() == 1 {
    rate_limits = RateLimits::new();
}
Is there a scenario where rate limits apply for indexed but not for processed transactions? So far I assumed that an active rate limit on processed transactions implies an active rate limit on indexed transactions.
assert!(!envelope.is_empty());
mock.assert_not_called(DataCategory::Transaction);
mock.assert_call(DataCategory::TransactionProcessed, Some(1));
}
General question on tests: Do we need an integration test for any of this, or do we feel confident that the unit tests cover everything?
}

state.rate_limits = limits;
Is this being removed because we already did not use it on master, or is the removal related to the PR?
Hm, I think there's a bug: we'll never send a 429 if the project config does not yet have a `DataCategory::TransactionProcessed` quota at all.
Change of mind
The billing teams have changed their minds and we're probably going to go with #1507 instead.
Description
This changes the rate limiting to treat indexing and processing quotas separately, respecting each one individually.
During request handling, `CheckEnvelope` is called, which checks whether any exhausted quotas are cached in the in-memory project config cache. Before this change, the envelope would be rejected if such an exhausted quota was cached; now transaction events and their quotas are treated specially:
Finally, if not all transaction quotas were exhausted, the envelope is queued for processing. To communicate that only partial processing is required, the `Enforcement` is placed in the envelope headers.
During processing, the `Enforcement` is copied into the `ProcessEnvelopeState`. The individual processing steps can now check it to skip work that should not be performed. The second rate limiting check also uses it to avoid calling the rate limiter again for a quota that was already exhausted earlier on. This improves consistency, since the actual rate limiter may return slightly different results by now, and it also reduces the overhead of calling the real rate limiter.
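For illustration, the pattern the individual steps use, mirroring the snippets quoted in the review above (the step name is hypothetical):

```rust
// Hypothetical processing step illustrating the early-enforcement check that
// the quoted snippets add: skip work for transactions that will not be
// indexed because their indexing quota is already exhausted.
fn some_processing_step(state: &mut ProcessEnvelopeState) -> Result<(), ProcessingError> {
    if state.early_enforcement.event.is_active()
        && state.early_enforcement.event.category() == DataCategory::Transaction
    {
        return Ok(());
    }

    // ... the actual work for events that will still be indexed ...
    Ok(())
}
```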
Dismissed ideas
An earlier version generalised this so that multiple data categories could apply to an event instead of a single `DataCategory`. That version ended up with the same number of special cases to handle the interactions between `DataCategory::Transaction` and `DataCategory::TransactionProcessed`. So, while we only have a single instance of two data categories applying to one event, it was deemed simpler not to make this generalisation yet and to fully special-case these two quotas.
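For reference, the two shapes side by side as a rough sketch; the field names follow the snippets quoted in the review, while the struct names and everything else are illustrative:

```rust
// Earlier, generalised revision: an event can carry several data categories.
pub struct GeneralisedSummary {
    /// All data categories that apply to the event in this envelope.
    pub event_categories: DataCategories,
}

// Special-cased revision this PR settled on: a single category plus a flag
// for the processed-transaction quota.
pub struct SpecialCasedSummary {
    /// The data category of the event, if the envelope contains one.
    pub event_category: Option<DataCategory>,
    /// Whether the event is a transaction and thus also has a processing quota.
    pub transaction_processing: bool,
}
```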