Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(txprocessing): Apply indexing and processing quotas separately #1471

Closed
wants to merge 20 commits into from

Conversation

flub
Copy link
Contributor

@flub flub commented Sep 14, 2022

change of mind

The billing teams have changed their mind and we're probably going to go with #1507 instead

description

This changes the rate limiting to treat indexing and processing quotas separately, respecting each one individually.

During the request handling CheckEnvelope is called which checks if any exhausted quotas are cached in the in-memory project config cache. Before this change if such an exhausted quota was cached the envelope would be rejected, now it treats transaction events and quotas specially:

  • DataCategory::Transaction is the indexing quota, this is treated as the primary quota.
  • DataCategory::TransactionProcessed is the processing quota which is used for extracting metrics.
  • If the event is a transaction the TransactionProcessed quota is also checked, this receives its own field in the Enforcement to communicate if this is active. For non-transaction events this new field on the Enforcement will never be active.
  • If only one of DataCategory::Transaction or DataCategory::TransactionProcessed processing exceeds quota:
    • The transaction event payload is not removed from the envelope.
    • The RateLimit is removed so that the request endpoint does not communicate rate limits to the client using 429 responses: it needs to continue receiving transactions until both quota are exhausted.

Finally if not all transaction quotas were exhausted the envelope is queued for processing. In order to communicate that only partial processing is required the Enforcement is placed in the envelope headers.

During processing the Enforcement is copied into the ProcessEnvelopeState. The individual processing steps can now check this in order to skip work which should not be performed. The second rate limiting check also uses this to not call the rate limiter again for the quota which was already exhausted earlier on. This improves consistency as the actual rate limiter may have slightly different results by now and also reduces overheads of calling the real rate limiter.

Dismissed ideas

An earlier version generalised that multiple DataCategories could apply to an event instead of a single DataCategory. This version ended up with the same number of special cases to handle the interactions between DataCategory::Transaction and DataCategory::TransactionProcessed. So while we only have a single instance of two DataCategories for an event it was deemed simpler to not make this generalisation yet and fully special-case these two quotas.

mainly this adds a new field in enforcement.  i may try a vec next

INGEST-1354
INGEST-1587
@github-actions
Copy link

github-actions bot commented Sep 14, 2022

Fails
🚫 Please consider adding a changelog entry for the next release.
Instructions and example for changelog

For changes exposed to the Python package, please add an entry to py/CHANGELOG.md. This includes, but is not limited to event normalization, PII scrubbing, and the protocol.

For changes to the Relay server, please add an entry to CHANGELOG.md under the following heading:

  1. Features: For new user-visible functionality.
  2. Bug Fixes: For user-visible bug fixes.
  3. Internal: For features and bug fixes in internal operation, especially processing mode.

To the changelog entry, please add a link to this PR (consider a more descriptive message):

- Apply indexing and processing quotas separately. ([#1471](https://github.com/getsentry/relay/pull/1471))

If none of the above apply, you can opt out by adding #skip-changelog to the PR description.

Generated by 🚫 dangerJS against 4d6a903

@flub flub force-pushed the flub/billing-free-processing branch from e2dfe74 to 7fff3ea Compare September 14, 2022 13:56
Floris Bruynooghe added 2 commits September 15, 2022 14:50
This is an approach that generically allows multiple data categories
for events.  The resulting code is fairly reasonable.
@flub flub marked this pull request as ready for review September 19, 2022 09:05
@flub flub requested a review from a team September 19, 2022 09:05
/// matches multiple data categories, e.g. the case for events of
/// [`ItemType::Transaction`] which have both [`DataCategory::Transaction`] and
/// [`DataCategory::TransactionProcessed`] quota associated with them.
pub event_categories: DataCategories,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

EnvelopeSummary::compute needs to account for previously applied rate limits by looking at rate_limited_categories from the item headers.

@@ -1995,6 +2031,7 @@ impl EnvelopeProcessorService {
}

if_processing!({
// TODO: should see it is already rate-limited and not use redis if so
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

enfoce_quotas needs to remove the transaction event if index quota is exhausted so that serialize_event does not get to use it

@flub flub self-assigned this Sep 20, 2022
Floris Bruynooghe added 10 commits September 22, 2022 10:28
The envelope limiter needs to avoid calling redis again and needs to
remove items that need to be removed.  This does that but it is not
hooked up properly as the rate_limited_categories which are passed
through the processing are too simplistic and instead I want to pass
the enforcement through.  That still needs to be hooked up.
@flub flub changed the title feat(txprocessing): Do not count indexing quota for processing feat(txprocessing): Apply indexing and processing quotas separately Sep 27, 2022
@flub flub assigned jjbayer and iker-barriocanal and unassigned flub Sep 27, 2022
Copy link
Contributor

@iker-barriocanal iker-barriocanal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Basically a question on emitting outcomes for processed transactions.

Comment on lines +1538 to +1542
if state.early_enforcement.event.is_active()
&& state.early_enforcement.event.category() == DataCategory::Transaction
{
return Ok(());
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If both processed and indexed transaction quotas are exhausted, the active enforcement here is for processed transactions and the rest of the method would run even if the transaction won't be indexed in the end. Is this correct? (It's an optimization we should not do for now, but I want to double-check I understand the code correctly.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function will not run if the indexed transaction quota is exhausted. The enforcement for processed transaction quota is never checked here.

IIRC this here is already an optimisation to only run this function if we are actually going to index the transaction. I admit I don't know why, but I was told this step isn't needed in that case.

If both processed and indexed transaction quotas are exhausted the event would never have been queued for processing and never make it here. I'm not sure if we should make that clearer here, and if so how?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIRC this here is already an optimisation to only run this function if we are actually going to index the transaction. I admit I don't know why, but I was told this step isn't needed in that case.

The fact that this is hard to understand for all of us makes me wonder if we should remove this optimization. Especially becausefinalize_event runs before metrics extraction, which might rely on the clock drift correction happening in finalize_event? Also, we collect the EventTransactionSource statsd metric in here, which might get skewed if we skip it for non-indexed transactions.

If we keep it, I would definitely add a comment here explaining why we early return in this case.

Comment on lines 135 to +140
self.track_outcome(outcome.clone(), category, 1);
}

if self.summary.transaction_processing {
self.track_outcome(outcome.clone(), DataCategory::TransactionProcessed, 1);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When reject is called, there are two possibilities to generate outcomes:

  1. summary.transaction_processing is false, so a single outcome is generated for an indexed transaction.
  2. summary.transaction_processing is true, so two outcomes are generated: one for processed and one for indexed transactions.

The first case is correct. The second case, however, is not always correct: if transaction_processing is true, indexed transactions may or may not be limited. This situation also results in not generating outcomes only for processed transactions.

I'm not sure if I fully understand the code in the PR, so I may not be correct.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you are right that this is not done correctly. This will need some investigation on how to do this right.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Basically, I'm confused about how outcomes are generated, especially for processed transactions. This is related to this other comment on this PR.

Floris Bruynooghe and others added 3 commits September 27, 2022 13:35
Co-authored-by: Iker Barriocanal <32816711+iker-barriocanal@users.noreply.github.com>
if state.early_enforcement.event.is_active()
&& state.early_enforcement.event.category() == DataCategory::Transaction
{
return Ok(());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here I'm much more confident that the early return makes sense: store_process_event runs after metrics extraction, and we should definitely skip it for non-indexed events.

Comment on lines +1538 to +1542
if state.early_enforcement.event.is_active()
&& state.early_enforcement.event.category() == DataCategory::Transaction
{
return Ok(());
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIRC this here is already an optimisation to only run this function if we are actually going to index the transaction. I admit I don't know why, but I was told this step isn't needed in that case.

The fact that this is hard to understand for all of us makes me wonder if we should remove this optimization. Especially becausefinalize_event runs before metrics extraction, which might rely on the clock drift correction happening in finalize_event? Also, we collect the EventTransactionSource statsd metric in here, which might get skewed if we skip it for non-indexed transactions.

If we keep it, I would definitely add a comment here explaining why we early return in this case.

/// Returns the internal early enforcement header.
///
/// See [`EnvelopeHeaders::early_enforcement`].
pub fn get_early_enforcement(&self) -> &Enforcement {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit

Suggested change
pub fn get_early_enforcement(&self) -> &Enforcement {
pub fn early_enforcement(&self) -> &Enforcement {

https://doc.rust-lang.org/1.0.0/style/style/naming/README.html#getter/setter-methods-[rfc-344]

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TIL

pub event_category: Option<DataCategory>,

/// Whether the event is a transaction and thus can have metrics extracted from it.
pub transaction_processing: bool,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could this be a method returning true if self.event_category is Transaction or TransactionProcessed?

envelope.retain_items(|item| self.retain_item(item, &enforcement));
let (enforcement, rate_limits) = self.execute(&summary, scoping, early_enforcement)?;
envelope.retain_items(|item| self.should_retain_item(item, &enforcement));
envelope.set_early_enforcement(enforcement.clone());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This feels a bit circular: enforce gets an argument early_enforcement and then sets early_enforcement on the envelope unconditionally. Should we instead call set_early_enforcement conditionally, i.e. only when the input early_enforcement is None/default?

rate_limits.merge(event_limits);
// Handle transactions specially, they have processing quota too.
if category == DataCategory::Transaction {
if early_enforcement.transaction_processed.is_active() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we add a comment here explaining that there is no need to evaluate limits again in this case?

@@ -1772,7 +1783,9 @@ impl EnvelopeProcessorService {
&self,
state: &mut ProcessEnvelopeState,
) -> Result<(), ProcessingError> {
if state.transaction_metrics_extracted {
if state.transaction_metrics_extracted
|| state.early_enforcement.transaction_processed.is_active()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Out of curiosity: If transaction_processed is active, doesn't it also mean that there is no event on the state, and we skip extracting on line 1809 anyway?

Comment on lines +444 to +449
// If only one of the rate limits applied we omit both of them. If there is
// a rate limit the endpoint will return 429 but we need the client to keep
// sending transactions unless both quotas limits were exceeded.
if rate_limits.iter().count() == 1 {
rate_limits = RateLimits::new();
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a scenario where rate limits apply for indexed but not for processed transactions? So far I assumed that an active rate limit on processed transactions implies an active rate limit on indexed transactions.

assert!(!envelope.is_empty());
mock.assert_not_called(DataCategory::Transaction);
mock.assert_call(DataCategory::TransactionProcessed, Some(1));
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

General question on tests: Do we need an integration test for any of this, or do we feel confident that the unit tests cover everything?

}

state.rate_limits = limits;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this being removed because we already did not use it on master, or is the removal related to the PR?

@jjbayer jjbayer removed their assignment Sep 27, 2022
@flub
Copy link
Contributor Author

flub commented Sep 28, 2022

Hum, I think there's a bug that we'll never send 429 if the project config does not yet have a DataCategory::TransactionProcessed at all.

@flub flub marked this pull request as draft October 3, 2022 12:30
@jan-auer jan-auer closed this Oct 27, 2022
@jan-auer jan-auer deleted the flub/billing-free-processing branch October 27, 2022 12:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants