
feat(quotas): Require TransactionProcessed quota #1517

Closed
wants to merge 18 commits

Conversation


@flub flub commented Oct 5, 2022

If metrics have not yet been extracted, a transaction event needs to
have DataCategory::TransactionProcessed quota available for ingestion.

Otherwise, we are in the quota enforcement part of the Relay event
processing pipeline and should check and count the transaction indexing
quota. Transaction processing quota, however, is only counted in the
metrics part, not in event ingestion.

Background

Rate limiting and quota enforcement have to apply both to legacy plans
that did not have processed-transaction functionality and to new plans
that introduce it.

Organization on Legacy Plan

There is a single quota for indexed transactions. We need to extract
metrics for indexed transactions so that there is backfilled history by
the time the customer switches to the new plan. This requires extracting
metrics for all accepted transactions.

Organization on New Plan

This plan has a dedicated quota for processed transactions. Indexing
transactions requires quota for processed transactions.

  • Validation and inbound data filters do not consume processing quota.
    There are no metrics extracted for such dropped events.
  • If there is processing quota available, extract and ingest metrics for
    this transaction. Otherwise, drop the transaction event and any
    transaction metrics bucket.
  • Dynamic sampling may drop (previously processed) transaction events.
    Extracted metrics and the processed transaction quotas are not
    impacted by this.
  • If there is indexing (and processing) quota available, store the
    sampled transaction event. Otherwise, drop the transaction event.
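The rules above can be condensed into a small decision function. This is an illustrative sketch only: the types and names (`Quotas`, `handle_transaction`, and its parameters) are hypothetical, not Relay's actual implementation.

```python
from dataclasses import dataclass

@dataclass
class Quotas:
    processing_available: bool  # txp: quota for processed transactions
    indexing_available: bool    # tx: quota for indexed transactions

def handle_transaction(quotas: Quotas, passes_filters: bool, sampled_keep: bool):
    """Return (extract_metrics, store_event) for one transaction event."""
    # Validation and inbound filters: dropped events consume no processing
    # quota, and no metrics are extracted for them.
    if not passes_filters:
        return (False, False)
    # No processing quota: drop the event and any metrics bucket.
    if not quotas.processing_available:
        return (False, False)
    # Metrics are extracted; dynamic sampling may still drop the event,
    # without affecting the metrics or the processing quota.
    if not sampled_keep:
        return (True, False)
    # Indexing (and processing) quota must be available to store the event.
    return (True, quotas.indexing_available)
```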

Algorithm

Terminology:

  • tx: Existing quota for indexed transactions.
  • txp: New quota for processed transactions.

Preconditions:

  • On legacy plans, the upstream sets txp := tx

Flow:

  1. CheckEnvelope in the fast-path
    1. If txp is limited or txp==0: drop; else: keep
  2. Filtering / Validation
    1. Potentially drop the event here
  3. Metrics extraction
  4. Dynamic Sampling
    1. legacy plans never have sampling rules, so skip sampling
    2. if sampling yields “drop”, go to step 6a
  5. Quota enforcement on the transaction event item
    1. Run tx += 1, txp += 0
    2. Drop event if either tx or txp are limited.
    3. Drop metrics if txp is limited.
  6. Store what is left
    1. Insert metrics into aggregator
    2. Place event into envelope and store
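The numbered flow above can be sketched end to end as follows. This is a simplified model, not Relay's code: the function name and parameters are made up, and the `tx`/`txp` limit flags are passed in rather than computed by a real rate limiter.

```python
def run_pipeline(tx_limited, txp_limited, txp_quota, passes_filters, sample_keep):
    """Sketch of the flow above. Returns (store_event, store_metrics)."""
    # 1. CheckEnvelope fast path: drop when txp is limited or txp == 0.
    if txp_limited or txp_quota == 0:
        return (False, False)
    # 2. Filtering / validation may drop the event here.
    if not passes_filters:
        return (False, False)
    # 3. Metrics extraction.
    store_metrics = True
    # 4. Dynamic sampling (legacy plans have no rules, so they always keep).
    if not sample_keep:
        # Step 6a: only the metrics reach the aggregator.
        return (False, store_metrics)
    # 5. Quota enforcement on the event item: tx += 1, txp += 0.
    # (In reality txp can become limited between steps 1 and 5, so both
    # checks are repeated here even though txp passed the fast path.)
    if txp_limited:
        store_metrics = False            # 5.3: drop metrics if txp limited
    if tx_limited or txp_limited:
        return (False, store_metrics)    # 5.2: drop the event
    # 6. Store the event and insert metrics into the aggregator.
    return (True, store_metrics)
```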

Dependencies

  • This requires TransactionProcessing quota to be set for all legacy
    plans.

Floris Bruynooghe added 2 commits October 5, 2022 15:29
Serde was (de)serialising this wrongly, leading to inconsistencies.
@flub flub requested a review from a team October 5, 2022 15:53
INGEST-1581
INGEST-1653

github-actions bot commented Oct 6, 2022

Fails
🚫 Please consider adding a changelog entry for the next release.
Instructions and example for changelog

For changes exposed to the Python package, please add an entry to py/CHANGELOG.md. This includes, but is not limited to event normalization, PII scrubbing, and the protocol.

For changes to the Relay server, please add an entry to CHANGELOG.md under the following heading:

  1. Features: For new user-visible functionality.
  2. Bug Fixes: For user-visible bug fixes.
  3. Internal: For features and bug fixes in internal operation, especially processing mode.

To the changelog entry, please add a link to this PR (consider a more descriptive message):

- Require TransactionProcessed quota. ([#1517](https://github.com/getsentry/relay/pull/1517))

If none of the above apply, you can opt out by adding #skip-changelog to the PR description.

Generated by 🚫 dangerJS against 0fed2e4

@flub flub assigned iker-barriocanal and unassigned flub Oct 10, 2022
Floris Bruynooghe added 2 commits October 10, 2022 16:34
Even when metrics have already been extracted we want these two to be
consistent.

flub commented Oct 10, 2022

I believe this is ready, please review.

It does not yet have integration tests; I think it needs to be based on top of #1521, since it relies on that to increment the used quotas of the metric buckets.

@iker-barriocanal iker-barriocanal removed their assignment Oct 12, 2022
@iker-barriocanal iker-barriocanal left a comment


The logic looks good to me. A few things remain: reducing the duplicated code, addressing the test comment I added below, and making CI green. Otherwise, ready to merge IMO.

@jan-auer jan-auer changed the title feat(txquota): require TransactionProcessed quota feat(quotas): Require TransactionProcessed quota Oct 13, 2022
Floris Bruynooghe added 3 commits October 14, 2022 14:05
This is the best we can do without also having metrics bucket counting
happening.
jjbayer added a commit that referenced this pull request Oct 17, 2022
Apply rate limits to metrics buckets from the `transactions` namespace.

# Background 

Dynamic Sampling introduces a new type of quota,
`transactions_processed`, which defines how many transactions should be
"processed", i.e. whether or not we should extract metrics from them.
This quota is always >= the `transactions` quota, which defines how many
transactions should be stored & indexed.

The new data category should not only apply to incoming transaction
payloads (see #1517), but also to
metrics buckets, which may have been extracted by an upstream Relay.

Moreover, processing Relays must _count_ the number of transactions for
which metrics have been dropped, and create outcomes for them, in the
same way that we create accepted outcomes (see
getsentry/sentry#39236).

# Data Flow

Processing Relays check rate limits against Redis. Currently the only
actor that has access to this rate limiter is the `EnvelopeProcessor`,
and we want to make use of its thread pool to make the blocking calls to
Redis. This makes it necessary to redirect metrics buckets to the
processor before sending them to the `EnvelopeManager` for publishing:

## Old Flow (still applies under certain conditions)

```
 ┌──────────┐    ┌────────────┐ ┌───────────────┐
 │Aggregator│    │ProjectCache│ │EnvelopeManager│
 └────┬─────┘    └─────┬──────┘ └──────┬────────┘
      │                │               │
      │  FlushBuckets  │               │
      ├───────────────►│               │
      │                │  SendMetrics  │
      │                ├──────────────►│
      │                │               │
```

## New Flow

```
 ┌──────────┐  ┌────────────┐       ┌─────────────────┐ ┌─────┐ ┌───────────────┐
 │Aggregator│  │ProjectCache│       │EnvelopeProcessor│ │Redis│ │EnvelopeManager│
 └────┬─────┘  └─────┬──────┘       └────────┬────────┘ └──┬──┘ └──────┬────────┘
      │              │                       │             │           │
      │ FlushBuckets │                       │             │           │
      ├─────────────►│                       │             │           │
      │              │ RateLimitFlushBuckets │             │           │
      │              ├──────────────────────►│             │           │
      │              │                       ├────────────►│           │
      │              │                       │             │           │
      │              │                       │◄────────────┤           │
      │              │                       │             │           │
      │              │                       │             │           │
      │              │                       │       SendMetrics       │
      │              │                       ├─────────────┬──────────►│
      │              │                       │             │           │
```

# Business Logic

1. The ProjectCache receives a `FlushBuckets` from the metrics aggregator.
2. Count the number of transactions that contributed to these buckets
(see below).
3. Check if cached rate limits apply.
4. If so, drop transaction-related buckets, generate outcomes, and send
the remaining buckets to the `EnvelopeManager`.
5. Otherwise, if processing mode is enabled, send the buckets to the
`EnvelopeProcessor`.
6. The processor communicates the transaction count to Redis and applies
the rate limit if it is reached.
7. The processor sends a message to the project cache to update the
cached rate limits.
8. Drop transaction-related buckets, generate outcomes, and send the
remaining buckets to the `EnvelopeManager`.
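The bucket-handling logic above can be sketched as a single function. This is a schematic model, not Relay's actual code: buckets are plain dicts, `check_redis` stands in for the Redis-backed rate limiter, and outcome generation is reduced to returning a dropped count.

```python
def flush_buckets(buckets, cached_limit_active, processing_enabled, check_redis):
    """Sketch of FlushBuckets handling. Returns (kept_buckets, dropped_tx_count)."""
    # Count the transactions that contributed to these buckets (see below).
    tx_count = sum(b.get("tx_count", 0) for b in buckets)
    # Cached rate limits apply: drop transaction-related buckets and
    # generate outcomes for them.
    if cached_limit_active:
        kept = [b for b in buckets if b["namespace"] != "transactions"]
        return kept, tx_count
    # In processing mode, consult Redis; drop if the limit is reached there.
    if processing_enabled and check_redis(tx_count):
        kept = [b for b in buckets if b["namespace"] != "transactions"]
        return kept, tx_count
    # Otherwise forward everything to the EnvelopeManager.
    return buckets, 0
```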

## Counting processed transactions

* For `d:transactions/duration@millisecond`, increment the counter in
Redis by the length of the bucket's value. This is an accurate count of
the number of transactions that contributed to the bucket.
* For any other metric, do not increment the counter in Redis, but still
enforce the rate limit if the quota is currently exceeded.
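The counting rule can be sketched as a small helper; the function name is hypothetical, and the metric name in the second test case is only an example of a non-duration transaction metric.

```python
def transaction_count(bucket_name, bucket_value):
    """How many processed transactions a bucket represents.

    Returns None when the bucket should not increment the Redis counter
    (the rate limit is still enforced for such buckets if already exceeded).
    """
    if bucket_name == "d:transactions/duration@millisecond":
        # Every transaction contributes exactly one duration value, so the
        # length of the distribution is an accurate transaction count.
        return len(bucket_value)
    return None
```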

# Not in this PR
* Enforcement of rate limits on the fast path.
* Enforcements of rate limits on metrics buckets in the `sessions`
namespace.

Co-authored-by: Jan Michael Auer <mail@jauer.org>
@jan-auer jan-auer closed this Oct 27, 2022
@jan-auer jan-auer deleted the flub/billing-tx-quota branch October 27, 2022 12:38