ref(server): Localize outcome emission in EnvelopeContext #1406

jan-auer · 2022-08-10T15:26:03Z

This change makes EnvelopeContext responsible for tracking the lifetime of an envelope and emitting outcomes. In a follow-up, the goal is to remove the long-running handler future in EnvelopeManager and instead pass envelopes through separate queues between the ingestion stages. Most importantly, this requires a central place that is bound to the envelope and emits outcomes and metrics when envelopes are dropped for any reason.

EnvelopeContext

Before this change, EnvelopeContext was a data object containing all information to create outcomes. It was primarily used in three places:

The endpoint handler future, where it was shared across all future callbacks to emit outcomes when envelopes are dropped.
The envelope manager's handler future, in exactly the same way.
The envelope processor, where it was stored on the processing state and used to emit outcomes for inbound filtering and rate limiting.

The envelope context is now passed alongside the envelope through Relay and also moves through message handlers. This guarantees that an outcome is emitted whenever the envelope is dropped. The context can be dropped in two ways:

Explicitly by calling accept() or reject(outcome).
Implicitly by dropping, for instance if sending a message fails unexpectedly. This represents a bug and will record an "internal" outcome.

Calling accept() consumes the envelope context. Temporarily, reject() does not yet consume, which will be changed in a follow-up.
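
To illustrate the lifecycle described above, here is a minimal sketch of a drop guard with a `done` flag. It is not the code from this PR: `Recorded`, `Log`, and `lifecycle_demo` are hypothetical names, and the shared `Vec` merely stands in for Relay's outcome and metrics machinery.

```rust
use std::cell::RefCell;
use std::rc::Rc;

/// What the sketch records when a context ends (hypothetical stand-in for
/// Relay's outcomes and success metrics).
#[derive(Clone, Debug, PartialEq)]
pub enum Recorded {
    Accepted,
    Rejected(&'static str),
}

/// Shared log standing in for the outcome aggregator (assumption).
pub type Log = Rc<RefCell<Vec<Recorded>>>;

pub struct EnvelopeContext {
    log: Log,
    done: bool,
}

impl EnvelopeContext {
    pub fn new(log: Log) -> Self {
        Self { log, done: false }
    }

    /// Consumes the context, so nothing else can be recorded afterwards.
    pub fn accept(mut self) {
        if !self.done {
            self.done = true;
            self.log.borrow_mut().push(Recorded::Accepted);
        }
    }

    /// Records the rejection once. Later calls and the final drop are no-ops.
    pub fn reject(&mut self, reason: &'static str) {
        if !self.done {
            self.done = true;
            self.log.borrow_mut().push(Recorded::Rejected(reason));
        }
    }
}

impl Drop for EnvelopeContext {
    fn drop(&mut self) {
        // Getting here without accept() or reject() is treated as a bug.
        if !self.done {
            self.log.borrow_mut().push(Recorded::Rejected("internal"));
        }
    }
}

/// Runs the three ways a context can end and returns what was recorded.
pub fn lifecycle_demo() -> Vec<Recorded> {
    let log: Log = Rc::new(RefCell::new(Vec::new()));

    EnvelopeContext::new(log.clone()).accept();

    let mut rejected = EnvelopeContext::new(log.clone());
    rejected.reject("empty_envelope");
    drop(rejected); // already done, records nothing further

    drop(EnvelopeContext::new(log.clone())); // implicit drop

    let recorded = log.borrow().clone();
    recorded
}
```

Because `reject` takes `&mut self` in this sketch, the flag is what prevents a second record when the context is dropped later. A consuming `reject(self)` would make the flag unnecessary for that case.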

Control Flow

The flow of envelopes remains unchanged, with the envelope context now following the envelope through every step:

(in the endpoint)

Load the request and construct an envelope. Drop empty envelopes.
Send it to the project cache for the fast-path. This may drop the envelope for cached rate limits or disabled projects. Outcomes are emitted in the project cache. Updates the envelope context with project information if available.
Send it to the envelope manager for queueing. If queueing fails, an outcome is emitted by the envelope manager.
End the request.

(in envelope manager)

Split envelopes into event-related items, metrics items, and everything else. This will create duplicate envelope contexts, since from here on there are multiple independent envelopes that get queued.
Send the envelope through the project cache again, now ensuring an up-to-date project state. Same logic as in the endpoint.
Send the envelope to the processor. All rate limits and invalid transactions are logged directly by the processor. An updated envelope and envelope context are sent back to the Envelope Manager.
Submit the envelope to either Kafka, Upstream, or the internal capture map (only in capture mode). For simplicity, the envelope context remains in the waiting future outside.
Explicitly drop the envelope context either by accepting or rejecting based on the result of the previous step.

Follow-Ups

Introduce a semaphore to limit the number of envelopes in Relay. The guard can be held by the envelope context. This is currently handled by a counter (active_envelopes) in the EnvelopeManager.
Make capture mode use explicit messages so that captures can be created from other services.
Combine the envelope context and envelope in a single carrier type. Keeping the two separate required fewer code changes initially, however the context and the envelope always need to be passed and modified together.
(optional) Introduce safer APIs to modify the envelope's contents and update the envelope context at the same time.
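
As a rough illustration of the first follow-up, the sketch below shows a counting semaphore whose permit is released on drop, with the permit held by the context. All names here (`Semaphore`, `Permit`, `try_acquire`, and the reduced `EnvelopeContext`) are assumptions for this example and not the API that the follow-up introduces.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;

/// Counting semaphore backed by an atomic counter (a sketch only).
#[derive(Clone)]
pub struct Semaphore {
    available: Arc<AtomicUsize>,
}

/// Permit that returns its slot to the semaphore when dropped.
pub struct Permit {
    available: Arc<AtomicUsize>,
}

impl Semaphore {
    pub fn new(capacity: usize) -> Self {
        Self {
            available: Arc::new(AtomicUsize::new(capacity)),
        }
    }

    /// Takes one slot, or returns `None` if no slot is left.
    pub fn try_acquire(&self) -> Option<Permit> {
        self.available
            .fetch_update(Ordering::AcqRel, Ordering::Acquire, |n| n.checked_sub(1))
            .ok()
            .map(|_| Permit {
                available: self.available.clone(),
            })
    }

    pub fn available(&self) -> usize {
        self.available.load(Ordering::Acquire)
    }
}

impl Drop for Permit {
    fn drop(&mut self) {
        self.available.fetch_add(1, Ordering::AcqRel);
    }
}

/// Hypothetical context that keeps the permit for the envelope's lifetime.
pub struct EnvelopeContext {
    _permit: Permit,
}

impl EnvelopeContext {
    pub fn new(permit: Permit) -> Self {
        Self { _permit: permit }
    }
}
```

With this shape, the slot is freed whenever the context goes away, whether through accept, reject, or an unexpected drop, so no separate counter needs to be decremented by hand.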

#skip-changelog

flub · 2022-08-10T16:31:20Z

relay-server/src/utils/envelope_context.rs

    ///
    /// This does not send outcomes for empty envelopes or request-only contexts.
-    pub fn send_outcomes(&self, outcome: Outcome) {
+    pub fn reject(&mut self, outcome: Outcome) {


What's the reason this takes &mut self instead of mut self? I assume this is also the reason you end up needing the slightly awkward self.done flag? It's a bit unfortunate, as it breaks the RAII convention.

Unfortunately, there are several places in the code base where we need to pass the envelope context through callbacks such as map_err() or hold it on a struct such as ProcessEnvelopeState. Right now, having this &mut is still more ergonomic and allows us to keep most of that code the same for this PR.

There will be a follow-up PR that refactors these places as we'll start to tie envelopes and the context completely together. Hopefully, when we're done with the transition, this can consume self again.

Agree with @flub here, but with the if self.done in place, I think we can leave it until the next refactor.

jan-auer

Some pointers for reviewers. A lot of this information will go in the PR description, but please let me know if you would prefer some of these notes in code comments.

jan-auer · 2022-08-10T20:47:30Z

relay-server/src/actors/envelopes.rs

+            // The envelope has been split, so we need to fork the context.
+            envelope_context.update(&envelope);
+            let event_context = EnvelopeContext::from_envelope(&event_envelope);


This construct is a bit ugly. From one envelope we create two, and thus we also need to split the EnvelopeContext. This is achieved by creating a new one for the split Envelope, and updating the old one with remaining items.

I'm thinking that we might want to lift the split method -- an odd function to begin with -- to the EnvelopeContext and maintain all modification to the envelope there. I haven't found a nice way to do that just yet, so I'm leaving this for a follow-up PR.

Agreed, envelope_context.split_by would be nice.

agree that moving split into the context here is probably the right way, but fine as a followup.
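
For discussion, a `split_by` on the context could look roughly like the sketch below. Everything here is hypothetical: `Item`, `Envelope`, and the single `item_count` field are reduced stand-ins for Relay's types, and the method name simply follows the suggestion in this thread.

```rust
/// Reduced stand-ins for Relay's types (hypothetical).
#[derive(Clone, Debug, PartialEq)]
pub enum Item {
    Event,
    Attachment,
    Session,
}

#[derive(Debug, Default)]
pub struct Envelope {
    pub items: Vec<Item>,
}

#[derive(Debug)]
pub struct EnvelopeContext {
    pub item_count: usize,
}

impl EnvelopeContext {
    pub fn from_envelope(envelope: &Envelope) -> Self {
        Self {
            item_count: envelope.items.len(),
        }
    }

    pub fn update(&mut self, envelope: &Envelope) {
        self.item_count = envelope.items.len();
    }

    /// Moves all items matching `predicate` into a new envelope and returns it
    /// with a forked context. `self` is updated to describe the remainder.
    pub fn split_by<F>(
        &mut self,
        envelope: &mut Envelope,
        mut predicate: F,
    ) -> Option<(Envelope, EnvelopeContext)>
    where
        F: FnMut(&Item) -> bool,
    {
        let (split, rest): (Vec<Item>, Vec<Item>) = std::mem::take(&mut envelope.items)
            .into_iter()
            .partition(|item| predicate(item));

        envelope.items = rest;
        self.update(envelope);

        if split.is_empty() {
            return None;
        }

        let split_envelope = Envelope { items: split };
        let split_context = EnvelopeContext::from_envelope(&split_envelope);
        Some((split_envelope, split_context))
    }
}
```

The point of the design is that the caller can no longer forget the `update` call after splitting, since both contexts are brought in sync inside one method.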

jan-auer · 2022-08-10T20:49:53Z

relay-server/src/actors/envelopes.rs

+        if envelope.is_empty() {
+            // The envelope can be empty here if it contained only metrics items which were removed
+            // above. In this case, the envelope was accepted and needs no further queueing.
+            envelope_context.accept();


This accept is actually correct. We only call this message if the envelope was non-empty. If the envelope is now empty, it means that all the items have been handled individually, and the envelope is done.

This mostly matters because accept() internally logs some success metrics.

The only case that can trigger this at the moment is a metrics-only envelope.

jan-auer · 2022-08-10T20:51:47Z

relay-server/src/actors/envelopes.rs

        } = message;

-        let sampling_project_key = utils::get_sampling_key(&envelope);
-
+        let start_time = envelope.meta().start_time();


To clean up the messages a bit already, the start time has been removed from HandleEnvelope and can instead be pulled from the Envelope's meta. This is also where handle_store_like_request obtains it from.

jan-auer · 2022-08-10T20:52:48Z

relay-server/src/actors/envelopes.rs

-                CheckEnvelope::fetched(project_key, envelope),
-                *envelope_context.clone().borrow(),
-            )
+            .send(CheckEnvelope::fetched(


send_tracked is gone completely. If the envelope and its context are dropped for any reason, we now log an "internal" outcome automatically. This will also cover more cases that were not instrumented yet.

Same applies to a couple of map_err calls below that intercepted project errors and mapped them to "internal".

jan-auer · 2022-08-10T20:53:54Z

relay-server/src/actors/envelopes.rs

-
-                match checked.envelope {
-                    Some(envelope) => {
-                        envelope_context.update(&envelope);


Updating scoping and the context is now handled inside the CheckEnvelope and ProcessEnvelope messages. The only thing that remains in the EnvelopeManager is to propagate errors for control flow.

jan-auer · 2022-08-10T21:04:07Z

relay-server/src/actors/processor.rs

-                },
-            })
-            .unwrap();
+        let new_envelope = relay_test::with_system(move || {


Tests are largely unaltered and simply updated to match the new signatures. The only exception is this test, which now requires an actix system since it logs outcomes internally. The outcomes do not matter for this test.

jan-auer · 2022-08-10T21:06:14Z

relay-server/src/endpoints/common.rs

@@ -344,85 +299,65 @@ where
    let project_key = meta.public_key();
    let start_time = meta.start_time();
    let config = request.state().config();
-
-    let envelope_context = Rc::new(RefCell::new(EnvelopeContext::from_request(&meta)));
+    let event_id = Rc::new(RefCell::new(None));


The EnvelopeContext is now moved through the chain of futures rather than shared across all the callbacks. However, we still need the EventId in the global error handler, so we can't get rid of the RefCell just yet.
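
A reduced sketch of that pattern, using plain `Result` combinators in place of the actual future chain: the context is moved from step to step, while only the event ID lives in a shared cell so that the final error handler can still read it. The `EnvelopeContext` struct, `handle_request`, the `u64` event ID, and the error strings are all assumptions for this example.

```rust
use std::cell::RefCell;
use std::rc::Rc;

/// Reduced stand-in for the context; the real type carries much more.
pub struct EnvelopeContext {
    pub event_id: Option<u64>,
}

/// Moves the context through the chain and shares only the event ID.
pub fn handle_request(context: EnvelopeContext, queue_full: bool) -> Result<(), String> {
    let event_id = Rc::new(RefCell::new(None));
    let shared_id = event_id.clone();

    Ok::<EnvelopeContext, &'static str>(context)
        .and_then(move |context| {
            // Publish the ID as soon as it is known.
            *shared_id.borrow_mut() = context.event_id;
            if queue_full {
                Err("queue full")
            } else {
                Ok(context)
            }
        })
        .map(|_context| ())
        // The error handler never sees the context, only the shared ID.
        .map_err(move |error| format!("{} (event {:?})", error, *event_id.borrow()))
}
```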

jan-auer · 2022-08-10T21:07:10Z

relay-server/src/endpoints/common.rs

            if envelope.is_empty() {
-                // envelope is empty, cannot send outcomes
+                envelope_context.reject(Outcome::Invalid(DiscardReason::EmptyEnvelope));


Most of the rejections have been pushed deep into message handlers, this one being a notable exception. Since the envelope is actually dropped here, we also need to reject and drop the context.

jan-auer · 2022-08-10T21:11:26Z

relay-server/src/endpoints/common.rs

-                QueueEnvelopeError::TooManyEnvelopes => Outcome::Invalid(DiscardReason::Internal),
-            },
-            BadStoreRequest::ProjectFailed(project_error) => match project_error {
-                ProjectError::FetchFailed => Outcome::Invalid(DiscardReason::ProjectState),


Interestingly, this was mapped to DiscardReason::ProjectState while it was mapped to DiscardReason::Internal in the envelope manager's future. There's little point in differentiating between those failure modes in outcomes, so I went with "internal" just like everywhere else, which simplifies the logic.

jan-auer · 2022-08-10T21:14:18Z

relay-server/src/utils/envelope_context.rs

+    /// Resets inner state to ensure there's no more logging.
+    fn finish(&mut self, counter: RelayCounters) {
+        relay_statsd::metric!(counter(counter) += 1);
+        relay_statsd::metric!(timer(RelayTimers::EnvelopeTotalTime) = self.start_time.elapsed());


Note that these metrics are now also logged for envelopes that are rejected in the endpoint (handle_store_like_request). This is actually more truthful.

jjbayer · 2022-08-11T08:03:54Z

relay-server/src/actors/envelopes.rs

+            // The envelope has been split, so we need to fork the context.
+            envelope_context.update(&envelope);
+            let event_context = EnvelopeContext::from_envelope(&event_envelope);


Agreed, envelope_context.split_by would be nice.

jjbayer · 2022-08-11T08:29:19Z

relay-server/src/actors/processor.rs

+                        state.envelope_context.update(&state.envelope);
+
+                        let envelope_response = if state.envelope.is_empty() {
+                            // Individual rate limits have already been issued
+                            state.envelope_context.reject(Outcome::RateLimited(None));
+                            None
+                        } else {
+                            Some((state.envelope, state.envelope_context))
+                        };


nit: If the intention is to reject without emitting an outcome, should we introduce an explicit argument / method for that? E.g.

.reject(None) // or .reject_empty()

jjbayer · 2022-08-11T08:37:36Z

relay-server/src/actors/project.rs

-        enforcement.track_outcomes(&envelope, scoping);
+        let scoping = envelope_context.scoping();
+        let (enforcement, rate_limits) = envelope_limiter.enforce(&mut envelope, &scoping)?;
+        enforcement.track_outcomes(&envelope, &scoping);


Will we move this to reject as well at some point?

Yes, or providing a safer reject_item method that removes an item, logs the outcome, and updates the context in the same go. That could then be used by the rate limiter.
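
A sketch of what such a `reject_item` might look like, assuming heavily reduced stand-in types. `TrackedOutcome`, the `remaining` field, and the index-based signature are hypothetical and chosen only to show the remove, log, and update steps happening in one call.

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum DataCategory {
    Error,
    Attachment,
}

/// Simplified record of an emitted outcome (hypothetical).
#[derive(Debug, PartialEq)]
pub struct TrackedOutcome {
    pub reason: &'static str,
    pub category: DataCategory,
    pub quantity: usize,
}

pub struct Item {
    pub category: DataCategory,
    pub quantity: usize,
}

pub struct Envelope {
    pub items: Vec<Item>,
}

pub struct EnvelopeContext {
    /// Number of items the context still accounts for.
    pub remaining: usize,
    /// Stand-in for sending outcomes to the aggregator.
    pub outcomes: Vec<TrackedOutcome>,
}

impl EnvelopeContext {
    pub fn from_envelope(envelope: &Envelope) -> Self {
        Self {
            remaining: envelope.items.len(),
            outcomes: Vec::new(),
        }
    }

    /// Removes the item at `index`, logs its outcome, and updates the context
    /// in the same step. Returns `None` if the index is out of bounds.
    pub fn reject_item(
        &mut self,
        envelope: &mut Envelope,
        index: usize,
        reason: &'static str,
    ) -> Option<Item> {
        if index >= envelope.items.len() {
            return None;
        }

        let item = envelope.items.remove(index);
        self.outcomes.push(TrackedOutcome {
            reason,
            category: item.category,
            quantity: item.quantity,
        });
        self.remaining = envelope.items.len();
        Some(item)
    }
}
```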

jjbayer · 2022-08-11T08:57:55Z

relay-server/src/utils/envelope_context.rs

    ///
    /// This does not send outcomes for empty envelopes or request-only contexts.
-    pub fn send_outcomes(&self, outcome: Outcome) {
+    pub fn reject(&mut self, outcome: Outcome) {


Agree with @flub here, but with the if self.done in place, I think we can leave it until the next refactor.

jjbayer · 2022-08-11T09:02:26Z

relay-server/src/utils/envelope_context.rs

+    ///
+    /// This envelope context should be updated using [`update`](Self::update) soon after this
+    /// operation to ensure that subsequent outcomes are consistent.
+    pub fn track_outcome(&self, outcome: Outcome, category: DataCategory, quantity: usize) {


I wonder if we could get rid of the pub here? As far as I can see process_profiles is the only caller, I guess we need it there because we emit outcomes for individual profiles?

We will actually have to call this in more places, at least:

For sessions. Right now, we're dropping sessions without logging outcomes.

For the rate limiter once enforcement.track_outcomes is migrated to the context.

This is definitely a somewhat unsafe API in the interim.

flub · 2022-08-11T08:59:17Z

relay-server/src/actors/envelopes.rs

+            // The envelope has been split, so we need to fork the context.
+            envelope_context.update(&envelope);
+            let event_context = EnvelopeContext::from_envelope(&event_envelope);


agree that moving split into the context here is probably the right way, but fine as a followup.

flub · 2022-08-11T09:23:45Z

relay-server/src/actors/envelopes.rs

                            let outcome = Outcome::Invalid(DiscardReason::Internal);

-                            match error {
+                            Err(match error {


Total nitpick, but I dislike round brackets spanning this many lines. I'd always assign this and then put Err() around the variable on the next line.

Makes sense. This entire match statement should probably move to a function, or could be simplified by changing the error type to contain the SendError.

Since this is going to move soon anyway, I'll play with the best version of this in the follow-up.

flub · 2022-08-11T09:55:46Z

relay-server/src/utils/envelope_context.rs

+    /// Returns the instant at which the envelope was received at this Relay.
+    ///
+    /// This is the monotonic time equivalent to [`received_at`](Self::received_at).
+    pub fn start_time(&self) -> Instant {


Naming is hard, but maybe Hungarian naming is defensible in this case: .received_instant()? Currently these two names are just so far apart.

Agreed. So far the entire code base consistently calls them start_time and received_at because of historical context. If it's OK, I'd do a sweep in a dedicated PR for this.

flub · 2022-08-11T09:58:01Z

Combine the envelope context and envelope in a single carrier type. Keeping the two separate required fewer code changes initially, however the context and the envelope always need to be passed and modified together.

I think this one will have the most impact

wip: Pipe envelope context through futures

90ae786

flub reviewed Aug 10, 2022

jan-auer added 4 commits August 10, 2022 20:27

ref: Move envelope context through the processor

49a4116

fix: Work around double-outcomes temporarily

cc3ad8c

fix: Test requiring system now

7715102

ref(server): Pipe context through CheckEnvelope

fc534e8

jan-auer commented Aug 10, 2022

jan-auer marked this pull request as ready for review August 10, 2022 21:15

jan-auer requested a review from a team August 10, 2022 21:15

jjbayer reviewed Aug 11, 2022

flub approved these changes Aug 11, 2022

jan-auer merged commit 6eeebba into master Aug 11, 2022

jan-auer deleted the ref/pipe-envelope-context branch August 11, 2022 10:07

This was referenced Aug 11, 2022

ref(server): Use an atomic semaphore to track buffer usage #1408

Merged

ref(server): Add a message to capture envelopes in capture mode #1409

Merged

ref(server): Transform EnvelopeManager into a sequential pipeline #1416

Merged
