feat(spans): Extract transaction from segment span #3375

Merged
jjbayer merged 29 commits into master from feat/spans-to-transaction
Apr 11, 2024

Conversation

jjbayer
Member

@jjbayer jjbayer commented Apr 4, 2024

ref: #3278

The current state of Performance is that

  1. There are SDKs that emit transactions, and there are SDKs that emit standalone spans.
  2. There are parts of the product that require transactions, and there are parts of the product that require spans.

We already extract spans from transactions to make span-dependent product features work for "transaction"-SDKs. Now we need to also extract transactions from spans to make transaction-dependent product features work for "span"-SDKs:

flowchart LR
    
    SDK -->|transaction| Relay
    SDK -->|span| Relay
    Relay -->|span| spc[Span Consumer]
    Relay -->|transaction| txc[Transaction Consumer] 

    Relay -->|span from transaction| spc
    Relay -->|transaction from span| txc
    
    linkStyle 5 color:green;

Point of conversion

When should the segment span actually be converted to a transaction? I've listed two options below. At the moment, I believe only Option 1 is feasible because the span processing pipeline does not have all features implemented yet.

Option 1 - At the start of envelope processing [Selected (see Option 1a)]

flowchart TD
SDK -->|span| extract
extract-->|span| process_standalone_span
extract-->|transaction| process_transaction

Pros:

  • No need to duplicate code for e.g. transaction metrics extraction. An extracted transaction can pass through the same processing pipeline as an "organic" transaction sent from the SDK, with the same normalization, PII scrubbing, etc.
  • The transaction processing pipeline is more mature and better tested than the span processing pipeline.

Cons:

  • Some duplicate work: Normalization and PII scrubbing will run for both the original segment span and the transaction extracted from it.
  • Inconsistent with how we extract spans from transactions (this is done after processing).
  • Extraction will occur in edge Relays (not just processing Relays), so any updates to the conversion would take months to propagate to external Relays.

Option 1a - At the start of span processing in processing Relays [Selected]

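Like Option 1, but only done in processing Relays:

Pros:

  • Gets rid of one of the cons of Option 1: extraction no longer happens in edge Relays, so updates to the conversion do not take months to propagate to external Relays.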
  • Spans are currently only parsed in processing Relays. No need to refactor that.

Cons:

  • Need to make sure that transactions are normalized, even if normalization is disabled in processing Relays.

Option 2 - At the end of envelope processing (in processing Relays) [Discarded]

flowchart TD

process_span["process span (normalize, filter, metrics, sample)"]

SDK -->|span| process_span
process_span -->|span| extract_transaction
extract_transaction -->|span| enforce_quotas
extract_transaction -->|transaction| extract_metrics_tx
extract_metrics_tx -->|transaction| enforce_quotas

Pros:

  • Assuming that span processing already filters, normalizes, samples and scrubs spans correctly, there would be no duplicate work done for the extracted transaction. All that's left would be transaction metrics extraction and rate limiting.
  • Consistent with how we extract spans from transactions.

Cons:

  • Cannot leverage the fully mature transaction processing pipeline.
    • BLOCKING: Inbound filters and dynamic sampling for spans are not ready yet.
  • Needs some duplicate code to extract transaction metrics from extracted transactions.

Prevent duplicate data

We already cross between spans and transactions in two places:

  1. For every transaction, Relay extracts one standalone span for each of the transaction's child spans, and a standalone segment span for the transaction itself.
  2. For compatibility of performance scores, there is one transaction metric that is also extracted from standalone spans: "d:transactions/measurements.score.total@ratio".

To prevent circular conversion of data, I suggest introducing two new item headers:

  1. "transaction_extracted" for span items, which will be checked before converting a span to a transaction. For segment spans extracted from transactions, this flag will be true from the start.
  2. "spans_extracted" for transaction items, which will be checked before extracting spans or span metrics from a transaction. For transactions extracted from spans, this flag will be true from the start.

In addition, we will stop extracting "d:transactions/measurements.score.total@ratio" from spans.
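
The two flags described above can be sketched roughly as follows. This is a simplified illustration, not Relay's actual implementation: `ItemHeaders`, `Item`, `extract_transaction` and `extract_segment_span` are hypothetical names, and only the header names `transaction_extracted` and `spans_extracted` come from the proposal.

```rust
/// Hypothetical, simplified stand-in for the headers of an envelope item.
#[derive(Debug, Default, Clone, PartialEq)]
pub struct ItemHeaders {
    /// For span items: no transaction may be extracted from this span (anymore).
    pub transaction_extracted: bool,
    /// For transaction items: no spans may be extracted from this transaction (anymore).
    pub spans_extracted: bool,
}

/// Hypothetical item type that only carries what the sketch needs.
#[derive(Debug, Clone, PartialEq)]
pub enum Item {
    Span { headers: ItemHeaders, is_segment: bool },
    Transaction { headers: ItemHeaders },
}

/// Span -> transaction. Only segment spans that were not converted yet qualify.
pub fn extract_transaction(item: &mut Item) -> Option<Item> {
    if let Item::Span { headers, is_segment } = item {
        if *is_segment && !headers.transaction_extracted {
            headers.transaction_extracted = true;
            // The new transaction starts out with `spans_extracted = true`,
            // so it is never converted back into spans.
            return Some(Item::Transaction {
                headers: ItemHeaders {
                    spans_extracted: true,
                    ..Default::default()
                },
            });
        }
    }
    None
}

/// Transaction -> segment span. Skipped if spans were already extracted.
pub fn extract_segment_span(item: &mut Item) -> Option<Item> {
    if let Item::Transaction { headers } = item {
        if !headers.spans_extracted {
            headers.spans_extracted = true;
            // The new segment span starts out with `transaction_extracted = true`.
            return Some(Item::Span {
                headers: ItemHeaders {
                    transaction_extracted: true,
                    ..Default::default()
                },
                is_segment: true,
            });
        }
    }
    None
}
```

In this sketch, a transaction produced by `extract_transaction` yields `None` when passed to `extract_segment_span`, and vice versa, so a conversion in one direction can never trigger the conversion back.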

TODO

  • Ensure transaction gets normalized even if normalization is disabled
  • Test with a local dev setup to make sure transactions appear in the product without breaking consumers.
  • Modify test input to make is_segment false for some.
  • Add a test with score.total to ensure that it is never extracted more than once.

if has_fields {
    let context_key = <$ContextType as DefaultContext>::default_key().into();
    contexts.insert(context_key, ContextInner(context.into_context()).into());
}
Member Author

This prevents an empty ProfileContext from appearing in the event.

Comment on lines 45 to 48
if trace_context.exclusive_time.value().is_some() {
    // Exclusive time already set, respect.
    return;
}
Member Author

This is a necessary change in behavior: a transaction derived from a standalone span does not have child spans, so we need to keep the exclusive_time that is already set on the transaction. Otherwise, the recomputed exclusive_time would always equal the full duration.
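
To illustrate why the early return matters, here is a minimal sketch of an exclusive-time computation. It assumes that exclusive time is the parent's duration minus the time covered by its child spans; the `Interval` type and the `exclusive_time_ms` function are hypothetical and not taken from Relay.

```rust
/// Hypothetical time interval in milliseconds.
#[derive(Debug, Clone, Copy)]
pub struct Interval {
    pub start_ms: f64,
    pub end_ms: f64,
}

/// Returns the exclusive time of `parent`.
///
/// If a value is already set, it is respected. Otherwise the time covered by
/// `children` (overlaps counted once) is subtracted from the parent's duration.
pub fn exclusive_time_ms(existing: Option<f64>, parent: Interval, children: &[Interval]) -> f64 {
    if let Some(value) = existing {
        // Exclusive time already set, respect.
        return value;
    }

    // Clip children to the parent's bounds and drop empty intervals.
    let mut clipped: Vec<Interval> = children
        .iter()
        .map(|c| Interval {
            start_ms: c.start_ms.max(parent.start_ms),
            end_ms: c.end_ms.min(parent.end_ms),
        })
        .filter(|c| c.end_ms > c.start_ms)
        .collect();
    clipped.sort_by(|a, b| a.start_ms.total_cmp(&b.start_ms));

    // Sweep from left to right, counting overlapping regions only once.
    let mut covered = 0.0;
    let mut cursor = parent.start_ms;
    for child in clipped {
        let start = child.start_ms.max(cursor);
        if child.end_ms > start {
            covered += child.end_ms - start;
            cursor = child.end_ms;
        }
    }

    (parent.end_ms - parent.start_ms) - covered
}
```

With an empty `children` slice and no existing value, the function returns the full duration, which is the situation a transaction derived from a standalone span would be in without the early return.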

/// Returns a shared reference to the reservoir counters.
pub fn counters(&self) -> ReservoirCounters {
    self.counters.clone()
}
Member Author

Needed to split off a ProcessEnvelope.


/// Whether or not spans have been extracted from a transaction.
#[serde(default, skip_serializing_if = "is_false")]
spans_extracted: bool,
Member Author

With these two flags, we can prevent conversion between events and spans from going in circles.

    config: &MetricExtractionConfig,
    max_tag_value_size: usize,
) -> Vec<Bucket> {
    let mut metrics = generic::extract_metrics(event, config);

    // If spans were already extracted for an event,
    // we rely on span processing to extract metrics.
    if !spans_extracted {
Member Author

Note to self: Double-check if semantics make sense.

@jjbayer jjbayer force-pushed the feat/spans-to-transaction branch from e6c71c7 to 800ff44 on April 9, 2024 13:04
jjbayer added a commit that referenced this pull request Apr 9, 2024
Non-functional change to get rid of
`#[allow(clippy::too_many_arguments)]` on
`EnvelopeProcessorService::new`. This PR was originally part of
#3375, which adds yet another
`Addr`, but I decided to make a separate PR for reviewability.
@@ -383,6 +384,9 @@ pub fn serialize<G: EventProcessing>(
// If transaction metrics were extracted, set the corresponding item header
event_item.set_metrics_extracted(state.event_metrics_extracted);

// TODO: The state should simply maintain & update an `ItemHeaders` object.
Member Author

I will follow up on this in a different PR.

@jjbayer
Member Author

jjbayer commented Apr 9, 2024

Still requires some more test coverage, but opening for review to get feedback.

@jjbayer jjbayer marked this pull request as ready for review April 9, 2024 15:58
@jjbayer jjbayer requested a review from a team as a code owner April 9, 2024 15:58
Contributor

@iker-barriocanal iker-barriocanal left a comment

Overall lgtm. Leaving a few comments and questions:

  • How are we planning to measure COGS for transactions coming from segments?
  • A single message with an envelope with N spans will result in N new messages in the processor, and we'd drop messages if the processor's queue is full. Solving that problem is out of scope of this PR, but what do you think about adding observability for that? This queue can grow quite fast without accepting new requests.

@@ -91,10 +98,18 @@ pub fn process(
        return ItemAction::Drop(Outcome::Invalid(DiscardReason::Internal));
    };

    if should_extract_transactions && !item.transaction_extracted() {
        if let Some(transaction) = convert_to_transaction(&annotated_span) {
Contributor

Span and transaction normalization may have different requirements, so I'd extract transactions before normalizing spans (line 91 above).

Member Author

Good point, refactored now. convert_to_transaction relies on is_segment normalization, so I moved that part out of normalization so that it still runs before the conversion.
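
The resulting order of operations can be sketched like this. It is a simplified illustration under stated assumptions: the `Span` and `Transaction` types, `set_segment_attributes` and `normalize_span` are hypothetical, and treating a span without a `parent_span_id` as a segment is an assumption made for the example. Only the name `convert_to_transaction` appears in the diff.

```rust
/// Hypothetical, heavily reduced span type.
#[derive(Debug, Clone, Default, PartialEq)]
pub struct Span {
    pub span_id: String,
    pub parent_span_id: Option<String>,
    pub is_segment: Option<bool>,
    pub description: String,
}

/// Hypothetical, heavily reduced transaction type.
#[derive(Debug, Clone, PartialEq)]
pub struct Transaction {
    pub name: String,
    pub span_id: String,
}

/// Step 1: derive `is_segment` if the SDK did not set it.
/// Assumption for this sketch: a span without a parent is a segment.
pub fn set_segment_attributes(span: &mut Span) {
    if span.is_segment.is_none() {
        span.is_segment = Some(span.parent_span_id.is_none());
    }
}

/// Step 2: only segment spans are converted to transactions.
pub fn convert_to_transaction(span: &Span) -> Option<Transaction> {
    if span.is_segment != Some(true) {
        return None;
    }
    Some(Transaction {
        name: span.description.clone(),
        span_id: span.span_id.clone(),
    })
}

/// Step 3: placeholder for span-specific normalization.
pub fn normalize_span(span: &mut Span) {
    span.description = span.description.trim().to_owned();
}

/// Runs the three steps in order: the transaction is created from the span
/// before span normalization, so it can be normalized by its own pipeline.
pub fn process(span: &mut Span) -> Option<Transaction> {
    set_segment_attributes(span);
    let transaction = convert_to_transaction(span);
    normalize_span(span);
    transaction
}
```

In this sketch the extracted transaction sees the span as it was before `normalize_span` ran, which leaves transaction-specific normalization to the transaction pipeline.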

@@ -556,20 +580,46 @@ def test_span_ingestion(
"description": "my 3rd protobuf OTel span",
"duration_ms": 500,
"exclusive_time_ms": 500.0,
- "is_segment": True,
+ "is_segment": False,
Contributor

Why is this span no longer a segment?

Member Author

I deliberately gave it a parent_span_id so I could verify that transactions are not extracted from regular spans, only from segment spans.

Member Author

@jjbayer jjbayer left a comment

@iker-barriocanal

> How are we planning to measure COGS for transactions coming from segments?

The extraction part will be accounted for as AppFeature::Spans, the processing as AppFeature::Transactions. I think this is fine for now, as both are part of Performance cost.

> A single message with an envelope with N spans will result in N new messages in the processor, and we'd drop messages if the processor's queue is full. Solving that problem is out of scope of this PR, but what do you think about adding observability for that? This queue can grow quite fast without accepting new requests.

Good point, I added two metrics now so we can observe the number of spin-off transactions per envelope.

@@ -382,6 +382,9 @@ pub enum RelayTimers {
/// This metric is tagged with:
/// - `type`: The type of the health check, `liveness` or `readiness`.
HealthCheckDuration,

/// Measurees how many transactions were created from segment spans in a single envelope.
Contributor

Suggested change
- /// Measurees how many transactions were created from segment spans in a single envelope.
+ /// Measures how many transactions were created from segment spans in a single envelope.

@@ -382,6 +382,9 @@ pub enum RelayTimers {
/// This metric is tagged with:
/// - `type`: The type of the health check, `liveness` or `readiness`.
HealthCheckDuration,

/// Measurees how many transactions were created from segment spans in a single envelope.
TransactionsFromSpansPerEnvelope,
Contributor

Should we move the variant to RelayHistograms?

@jjbayer jjbayer merged commit f71e136 into master Apr 11, 2024
21 checks passed
@jjbayer jjbayer deleted the feat/spans-to-transaction branch April 11, 2024 13:58
@jjbayer
Member Author

jjbayer commented Apr 11, 2024

Update: verified that a transaction extracted from spans ends up in the product without errors:
[image]
