-
Notifications
You must be signed in to change notification settings - Fork 93
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(txname): Apply transaction rules to legacy SDKs #2250
Conversation
/// Apply transaction normalization rules to transactions from legacy SDKs. | ||
NormalizeLegacyTransactions, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This feature has not been enabled by sentry yet, so merging this PR should not cause any functional changes.
@@ -176,25 +161,18 @@ pub struct TransactionNameRule { | |||
pub pattern: LazyGlob, | |||
/// Date time when the rule expires and it should not be applied anymore. | |||
pub expiry: DateTime<Utc>, | |||
/// Object containing transaction attributes the rules must only be applied to. | |||
#[serde(default)] | |||
pub scope: TransactionNameRuleScope, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The condition for applying clustering rules has changed from source == "url"
to source in ["url", "sanitized", null]
. Instead of extending the config protocol to support lists, I suggest we drop this field from the config and let Relay decide when to apply these rules.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that old Relays will still deserialize the missing scope as {"source": "url"}
because of the #[serde(default)]
attribute, so this change is backward compatible.
&& self.expiry > now | ||
&& self.pattern.compiled().is_match(transaction) | ||
fn matches(&self, transaction: &str) -> bool { | ||
self.expiry > Utc::now() && self.pattern.compiled().is_match(transaction) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not checking the source
also means that we're now applying sanitation rules to transactions that were already marked as sanitized. This should be fine, since normalizing a string like /foo/*/bar/
with rule /foo/*/**
will result in /foo/*/bar/
again.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a unit test for to make sure it behaves as we expect for sanitized transactions?
Maybe I'm thinking about this wrong, but when the new pattern comes in, which will the current url, it will get replaced and already sanitized value will be saved in meta info.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point, will add some tests.
@@ -532,14 +541,13 @@ impl Processor for TransactionsProcessor<'_> { | |||
.set_value(Some("<unlabeled transaction>".to_owned())) | |||
} | |||
|
|||
set_default_transaction_source(event); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After setting the default transaction source, we know that source:unknown
means low-cardinality and source:null
means high-cardinality (see function itself).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some questions, but overall lgtm
&& self.expiry > now | ||
&& self.pattern.compiled().is_match(transaction) | ||
fn matches(&self, transaction: &str) -> bool { | ||
self.expiry > Utc::now() && self.pattern.compiled().is_match(transaction) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a unit test for to make sure it behaves as we expect for sanitized transactions?
Maybe I'm thinking about this wrong, but when the new pattern comes in, which will the current url, it will get replaced and already sanitized value will be saved in meta info.
// 2. It is marked with source:sanitized, in which case we run normalization again. | ||
// 3. It has no source attribute because it's from an old SDK version, | ||
// but it contains slashes and we expect it to be high-cardinality | ||
// based on the sdk information (see `set_default_transaction_source`). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit:
// based on the sdk information (see `set_default_transaction_source`). | |
// based on the SDK information (see `set_default_transaction_source`). |
Introduce feature flag to enable gradual rollout of getsentry/relay#2250.
Since #2250, we treat transactions without `transaction_info.source` from old SDKs as URLs if they look like URLs, that is: If it contains slashes, we apply all known clustering rules to the transaction name and then declare it as `sanitized`. There is still a small group of transactions that ends up without a transaction tag on performance metrics, even though it _probably_ does not produce high cardinality: Transactions without `transaction_info.source` from old SDKs that _do not_ contain slashes. Overview of transaction categories (by `transaction_info.source`) and whether we extract the transaction metrics tag for them: * Has source * ✅ low cardinality (`route`, `component`, `custom`, ...) * high cardinality * ✅ `url` -> now all marked as `sanitized` * ❌ _other_ -> still marked as unparameterized. An unknown transaction source is an SDK bug, so this is fine. * No source (legacy SDK) * ✅ `unknown` - assumed to be low cardinality by `is_high_cardinality_sdk` heuristic. * ✅ `None` * ✅ Looks like URL (contains `/`) -> marked as `sanitized`. * ❌ **Does not look like URL -> handled in this PR** This PR amends the `is_high_cardinality_sdk` heuristic to mark legacy transactions without slashes as `source:unknown`, which makes metrics extraction assume low cardinality and extract the original transaction name as tag. This PR also contains some cleanup work to simplify the control flow around `unknown` and `null` sources.
#2250 introduced a temporary feature flag because I wanted to do a gradual rollout. That flag can now be removed.
getsentry/sentry#51437 expanded the scope of transaction name clustering to transactions from old SDKs that do not send a
source
annotation yet. This PR enables the application of rules to this type of transactions in Relay.Because there is a potential of triggering the cardinality limiter, I gated this new behavior with a feature flag.
The condition for applying clusterer rules has now become more complicated, so I decided to remove / ignore the
scope
configuration flag entirely.ref: https://github.com/getsentry/team-ingest/issues/129