Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(txname): Apply transaction rules to legacy SDKs #2250

Merged
merged 10 commits into from
Jun 27, 2023

Conversation

jjbayer
Copy link
Member

@jjbayer jjbayer commented Jun 26, 2023

getsentry/sentry#51437 expanded the scope of transaction name clustering to transactions from old SDKs that do not send a source annotation yet. This PR enables the application of rules to this type of transactions in Relay.

Because there is a potential of triggering the cardinality limiter, I gated this new behavior with a feature flag.

The condition for applying clusterer rules has now become more complicated, so I decided to remove / ignore the scope configuration flag entirely.

ref: https://github.com/getsentry/team-ingest/issues/129

@jjbayer jjbayer changed the title Feat/unknown apply cluster rules feat(txname): Apply transaction rules to legacy SDKs Jun 26, 2023
Comment on lines +18 to +19
/// Apply transaction normalization rules to transactions from legacy SDKs.
NormalizeLegacyTransactions,
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This feature has not been enabled by sentry yet, so merging this PR should not cause any functional changes.

@@ -176,25 +161,18 @@ pub struct TransactionNameRule {
pub pattern: LazyGlob,
/// Date time when the rule expires and it should not be applied anymore.
pub expiry: DateTime<Utc>,
/// Object containing transaction attributes the rules must only be applied to.
#[serde(default)]
pub scope: TransactionNameRuleScope,
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The condition for applying clustering rules has changed from source == "url" to source in ["url", "sanitized", null]. Instead of extending the config protocol to support lists, I suggest we drop this field from the config and let Relay decide when to apply these rules.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that old Relays will still deserialize the missing scope as {"source": "url"} because of the #[serde(default)] attribute, so this change is backward compatible.

&& self.expiry > now
&& self.pattern.compiled().is_match(transaction)
fn matches(&self, transaction: &str) -> bool {
self.expiry > Utc::now() && self.pattern.compiled().is_match(transaction)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not checking the source also means that we're now applying sanitation rules to transactions that were already marked as sanitized. This should be fine, since normalizing a string like /foo/*/bar/ with rule /foo/*/** will result in /foo/*/bar/ again.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a unit test for to make sure it behaves as we expect for sanitized transactions?
Maybe I'm thinking about this wrong, but when the new pattern comes in, which will the current url, it will get replaced and already sanitized value will be saved in meta info.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, will add some tests.

@@ -532,14 +541,13 @@ impl Processor for TransactionsProcessor<'_> {
.set_value(Some("<unlabeled transaction>".to_owned()))
}

set_default_transaction_source(event);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After setting the default transaction source, we know that source:unknown means low-cardinality and source:null means high-cardinality (see function itself).

@jjbayer jjbayer marked this pull request as ready for review June 26, 2023 12:42
@jjbayer jjbayer requested a review from a team June 26, 2023 12:42
Copy link
Contributor

@olksdr olksdr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some questions, but overall lgtm

relay-general/src/store/transactions/processor.rs Outdated Show resolved Hide resolved
&& self.expiry > now
&& self.pattern.compiled().is_match(transaction)
fn matches(&self, transaction: &str) -> bool {
self.expiry > Utc::now() && self.pattern.compiled().is_match(transaction)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a unit test for to make sure it behaves as we expect for sanitized transactions?
Maybe I'm thinking about this wrong, but when the new pattern comes in, which will the current url, it will get replaced and already sanitized value will be saved in meta info.

relay-general/src/store/transactions/processor.rs Outdated Show resolved Hide resolved
// 2. It is marked with source:sanitized, in which case we run normalization again.
// 3. It has no source attribute because it's from an old SDK version,
// but it contains slashes and we expect it to be high-cardinality
// based on the sdk information (see `set_default_transaction_source`).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit:

Suggested change
// based on the sdk information (see `set_default_transaction_source`).
// based on the SDK information (see `set_default_transaction_source`).

relay-general/src/store/transactions/processor.rs Outdated Show resolved Hide resolved
jjbayer added a commit to getsentry/sentry that referenced this pull request Jun 26, 2023
Introduce feature flag to enable gradual rollout of
getsentry/relay#2250.
@jjbayer jjbayer merged commit 063c134 into master Jun 27, 2023
@jjbayer jjbayer deleted the feat/unknown-apply-cluster-rules branch June 27, 2023 07:01
jjbayer added a commit that referenced this pull request Jun 27, 2023
This PR
* reverts #2040,
* ensures that feature flags are only propagated to downstream Relays if
they are known,
* removes unknown features on deserialization,
* fixes a bug in #2250, which did
not deserialize a new feature flag correctly.
jjbayer added a commit that referenced this pull request Jul 4, 2023
Since #2250, we treat
transactions without `transaction_info.source` from old SDKs as URLs if
they look like URLs, that is: If it contains slashes, we apply all known
clustering rules to the transaction name and then declare it as
`sanitized`.

There is still a small group of transactions that ends up without a
transaction tag on performance metrics, even though it _probably_ does
not produce high cardinality: Transactions without
`transaction_info.source` from old SDKs that _do not_ contain slashes.

Overview of transaction categories (by `transaction_info.source`) and
whether we extract the transaction metrics tag for them:

* Has source
  * ✅ low cardinality (`route`, `component`, `custom`, ...)
  * high cardinality
    * ✅ `url` -> now all marked as `sanitized`
* ❌ _other_ -> still marked as unparameterized. An unknown transaction
source is an SDK bug, so this is fine.
* No source (legacy SDK)
* ✅ `unknown` - assumed to be low cardinality by
`is_high_cardinality_sdk` heuristic.
  * ✅ `None`
    * ✅ Looks like URL (contains `/`) -> marked as `sanitized`.
    * ❌ **Does not look like URL -> handled in this PR**
    
This PR amends the `is_high_cardinality_sdk` heuristic to mark legacy
transactions without slashes as `source:unknown`, which makes metrics
extraction assume low cardinality and extract the original transaction
name as tag.

This PR also contains some cleanup work to simplify the control flow
around `unknown` and `null` sources.
jjbayer added a commit that referenced this pull request Jul 5, 2023
#2250 introduced a temporary
feature flag because I wanted to do a gradual rollout. That flag can now
be removed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants