Improve parameterization of transaction names #5342

Closed
@smeubank

Describe the idea

We need to revisit how the JS SDK captures transaction names (URL routes) and sends them to Sentry.

Reasons for doing this and things we need to consider:

  • parameterization: We should parameterize URLs whenever possible

Because

  • grouping: the transaction name influences transaction grouping; raw URL names lead to ungrouped transactions
  • indexing and high cardinality: Raw URL transaction names lead to a high cardinality of transactions (because they're not grouped)
  • PII: Raw URL transaction names can contain sensitive data/PII (e.g. IDs, auth tokens, etc.)
  • Dynamic Sampling: Propagating raw URLs in DSC vs. showing parameterized routes in the Sentry UI (and the DS settings) creates a lot of user confusion

Best effort has been decent so far, but the fallback to an unparameterized route or the whole URL may be suboptimal.

Examples:

  • Low-cardinality transaction name:
    • {"transaction": "/users/{username}", "transaction_source": "route"}
  • Presumably a high-cardinality transaction name:
    • {"transaction": "/users/123235", "transaction_source": "uri"}
  • User-defined transaction name, cardinality unknown:
    • {"transaction": "my_transaction_name", "transaction_source": "custom"}
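
The three cases above can be expressed as a small lookup. This is only a sketch: the source values are copied from the examples, while the type names and the `expectedCardinality` helper are hypothetical and not part of the SDK.

```typescript
// Source values taken from the examples above; all names here are hypothetical.
type TransactionSource = "route" | "uri" | "custom";
type Cardinality = "low" | "high" | "unknown";

interface TransactionNameInfo {
  transaction: string;
  transaction_source: TransactionSource;
}

// Maps each source to the cardinality the examples above associate with it.
const CARDINALITY_BY_SOURCE: Record<TransactionSource, Cardinality> = {
  route: "low", // parameterized route, e.g. /users/{username}
  uri: "high", // raw URL, presumably one name per distinct ID
  custom: "unknown", // user-defined name, nothing can be assumed
};

function expectedCardinality(info: TransactionNameInfo): Cardinality {
  return CARDINALITY_BY_SOURCE[info.transaction_source];
}
```

A consumer could use such a lookup to decide whether a name is safe to group on or propagate, without inspecting the name itself.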

Requirement:

What should be sent and where?

  • baggage header
  • error envelope payload
  • in the envelope header

Possible implementation

There are a couple of things we can do or at least check:

Existing Routing Instrumentations with parameterization

As listed in #5345, we have a lot of popular routers covered with routing instrumentations. However, we might be able to improve parameterization in some of them. Hence, for each instrumentation:

  • check if there is a way to parameterize earlier
  • try to match routes better (if it turns out that there are cases where our current matching fails)
  • (send source information; tracked in Add support for transaction source #5345)
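
To illustrate what "matching routes" means here, the following is a minimal sketch of resolving a raw path against a list of known route patterns. It assumes `:param`-style patterns and a path without query string or hash; the `matchRoute` function is hypothetical and does not reflect how any specific routing instrumentation works.

```typescript
// Hypothetical helper: returns the first pattern that matches the raw path,
// or undefined if none does (the caller would then fall back to the raw URL).
function matchRoute(path: string, patterns: string[]): string | undefined {
  const pathSegments = path.split("/").filter(Boolean);
  return patterns.find((pattern) => {
    const patternSegments = pattern.split("/").filter(Boolean);
    return (
      patternSegments.length === pathSegments.length &&
      // A ":name" segment matches anything; other segments must match exactly.
      patternSegments.every(
        (segment, i) => segment.startsWith(":") || segment === pathSegments[i],
      )
    );
  });
}

const routes = ["/users/:id", "/users/:id/credentials"];
const name = matchRoute("/users/1235/credentials", routes) ?? "/users/1235/credentials";
```

Because the first matching pattern wins, static routes such as `/users/new` would need to be listed before `/users/:id` in this sketch.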

Existing Routing Instrumentations without parameterization

TODO: Check to which routers this applies

There are some routing instrumentations that don't parameterize currently.

  • add parameterization whenever possible

TBD: Approximative Parameterization

This has been discussed quite a bit in the past but given that we have to make our best effort for parameterization, let's revisit this topic. The idea is seemingly simple: We could try to add a mechanism that takes a raw URL and tries to guess parts of that URL that might be parameters (e.g. IDs, tokens, etc). The mechanism would then replace these parts with a generic param placeholder.

Example:

/users/1235/credentials ==> /users/:id/credentials

There are a lot of possible issues with this: there will be many edge cases where the approximation is off or misses parameters completely.
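
A minimal sketch of such a heuristic, assuming that purely numeric segments, UUIDs, and long hex strings are parameters. The patterns, the placeholder names, and the `approximateParameterization` function are all hypothetical choices for illustration, not an agreed design.

```typescript
// Hypothetical patterns for path segments that are likely parameters.
const NUMERIC = /^\d+$/;
const UUID = /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i;
const LONG_HEX = /^[0-9a-f]{16,}$/i;

// Replaces each segment that looks like a parameter with a generic placeholder.
function approximateParameterization(rawPath: string): string {
  return rawPath
    .split("/")
    .map((segment) => {
      if (NUMERIC.test(segment)) return ":id";
      if (UUID.test(segment)) return ":uuid";
      if (LONG_HEX.test(segment)) return ":token";
      return segment; // assumed to be a static part of the route
    })
    .join("/");
}

const guessed = approximateParameterization("/users/1235/credentials");
const missed = approximateParameterization("/users/alice");
```

The second call shows one of the edge cases: a username like `alice` is indistinguishable from a static segment, so the path is left unparameterized. The opposite error is also possible, since a static numeric segment such as a year would be replaced.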

Why is this challenging?

  • In the product, we have to explain why the DSC might have a different (or no) transaction name compared to what's visible in the UI
  • We have to explain why unparameterized routes are sent and even full URLs.
  • Some frameworks and routers provide unique challenges
  • What about custom routing instrumentations we don't control?

Places to improve parameterization
