-
Notifications
You must be signed in to change notification settings - Fork 93
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(store): Limit custom measurements [INGEST-1621] #1483
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've been thinking about the approach for a while, and I don't really like it, although I think it's fine for now (especially considering the scope of the work).
I don't think using measurement names is a strong approach in the state of the art. We still don't have any support for type/unit values, so users may be smart and try to add their custom measurements with other types and/or values, generating a name conflict. Another possibility is someone not knowing about the existence of a built-in metric and using its name for what they think is a custom metric in a term specific to their business.
In such cases of name conflict, relay will extract the built-in measurement with matching name, and completely ignore the measurement they want to extract. I rather an alternative approach in which we match the full MRI and not just the name. Again, I believe the scope of the approach I suggest is too big for now, but I'll still leave this written down for future reference.
relay-general/src/store/normalize.rs
Outdated
pub struct MeasurementsConfig { | ||
/// A list of measurements that are built-in and are not subject to custom measurement limits. | ||
#[serde(default, skip_serializing_if = "BTreeSet::<String>::is_empty")] | ||
known_measurements: BTreeSet<String>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same suggestion from sentry's PR: I'd rename this to something similar to builtin_measurements
in order to have one name per concept (and we use "built-in" quite extensively in general).
relay-general/src/store/normalize.rs
Outdated
@@ -221,12 +236,42 @@ fn normalize_breakdowns(event: &mut Event, breakdowns_config: Option<&Breakdowns | |||
} | |||
} | |||
|
|||
/// Ensure measurements interface is only present for transaction events | |||
fn normalize_measurements(event: &mut Event) { | |||
/// Enforce the limit on custom (user defined) measurements. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit / personal opinion: the function accepts a Measurements
object that may contain built-in measurements (in fact, the function does get built-in measurements, and that's the first thing checked to decide if we should keep a measurement or not), so it's not really enforcing the limit on custom measurements but on all measurements. Personally, I'd generalize this line, and mention in another paragraph that all built-in measurements are accepted and we limit the custom ones by the project config.
/// Enforce the limit on custom (user defined) measurements. | |
/// Enforce the limit on measurements. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I renamed the function and updated the doc comment to clarify what it does.
@iker-barriocanal My initial draft contained an allowlist of
As discussed offline, I think this is already a problem in SDKs. Because the |
Co-authored-by: Iker Barriocanal <32816711+iker-barriocanal@users.noreply.github.com>
// If the name matches but the unit is wrong, we do not even accept it as a custom measurement, | ||
// and just drop it instead. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good to me.
relay-general/src/store/normalize.rs
Outdated
} else if let Some(measurements) = event.measurements.value_mut() { | ||
if let Some(measurements_config) = measurements_config { | ||
remove_invalid_measurements(measurements, measurements_config); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The measurements config is an Option
and may be None
, and no measurements would be removed in that case. Considering that we're removing measurements to protect the infrastructure, is it acceptable to not remove measurements if relay doesn't get that part of the config?
The same problem of the scope comes again here. No strong opinions, but I was wondering if we should drop all measurements if we don't get that part of the config. What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The config is None
when the upstream sentry is not up-to-date. In that case, IMO Relay should behave as it used to, that is, accept all measurements.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we have an incident and upstream sentry is unavailable for some time, relays will accept all measurements. I'm not concerned about the speed to ingest these once we're back or the consistency of some transactions having more measurements than others, but about the risk of blowing up storage if the incident takes a long time and other unexpected consequences we're currently not aware of. I'd rather to have a limit always in place and improve from that point.
That said, the PR is approved and this doesn't block landing it from my point of view, but I believe protecting the infrastructure in production is more important than everything else.
In tandem with getsentry/relay#1483, provide a new project config key that defines how many custom measurements an event may contain. To distinguish between custom and built-in measurements, the latter are explicitly enumerated as part of project config. This means considerable overhead for every project config, but I believe we need the flexibility here (see linked Relay PR).
The allowlist introduced in #1483 strictly checks the measurement unit. For that to work, we need to normalize the event unit first. Until now, unit normalization happened in metrics extraction. This PR moves it into (light) normalization.
With custom measurements limited by the `measurements` config in event normalization (see #1483), we do not need the special allowlist for transaction metrics anymore. config format to support old Relay instances. We will need a bump of `transactionMetrics.VERSION` to finally clean up that part.
If
measurementsConfig
is provided, limit the number of custom measurements that an event can contain. The distinction between built-in and custom measurements is made through an explicit allowlist in the project config.Once we have a global relay config propagation mechanism, the allowlist should be moved there, but we should not hard-code it in Relay, because (light) normalization also happens in customer relays, which means that if we added a new built-in measurement, an outdated external relay would count it towards custom measurements.
To be done in follow-up PRs:
customMeasurements
intransactionMetrics
).