feat(store): Limit custom measurements [INGEST-1621] #1483

jjbayer · 2022-09-19T12:59:23Z

If measurementsConfig is provided, limit the number of custom measurements that an event can contain. The distinction between built-in and custom measurements is made through an explicit allowlist in the project config.
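As a rough illustration of the rule described above, here is a minimal sketch with a name-only allowlist. The field name `max_custom_measurements`, the function name, and the flat `BTreeMap<String, f64>` payload are assumptions made for this example, not Relay's actual types.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Simplified stand-in for the project config section described above.
/// `max_custom_measurements` is an assumed field name.
pub struct MeasurementsConfig {
    /// Names of built-in measurements, which are exempt from the limit.
    pub builtin_measurements: BTreeSet<String>,
    /// Maximum number of non-built-in measurements kept per event.
    pub max_custom_measurements: usize,
}

/// Keeps every allowlisted measurement and at most
/// `max_custom_measurements` of the remaining (custom) ones.
/// Entries are visited in ascending key order, so the custom
/// measurements that sort first are the ones that survive.
pub fn limit_custom_measurements(
    measurements: &mut BTreeMap<String, f64>,
    config: &MeasurementsConfig,
) {
    let mut custom_seen = 0;
    measurements.retain(|name, _| {
        if config.builtin_measurements.contains(name) {
            return true; // built-in: never counted towards the limit
        }
        custom_seen += 1;
        custom_seen <= config.max_custom_measurements
    });
}
```

Which custom measurements are dropped once the limit is reached is a design choice; this sketch simply follows map order.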

Once we have a global Relay config propagation mechanism, the allowlist should be moved there. We should not hard-code it in Relay, though, because (light) normalization also happens in customer Relays: if we added a new built-in measurement, an outdated external Relay would count it towards custom measurements.

To be done in follow-up PRs:

Set this config in Sentry. See feat(relay): Limit custom measurements [INGEST-1620] sentry#39001.
Remove traces of old config (metrics allowlist, customMeasurements in transactionMetrics).


iker-barriocanal

I've been thinking about the approach for a while, and I don't really like it, although I think it's fine for now (especially considering the scope of the work).

I don't think matching on measurement names alone is a robust approach as things stand. We still don't have any support for type/unit values, so users may be smart and try to add their custom measurements with other types and/or values, generating a name conflict. Another possibility is that someone doesn't know a built-in metric exists and uses its name for what they think is a custom metric specific to their business.

In such cases of name conflict, relay will extract the built-in measurement with matching name, and completely ignore the measurement they want to extract. I'd prefer an alternative approach in which we match the full MRI and not just the name. Again, I believe the scope of the approach I suggest is too big for now, but I'll still leave this written down for future reference.

relay-server/src/actors/project.rs

iker-barriocanal · 2022-09-20T10:00:09Z

relay-general/src/store/normalize.rs

+pub struct MeasurementsConfig {
+    /// A list of measurements that are built-in and are not subject to custom measurement limits.
+    #[serde(default, skip_serializing_if = "BTreeSet::<String>::is_empty")]
+    known_measurements: BTreeSet<String>,


Same suggestion from sentry's PR: I'd rename this to something similar to builtin_measurements in order to have one name per concept (and we use "built-in" quite extensively in general).

relay-general/src/store/normalize.rs

iker-barriocanal · 2022-09-20T10:16:21Z

relay-general/src/store/normalize.rs

@@ -221,12 +236,42 @@ fn normalize_breakdowns(event: &mut Event, breakdowns_config: Option<&Breakdowns
    }
 }

-/// Ensure measurements interface is only present for transaction events
-fn normalize_measurements(event: &mut Event) {
+/// Enforce the limit on custom (user defined) measurements.


nit / personal opinion: the function accepts a Measurements object that may contain built-in measurements (in fact, the function does get built-in measurements, and that's the first thing checked to decide if we should keep a measurement or not), so it's not really enforcing the limit on custom measurements but on all measurements. Personally, I'd generalize this line, and mention in another paragraph that all built-in measurements are accepted and we limit the custom ones by the project config.

Suggested change

-/// Enforce the limit on custom (user defined) measurements.
+/// Enforce the limit on measurements.

I renamed the function and updated the doc comment to clarify what it does.

jjbayer · 2022-09-20T13:54:08Z

We still don't have any support for type/unit values, so users may be smart and try to add their custom measurements with other types and/or values, generating a name conflict.

@iker-barriocanal My initial draft contained an allowlist of (name, unit) pairs. I agree that would be more solid, so I'll see if I can quickly implement that. Note that the type of a measurement (in terms of metrics types) is always distribution.

In such cases of name conflict, relay will extract the built-in measurement with matching name, and completely ignore the measurement they want to extract.

As discussed offline, I think this is already a problem in SDKs. Because the measurements entry of the event payload is a key/value mapping, only one measurement will survive for each name.
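To make the key/value point concrete, here is a small sketch that collects name/value pairs into a map. The helper name and the flat `f64` values are assumptions for illustration. In this sketch the last write wins because of how `BTreeMap::insert` behaves; which duplicate survives in a real SDK or parser may differ.

```rust
use std::collections::BTreeMap;

/// Collects (name, value) pairs into a map, as a stand-in for the
/// `measurements` object of the event payload. Hypothetical helper.
pub fn collect_measurements(pairs: &[(&str, f64)]) -> BTreeMap<String, f64> {
    let mut measurements = BTreeMap::new();
    for (name, value) in pairs {
        // Inserting an existing key replaces the previous value,
        // so only one entry per name can exist in the map.
        measurements.insert(name.to_string(), *value);
    }
    measurements
}
```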

Co-authored-by: Iker Barriocanal <32816711+iker-barriocanal@users.noreply.github.com>

iker-barriocanal · 2022-09-21T09:01:57Z

relay-general/src/store/normalize.rs

+                // If the name matches but the unit is wrong, we do not even accept it as a custom measurement,
+                // and just drop it instead.


Sounds good to me.
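The stricter rule from the comment above could look roughly like the following sketch. It borrows the name `remove_invalid_measurements` from the diff, but the signature, the `BuiltinMeasurement` and `Measurement` types, and the `max_custom` parameter are simplified assumptions, not the code in this PR. In particular, the sketch treats a missing unit on a built-in name as a mismatch, which this thread does not specify.

```rust
use std::collections::BTreeMap;

/// Hypothetical allowlist entry: a built-in measurement and its expected unit.
pub struct BuiltinMeasurement {
    pub name: String,
    pub unit: String,
}

/// Simplified measurement: a value plus an optional unit.
pub struct Measurement {
    pub value: f64,
    pub unit: Option<String>,
}

/// Keeps built-in measurements whose unit matches the allowlist, drops
/// built-in names with a wrong unit, and keeps at most `max_custom`
/// measurements whose name is not on the allowlist.
pub fn remove_invalid_measurements(
    measurements: &mut BTreeMap<String, Measurement>,
    builtin: &[BuiltinMeasurement],
    max_custom: usize,
) {
    let mut custom_seen = 0;
    measurements.retain(|name, measurement| {
        if let Some(known) = builtin.iter().find(|b| &b.name == name) {
            // Name matches a built-in: accept only with the expected unit.
            // A mismatch is dropped and is not counted as a custom measurement.
            return measurement.unit.as_deref() == Some(known.unit.as_str());
        }
        custom_seen += 1;
        custom_seen <= max_custom
    });
}
```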

iker-barriocanal · 2022-09-21T09:14:30Z

relay-general/src/store/normalize.rs

    } else if let Some(measurements) = event.measurements.value_mut() {
+        if let Some(measurements_config) = measurements_config {
+            remove_invalid_measurements(measurements, measurements_config);


The measurements config is an Option and may be None, and no measurements would be removed in that case. Considering that we're removing measurements to protect the infrastructure, is it acceptable to not remove measurements if relay doesn't get that part of the config?

The same question of scope comes up again here. No strong opinions, but I was wondering if we should drop all measurements if we don't get that part of the config. What do you think?

The config is None when the upstream sentry is not up-to-date. In that case, IMO Relay should behave as it used to, that is, accept all measurements.

If we have an incident and upstream sentry is unavailable for some time, relays will accept all measurements. I'm not concerned about the speed to ingest these once we're back or the consistency of some transactions having more measurements than others, but about the risk of blowing up storage if the incident takes a long time and other unexpected consequences we're currently not aware of. I'd rather have a limit always in place and improve from that point.

That said, the PR is approved and this doesn't block landing it from my point of view, but I believe protecting the infrastructure in production is more important than everything else.
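The two positions in this thread can be contrasted in a short sketch. Both function names, the `fallback` parameter, and the `max_custom_measurements` field are hypothetical and only serve to show the difference between "no config means no limit" and "always have a limit".

```rust
/// Simplified stand-in for the measurements section of the project config.
/// `max_custom_measurements` is an assumed field name.
pub struct MeasurementsConfig {
    pub max_custom_measurements: usize,
}

/// Behavior argued for by the author: without a config, Relay behaves
/// as before and applies no limit (`None`).
pub fn limit_without_fallback(config: Option<&MeasurementsConfig>) -> Option<usize> {
    config.map(|c| c.max_custom_measurements)
}

/// Alternative raised in review: always enforce some limit, using a
/// built-in fallback when the config is missing.
pub fn limit_with_fallback(config: Option<&MeasurementsConfig>, fallback: usize) -> usize {
    config.map_or(fallback, |c| c.max_custom_measurements)
}
```

With the first variant an outdated upstream leaves events untouched; with the second, the choice of fallback value becomes part of Relay itself.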

In tandem with getsentry/relay#1483, provide a new project config key that defines how many custom measurements an event may contain. To distinguish between custom and built-in measurements, the latter are explicitly enumerated as part of project config. This means considerable overhead for every project config, but I believe we need the flexibility here (see linked Relay PR).

The allowlist introduced in #1483 strictly checks the measurement unit. For that to work, we need to normalize the event unit first. Until now, unit normalization happened in metrics extraction. This PR moves it into (light) normalization.

With custom measurements limited by the `measurements` config in event normalization (see #1483), we no longer need the special allowlist for transaction metrics. The old config format is kept for now to support old Relay instances. We will need a bump of `transactionMetrics.VERSION` to finally clean up that part.

jjbayer added 6 commits September 19, 2022 14:07

wip

62dce92

feat: Limit custom measurements

dc61d75

test: limit custom measurements

8800cce

fix

bef558d

Merge remote-tracking branch 'origin/master' into feat/limit-custom-m…

ceac3cd

…easurements

doc: Changelog

98539fe

jjbayer marked this pull request as ready for review September 19, 2022 13:39

jjbayer requested a review from a team September 19, 2022 13:39

jjbayer mentioned this pull request Sep 19, 2022

feat(relay): Limit custom measurements [INGEST-1620] getsentry/sentry#39001

Merged

jjbayer assigned jjbayer and iker-barriocanal and unassigned jjbayer Sep 20, 2022

iker-barriocanal approved these changes Sep 20, 2022


iker-barriocanal assigned jjbayer and unassigned iker-barriocanal Sep 20, 2022

jjbayer mentioned this pull request Sep 20, 2022

feat(metrics): Ignore transaction metrics allowlist [INGEST-1628] #1484

Merged

jjbayer and others added 5 commits September 20, 2022 15:58

Apply suggestions from code review

0024d9e

Co-authored-by: Iker Barriocanal <32816711+iker-barriocanal@users.noreply.github.com>

known_measurements -> builtin_measurements

71e8054

ref: Stricter builtin measurement rules

2b291c5

doc: Update function name and doc comment

77da15d

fix: Fall back to unit None

8c4ac94

jjbayer requested a review from iker-barriocanal September 20, 2022 15:03

jjbayer assigned jjbayer and iker-barriocanal and unassigned jjbayer Sep 20, 2022

iker-barriocanal reviewed Sep 21, 2022


iker-barriocanal assigned jjbayer and unassigned iker-barriocanal Sep 21, 2022

jjbayer mentioned this pull request Sep 21, 2022

ref(store): Normalize units in event payload [INGEST-1638] #1488

Merged

Merge branch 'master' into feat/limit-custom-measurements

d67d074

jjbayer merged commit 9ccd339 into master Sep 22, 2022

jjbayer deleted the feat/limit-custom-measurements branch September 22, 2022 11:41
