Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ref(event): Add correct limit and validation to distribution name [INGEST-1615] #1556

Merged
merged 7 commits into from
Nov 4, 2022

Conversation

olksdr
Copy link
Contributor

@olksdr olksdr commented Oct 28, 2022

Even though we had the max_chars limit set to 200 on the dist we never validated that length while trying to normalize the event.

This change adds the correct limit (64 chars) which is set on the Sentry site (field in the DB is varchar(64)) and also adds the validation with proper error generation if the value is longer than the length limit.

Also related: getsentry/sentry#40640

Even though we had the `max_chars` limit set to 200 on the `dist` we
never validated that length while trying to normalize the event.

This change adds the correct limit (64 chars) which is set on the Sentry
site (field in the DB is varchar(64)) and also adds the validation with
proper error generation if the value longer than the length limit.
@olksdr olksdr requested a review from jan-auer October 28, 2022 10:43
@olksdr olksdr self-assigned this Oct 28, 2022
@olksdr olksdr marked this pull request as ready for review October 28, 2022 10:43
@olksdr olksdr requested a review from a team October 28, 2022 10:43
Copy link
Contributor

@iker-barriocanal iker-barriocanal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR is actually a bugfix, please add the entry to the changelog.

Comment on lines 103 to 106
*distribution = Annotated::from_error(
Error::new(ErrorKind::ValueTooLong),
Some(Value::String(val.to_owned())),
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of creating a new Annotated object, does it make sense to just add the error to the meta? Something like the following:

Suggested change
*distribution = Annotated::from_error(
Error::new(ErrorKind::ValueTooLong),
Some(Value::String(val.to_owned())),
)
distribution.meta_mut().add_error(...);

In this case, you'd still have the initial string there which is too long, so I'm not sure if this suggestion makes sense.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've rewritten this using apply method

Some(Value::String(val.to_owned())),
)
} else if trimmed != val {
*val = trimmed.to_string()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm curious, why do you assign the new value this way using references, instead of something like distribution.set_value(...)? Same question for *dist = None above.

@@ -90,20 +90,25 @@ pub fn is_valid_platform(platform: &str) -> bool {
VALID_PLATFORMS.contains(&platform)
}

pub fn normalize_dist(dist: &mut Option<String>) {
let mut erase = false;
pub fn normalize_dist(distribution: &mut Annotated<String>) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like this change.

normalize_dist(&mut dist);
assert_eq!(dist.unwrap(), ""); // Not sure if this is what we want
assert_eq!(dist.value(), Some(&"".to_string())); // Not sure if this is what we want
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Normalization should be idempotent. If we make another normalization call, dist is then None. This situation must not happen -- we're running multiple layers of relays and debugging can be complicated.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd also take the chance and make a decision with regard to that comment. If the resulting string after trimming is empty, dist should be None.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agreed and fixed this

@olksdr olksdr requested review from iker-barriocanal and a team November 2, 2022 10:54
Comment on lines +95 to +100
let trimmed = dist.trim();
if trimmed.is_empty() {
return Err(ProcessingAction::DeleteValueHard);
} else if bytecount::num_chars(trimmed.as_bytes()) > MaxChars::Distribution.limit() {
meta.add_error(Error::new(ErrorKind::ValueTooLong));
return Err(ProcessingAction::DeleteValueSoft);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the same behaviour as in case of release field.

Copy link
Member

@jjbayer jjbayer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems that in sentry we trim tags instead of deleting them, so I wonder which way we should go here: https://github.com/getsentry/sentry/blob/3a7c72d622b1fe64fa26ef568e20493be7bbee9e/src/sentry/event_manager.py#L178

@jjbayer
Copy link
Member

jjbayer commented Nov 2, 2022

It seems that in sentry we trim tags instead of deleting them, so I wonder which way we should go here: https://github.com/getsentry/sentry/blob/3a7c72d622b1fe64fa26ef568e20493be7bbee9e/src/sentry/event_manager.py#L178

Especially because getsentry/sentry#40640 truncates instead of deleting, so I feel like we shouldn't delete it in Relay?

@olksdr
Copy link
Contributor Author

olksdr commented Nov 2, 2022

But from the users' point of few - is truncated information better than no information on distribution at all ?

This is a good point. Deleting it while annotating the error gives the user explicit information about what went wrong, which is probably better than silently truncating. Another option would be to truncate and annotate, like the TrimmingProcessor does. But I'm fine with the change as-is.

@olksdr olksdr enabled auto-merge (squash) November 4, 2022 09:17
@olksdr olksdr disabled auto-merge November 4, 2022 09:18
@olksdr olksdr merged commit 9384de5 into master Nov 4, 2022
@olksdr olksdr deleted the fix/add-normalization-to-distribution branch November 4, 2022 10:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants