Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(pii): Fix scrubbing user paths in minidump debug modules #4351

Merged
merged 6 commits into from
Dec 6, 2024

Conversation

jjbayer
Copy link
Member

@jjbayer jjbayer commented Dec 6, 2024

#2724 silently broke user path scrubbing in minidumps. The regression was not automatically flagged because minidump scrubbing ignores regexes that cannot be compiled with unicode disabled, regardless of whether they are user-defined regexes or builtin static ones.

This PR both fixes the regex and adds an automatic test to each PII regex, so we can catch similar problems in the future.

@jjbayer jjbayer changed the title fix(pii): Ensure non-unicode support for static regexes fix(pii): Fix scrubbing user paths in minidump debug modules Dec 6, 2024
Comment on lines +116 to +125
#[test]
fn supports_byte_mode() {
assert!(regex::bytes::RegexBuilder::new($name.as_str())
.unicode(false)
.multi_line(false)
.dot_matches_new_line(true)
.build()
.is_ok());
}
}
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This makes sure the regex can also be used by minidump scrubbing. There are several open questions (see below), but this was the most conservative approach I could think of.

  1. Why does minidump scrubbing need .unicode(false)?
    let regex = match BytesRegexBuilder::new(regex.as_str())
    // https://github.com/rust-lang/regex/issues/697
    .unicode(false)
  2. Could we automatically generate a non-unicode version of every regex, such that it does not have to be recompiled on the fly for every minidump?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. I read the issue and I still don't understand why this is necessary, would be nice if we had a test for this...
  2. That'd be nice or if we didn't even need the special casing in the first place.

});
)
(
[^/\\\r\n]+
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the change that fixes the regex with unicode disabled, while still meeting the requirements of #2724:

relay/relay-pii/src/builtin.rs

Lines 1247 to 1248 in b5b738f

#[test]
fn test_userpath() {

@jjbayer jjbayer self-assigned this Dec 6, 2024
@jjbayer jjbayer marked this pull request as ready for review December 6, 2024 11:05
@jjbayer jjbayer requested a review from a team as a code owner December 6, 2024 11:05
relay-pii/src/regexes.rs Outdated Show resolved Hide resolved
Comment on lines +116 to +125
#[test]
fn supports_byte_mode() {
assert!(regex::bytes::RegexBuilder::new($name.as_str())
.unicode(false)
.multi_line(false)
.dot_matches_new_line(true)
.build()
.is_ok());
}
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. I read the issue and I still don't understand why this is necessary, would be nice if we had a test for this...
  2. That'd be nice or if we didn't even need the special casing in the first place.

@jjbayer jjbayer merged commit 8d83992 into master Dec 6, 2024
23 checks passed
@jjbayer jjbayer deleted the fix/pii-unicode-regex branch December 6, 2024 11:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants