
feat(pii): Scrub arrays of 2 elements as key-value pairs #3639

Merged
merged 30 commits into from
Jun 10, 2024


@iambriccardo (Member) commented May 23, 2024

This PR adds support for treating an array of two-element arrays as an object. Headers and other pieces of data are commonly encoded as [["key", "value"], ["key", "value"]], and our previous logic scrubbed each string in them individually instead of treating the first element of each pair as a key.

The algorithm checks whether a given array consists entirely of arrays of length two whose first element is a string. If it does, the processor treats the first element of each pair as the key of an object, entering it as such, and processes the second element as the value stored under that key.
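A minimal Rust sketch of the detection and of treating pairs like object entries follows. It is not Relay's implementation: Value is a hypothetical stand-in for Relay's value types, is_pairlist and scrub are illustrative names, and a caller-supplied key predicate replaces the real PII rules. Matching values are replaced with "[Filtered]" while keys are kept.

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for an event value (not Relay's actual `Value` type).
#[derive(Debug, Clone, PartialEq)]
pub enum Value {
    String(String),
    Array(Vec<Value>),
    Object(BTreeMap<String, Value>),
}

/// Returns true if `items` is non-empty and every element is a
/// two-element array whose first element is a string.
pub fn is_pairlist(items: &[Value]) -> bool {
    !items.is_empty()
        && items.iter().all(|item| match item {
            Value::Array(pair) => pair.len() == 2 && matches!(pair[0], Value::String(_)),
            _ => false,
        })
}

/// Replaces values stored under sensitive keys with "[Filtered]".
/// Pair lists are handled like objects: the key is kept, only the value is scrubbed.
pub fn scrub(value: &mut Value, is_sensitive_key: &dyn Fn(&str) -> bool) {
    match value {
        Value::Array(items) if is_pairlist(items.as_slice()) => {
            for item in items.iter_mut() {
                if let Value::Array(pair) = item {
                    // pair[0] acts as the key, pair[1] as the value.
                    let sensitive =
                        matches!(&pair[0], Value::String(k) if is_sensitive_key(k.as_str()));
                    if sensitive {
                        pair[1] = Value::String("[Filtered]".to_owned());
                    } else {
                        scrub(&mut pair[1], is_sensitive_key);
                    }
                }
            }
        }
        // Ordinary arrays: recurse into each element.
        Value::Array(items) => {
            for item in items.iter_mut() {
                scrub(item, is_sensitive_key);
            }
        }
        // Objects: the same key-based logic the pair list imitates.
        Value::Object(map) => {
            for (key, child) in map.iter_mut() {
                if is_sensitive_key(key.as_str()) {
                    *child = Value::String("[Filtered]".to_owned());
                } else {
                    scrub(child, is_sensitive_key);
                }
            }
        }
        Value::String(_) => {}
    }
}
```

For example, scrubbing [["AuthToken", "secret"], ["Accept", "*/*"]] with a predicate that matches "AuthToken" yields [["AuthToken", "[Filtered]"], ["Accept", "*/*"]]. Treating an empty array as not being a pair list is a choice made in this sketch.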

Closes: #2567

    let mut processor = PiiProcessor::new(config.compiled());
    process_value(&mut event, &mut processor, ProcessingState::root()).unwrap();

    let vars = event
@iambriccardo (Member Author): Is there a better way?

@iambriccardo changed the title from "feat(pii): Scrub key value pairs" to "feat(pii): Scrub arrays of 2 elements as key-value pairs" on May 27, 2024
    process_value(&mut value[1], self, &entered)?;
    // We check whether meta has changed from empty to non-empty in order to
    // understand if some rules matched downstream.
    if previous_meta.is_empty() && !value[1].meta().is_empty() {
@iambriccardo (Member Author): A better implementation here would check for structural equality, but we might gain some performance if we assume that an incoming annotated value carries no metadata. This assumption may well be wrong, though.
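The trade-off can be sketched in isolation. The following assumes a hypothetical Meta type that only holds a list of remarks (not Relay's actual Meta API). The cheap empty-to-non-empty check misses a rule match when the incoming value already carries metadata, whereas a structural comparison catches it but requires keeping a copy of the previous metadata.

```rust
/// Hypothetical stand-in for the metadata attached to an annotated value.
#[derive(Debug, Default, Clone, PartialEq)]
pub struct Meta {
    pub remarks: Vec<String>,
}

impl Meta {
    pub fn is_empty(&self) -> bool {
        self.remarks.is_empty()
    }
}

/// Hypothetical rule application: a matching rule leaves a remark in the meta.
pub fn apply_rule(meta: &mut Meta, rule_name: &str, matched: bool) {
    if matched {
        meta.remarks.push(rule_name.to_owned());
    }
}

/// Cheap check, as in the excerpt: only detects an empty -> non-empty
/// transition, so it assumes incoming values carry no metadata.
pub fn matched_cheap(before: &Meta, after: &Meta) -> bool {
    before.is_empty() && !after.is_empty()
}

/// Structural check: detects any change, at the cost of keeping `before` around.
pub fn matched_structural(before: &Meta, after: &Meta) -> bool {
    before != after
}
```

With an incoming remark already present, matched_cheap reports no match even though a rule fired, while matched_structural reports it.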

    r##"
    {
    "applications": {
    "$string": ["@password:remove"]
@iambriccardo (Member Author): The @password:remove rule works by matching a key and then scrubbing its value, which is why it was put here.
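A sketch of what key-based matching means for a single pair. The key patterns and the modelling of remove as setting the value to None are assumptions for illustration, not the actual definition of @password. The point is that the rule inspects the key and acts on the value, so it can only fire once the pair's first element is treated as a key.

```rust
/// Hypothetical key patterns; the real @password rule may use different ones.
const PASSWORD_KEYS: &[&str] = &["password", "passwd", "secret"];

/// Returns true if the key looks like a password field
/// (case-insensitive substring match in this sketch).
pub fn is_password_key(key: &str) -> bool {
    let key = key.to_ascii_lowercase();
    PASSWORD_KEYS.iter().any(|pattern| key.contains(*pattern))
}

/// Applies a "remove" action to the value of a (key, value) pair.
/// Only the key is inspected; the value's content is never looked at.
pub fn apply_password_remove(key: &str, value: &mut Option<String>) {
    if is_password_key(key) {
        *value = None;
    }
}
```

Under these assumptions, the pair ["passwd", "my_password"] loses its value, while ["something_else", "my_password"] is left untouched.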

@iambriccardo marked this pull request as ready for review May 27, 2024 12:24
@iambriccardo requested a review from a team as a code owner May 27, 2024 12:24
@jjbayer (Member) left a comment:

We should try to get rid of the .clone().into_value() calls, as they are potentially expensive. We should aim for a structure like this:

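    match try_as_pairlist(value) {
        // The array is a list of pairs: process it through the borrowed map view.
        Ok(map) => self.process_map(map, meta, state),
        // Otherwise, fall back to processing each element as usual.
        Err(_) => value.process_child_values(self, state),
    }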

where try_as_pairlist would invoke another visitor (i.e., a processor) that attempts to assemble a borrowing map view, and process_map is a generalization of the existing process_object that can handle any map-like structure.
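One way the suggested structure could look, sketched with a hypothetical Value type rather than Relay's Processor framework. Here try_as_pairlist builds a borrowing view (shared keys, mutable values) without cloning, and process_map accepts any iterator of (key, value) entries, so an object and a pair list go through the same code path. The signatures are assumptions; the reviewer's names are reused only for readability, and process_map merely filters values under sensitive keys.

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for an event value.
#[derive(Debug, PartialEq)]
pub enum Value {
    String(String),
    Array(Vec<Value>),
    Object(BTreeMap<String, Value>),
}

/// Tries to view `items` as (key, value) pairs without cloning.
/// Returns `None` if any element is not a two-element array with a string first element.
pub fn try_as_pairlist(items: &mut [Value]) -> Option<Vec<(&str, &mut Value)>> {
    let mut view = Vec::with_capacity(items.len());
    for item in items.iter_mut() {
        let Value::Array(pair) = item else { return None };
        let [key, value] = pair.as_mut_slice() else { return None };
        // Downgrade the key to a shared borrow: only values may be modified.
        let key: &Value = key;
        let Value::String(key) = key else { return None };
        view.push((key.as_str(), value));
    }
    Some(view)
}

/// Generalization of object processing over any map-like sequence of entries.
pub fn process_map<'a, I>(entries: I, is_sensitive_key: impl Fn(&str) -> bool)
where
    I: IntoIterator<Item = (&'a str, &'a mut Value)>,
{
    for (key, value) in entries {
        if is_sensitive_key(key) {
            *value = Value::String("[Filtered]".to_owned());
        }
    }
}
```

An object can be fed to the same function as object.iter_mut().map(|(k, v)| (k.as_str(), v)), which is what makes process_map a generalization of object processing in this sketch.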

    let index_state = state.enter_index(index, state.inner_attrs(), value_type);
    // We enter the key of the first element of the array, since we treat it
    // as a pair.
    let key_state = index_state.enter_borrowed(
Reviewer (Member): Why do we need to enter both by index and by key? That seems semantically incorrect, because we want the same semantics for [["a", 1], ["b", 2]] as for {"a": 1, "b": 2}. With the current code, we treat the list like [{"a": 1}, {"b": 2}].

See also:

    let entered = state.enter_borrowed(key_name, state.inner_attrs(), value_type);
    processor::process_value(value, slf, &entered)?;
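The difference in semantics can be illustrated with paths. In this sketch, Segment and the dot-separated rendering are hypothetical simplifications of how processing state is entered; they are not Relay's path format. Entering by index and then by key yields paths like those of [{"a": 1}, {"b": 2}], while entering by key only yields the paths of {"a": 1, "b": 2}.

```rust
/// Hypothetical path segment: entering a child either by index or by key.
#[derive(Debug, Clone, PartialEq)]
pub enum Segment {
    Index(usize),
    Key(String),
}

/// Renders a path as dot-separated segments, e.g. "0.a".
pub fn render(path: &[Segment]) -> String {
    path.iter()
        .map(|segment| match segment {
            Segment::Index(i) => i.to_string(),
            Segment::Key(k) => k.clone(),
        })
        .collect::<Vec<_>>()
        .join(".")
}

/// Entering by index and then by key: behaves like [{"a": 1}, {"b": 2}].
pub fn paths_index_then_key(pairs: &[(&str, i64)]) -> Vec<String> {
    pairs
        .iter()
        .enumerate()
        .map(|(i, (key, _))| render(&[Segment::Index(i), Segment::Key(key.to_string())]))
        .collect()
}

/// Entering by key only: behaves like the object {"a": 1, "b": 2}.
pub fn paths_key_only(pairs: &[(&str, i64)]) -> Vec<String> {
    pairs
        .iter()
        .map(|(key, _)| render(&[Segment::Key(key.to_string())]))
        .collect()
}
```

For [["a", 1], ["b", 2]], the first function produces "0.a" and "1.b", and the second produces "a" and "b", so a selector written for the object form would only line up with the key-only variant.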

@iambriccardo (Member Author): Oh yeah, that's a good point; I implemented it incorrectly before. Good catch!

    where
        T: ProcessValue,
    {
        if is_pairlist(array) {
Reviewer (Member): I prefer a design where the check already returns a prepared data structure to visit:

    if let Some(pairlist) = Pairlist::try_from(array) {
        // The check succeeded and already produced the structure to visit.
        pairlist.visit(self)
    } else {
        // Not a pair list: visit the array's children as before.
        visit_children
    }

@iambriccardo (Member Author): We would just be passing the array through, since this check does not do anything with the data structure itself.

Reviewer (Member): Same here, this was the original design, but I didn't manage to create a lightweight (i.e., borrowing) pairlist from the visitor, because the visitor (process_string) gets a string reference without a lifetime parameter. I'm still open to going that route, though, either by changing the Processor framework to allow it or by simply accepting the cost and tentatively creating an owned PairList.
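A sketch of the second option, tentatively creating an owned pair list, using a hypothetical Value type. The constructor returns Option, as in the suggestion above; note that a std::convert::TryFrom implementation would return a Result instead, so the call site would match on Ok(...). The clones in try_from_array are the cost that the borrowing approach tries to avoid.

```rust
/// Hypothetical stand-in for an event value.
#[derive(Debug, Clone, PartialEq)]
pub enum Value {
    String(String),
    Int(i64),
    Array(Vec<Value>),
}

/// Hypothetical owned pair list, built tentatively from an array.
#[derive(Debug, PartialEq)]
pub struct PairList {
    pairs: Vec<(String, Value)>,
}

impl PairList {
    /// Returns `Some` only if `items` is non-empty and every element is a
    /// `[string, value]` pair. Keys and values are cloned on success.
    pub fn try_from_array(items: &[Value]) -> Option<Self> {
        if items.is_empty() {
            return None;
        }
        items
            .iter()
            .map(|item| match item {
                Value::Array(pair) => match pair.as_slice() {
                    [Value::String(key), value] => Some((key.clone(), value.clone())),
                    _ => None,
                },
                _ => None,
            })
            // Any `None` aborts the whole conversion.
            .collect::<Option<Vec<_>>>()
            .map(|pairs| PairList { pairs })
    }

    /// Visits each (key, value) pair, like the entries of an object.
    pub fn visit(&self, mut f: impl FnMut(&str, &Value)) {
        for (key, value) in &self.pairs {
            f(key.as_str(), value);
        }
    }
}
```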

    @@ -489,6 +597,30 @@ mod tests {
        rv
    }

    fn extract_vars(event: Option<&Event>) -> &FrameVars {
Reviewer (Member): I thought I'd already seen a helper macro for this somewhere 🤔

Reviewer (Member): get_value!

@iambriccardo (Member Author): Amazing

relay-pii/src/processor.rs (outdated, resolved)
iambriccardo and others added 3 commits May 28, 2024 16:49
Co-authored-by: David Herberth <david.herberth@sentry.io>
@iambriccardo requested reviews from Dav1dde and jjbayer May 29, 2024 06:02
    @@ -13,7 +13,7 @@ expression: data
     "request": {
     "headers": [
     [
    -"[Filtered]",
    +"AuthToken",
@iambriccardo (Member Author): The first element is now kept, since we treat it as a key.

@iambriccardo force-pushed the riccardo/feat/add-scrubbing branch from cd1cc8b to 123f6be June 5, 2024 08:23
Comment on lines 1677 to 1680
    [
    "passwd",
    "my_password"
    ]
Reviewer (Member): Can we split this up to test both key detection and value detection?

Suggested change:

    -[
    -"passwd",
    -"my_password"
    -]
    +[
    +"passwd",
    +"asdf123"
    +],
    +[
    +"something_else",
    +"my_password"
    +]

    {
        self.is_pair = state.depth() == 0 && value.len() == 2;
        if self.is_pair {
            let value_type = ValueType::for_field(&value[0]);
Reviewer (Member), suggested change:

    -let value_type = ValueType::for_field(&value[0]);
    +let key_type = ValueType::for_field(&value[0]);

CHANGELOG.md (outdated, resolved)
    @@ -202,6 +244,69 @@ impl<'a> Processor for PiiProcessor<'a> {
        }
    }

    #[derive(Default)]
    struct PairListProcessor {
Reviewer (Member), suggested change:

    -struct PairListProcessor {
    +/// Checks whether an array is a pair with a string key.
    +struct PairCheck {
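A standalone sketch of a PairCheck-style visitor, assuming a hypothetical Visitor trait and walk function in place of Relay's Processor and process_value. The depth counter ensures that only the outermost array is treated as the candidate pair, and the first element is visited one level deeper to decide whether it is a string key.

```rust
/// Hypothetical stand-in for an event value.
#[derive(Debug, Clone, PartialEq)]
pub enum Value {
    String(String),
    Int(i64),
    Array(Vec<Value>),
}

/// Hypothetical minimal visitor, standing in for Relay's `Processor` trait.
pub trait Visitor {
    fn visit_array(&mut self, _items: &[Value], _depth: usize) {}
    fn visit_string(&mut self, _value: &str, _depth: usize) {}
}

/// Dispatches `value` to the matching visitor callback.
pub fn walk<V: Visitor>(value: &Value, visitor: &mut V, depth: usize) {
    match value {
        Value::String(s) => visitor.visit_string(s, depth),
        Value::Array(items) => visitor.visit_array(items, depth),
        Value::Int(_) => {}
    }
}

/// Checks whether an array is a pair with a string key.
#[derive(Default)]
pub struct PairCheck {
    is_pair: bool,
    has_string_key: bool,
}

impl Visitor for PairCheck {
    fn visit_array(&mut self, items: &[Value], depth: usize) {
        // Only the outermost array can be the pair itself.
        if depth != 0 {
            return;
        }
        self.is_pair = items.len() == 2;
        if self.is_pair {
            // Visit the first element one level down to learn its type.
            walk(&items[0], self, depth + 1);
        }
    }

    fn visit_string(&mut self, _value: &str, depth: usize) {
        // A string directly inside the outer array is a valid key.
        if depth == 1 {
            self.has_string_key = true;
        }
    }
}

impl PairCheck {
    /// Runs the check on a single value.
    pub fn check(value: &Value) -> bool {
        let mut check = PairCheck::default();
        walk(value, &mut check, 0);
        check.is_pair && check.has_string_key
    }
}
```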

iambriccardo and others added 5 commits June 6, 2024 16:04
Co-authored-by: Joris Bayer <joris.bayer@sentry.io>
@iambriccardo merged commit 962c91f into master Jun 10, 2024
22 checks passed
@iambriccardo deleted the riccardo/feat/add-scrubbing branch June 10, 2024 06:37
Development: successfully merging this pull request may close the issue "Scrub auth tokens anywhere".

3 participants