feat(pii): Scrub arrays of 2 elements as key-value pairs #3639
relay-pii/src/processor.rs
Outdated
let mut processor = PiiProcessor::new(config.compiled());
process_value(&mut event, &mut processor, ProcessingState::root()).unwrap();

let vars = event
Is there a better way?
relay-pii/src/processor.rs
Outdated
process_value(&mut value[1], self, &entered)?;
// We check whether meta has changed from empty to non-empty in order to
// understand if some rules matched downstream.
if previous_meta.is_empty() && !value[1].meta().is_empty() {
A better impl here would check for structural equivalence but we might gain some performance if we assume that metadata won't be there on an incoming annotated value. Maybe this assumption is totally wrong.
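To make the trade-off in this comment concrete, here is a minimal sketch comparing the cheap "empty to non-empty" check with a structural comparison. `Meta`, `Annotated`, and `scrub` are hypothetical stand-ins invented for illustration, not Relay's actual types.

```rust
/// Hypothetical stand-in for per-value metadata (remarks left by rules).
#[derive(Debug, Default, Clone, PartialEq)]
struct Meta {
    remarks: Vec<String>,
}

impl Meta {
    fn is_empty(&self) -> bool {
        self.remarks.is_empty()
    }
}

/// Hypothetical stand-in for an annotated value.
struct Annotated {
    value: String,
    meta: Meta,
}

/// Toy rule: filters the value and records a remark when it matches.
fn scrub(annotated: &mut Annotated) {
    if annotated.value.contains("password") {
        annotated.value = "[Filtered]".to_owned();
        annotated.meta.remarks.push("@password".to_owned());
    }
}

/// Cheap check: only detects a transition from empty to non-empty metadata.
fn rule_matched_cheap(annotated: &mut Annotated) -> bool {
    let was_empty = annotated.meta.is_empty();
    scrub(annotated);
    was_empty && !annotated.meta.is_empty()
}

/// Structural check: clones the metadata and compares it after processing.
fn rule_matched_structural(annotated: &mut Annotated) -> bool {
    let before = annotated.meta.clone();
    scrub(annotated);
    before != annotated.meta
}
```

The cheap variant avoids the clone but misses a match whenever the incoming value already carries metadata, which is exactly the assumption the comment questions.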
r##"
{
"applications": {
"$string": ["@password:remove"]
The rule @password:remove works by matching a key and then scrubbing the value, which is why it was put here.
We should try to get rid of the .clone().into_value() calls as they are potentially expensive. We should aim for a structure like this:
match try_as_pairlist(value) {
    Ok(map) => self.process_map(map, meta, state),
    Err(_) => value.process_child_values(self, state),
}
where try_as_pairlist would invoke another visitor, i.e. a processor that attempts to assemble a borrowing map view, and process_map is a generalization of the existing process_object that can handle any kind of map-like structure.
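As an illustration of what such a borrowing map view could look like, here is a minimal sketch outside the Processor framework. The `Value` enum, the `scrub` wrapper, and the hard-coded `passwd` key rule are assumptions made for the example; only the names try_as_pairlist and process_map come from the comment above, and their signatures here are guesses.

```rust
/// Minimal JSON-like value; a stand-in for Relay's annotated values.
#[derive(Debug, PartialEq)]
enum Value {
    String(String),
    Number(i64),
    Array(Vec<Value>),
}

/// Tries to view an array as `(key, value)` borrows without cloning anything.
/// Fails if any element is not a two-element array with a string first element.
fn try_as_pairlist(value: &mut Value) -> Result<Vec<(&str, &mut Value)>, ()> {
    let Value::Array(items) = value else { return Err(()) };
    let mut pairs = Vec::with_capacity(items.len());
    for item in items.iter_mut() {
        let Value::Array(pair) = item else { return Err(()) };
        if pair.len() != 2 {
            return Err(());
        }
        // Split so the key can be borrowed immutably and the value mutably.
        let (head, tail) = pair.split_at_mut(1);
        let Value::String(key) = &head[0] else { return Err(()) };
        pairs.push((key.as_str(), &mut tail[0]));
    }
    Ok(pairs)
}

/// Processes any map-like sequence of key/value borrows.
/// The `passwd` comparison is a hypothetical stand-in for a key-based rule.
fn process_map<'a>(pairs: impl IntoIterator<Item = (&'a str, &'a mut Value)>) {
    for (key, value) in pairs {
        if key.eq_ignore_ascii_case("passwd") {
            *value = Value::String("[Filtered]".to_owned());
        }
    }
}

/// Dispatches on whether the value can be viewed as a pair list.
fn scrub(value: &mut Value) {
    match try_as_pairlist(value) {
        Ok(pairs) => process_map(pairs),
        // A real processor would visit the children individually here.
        Err(()) => {}
    }
}
```

The sketch borrows directly from the array via `split_at_mut`, which works here only because it has direct access to the data rather than receiving references through visitor callbacks.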
relay-pii/src/processor.rs
Outdated
let index_state = state.enter_index(index, state.inner_attrs(), value_type);
// We enter the key of the first element of the array, since we treat it
// as a pair.
let key_state = index_state.enter_borrowed(
Why do we need to enter both by index and by key? That seems semantically incorrect, because we want the same semantics for [["a", 1], ["b", 2]] as for {"a": 1, "b": 2}. With the current code, we treat the list like [{"a": 1}, {"b": 2}].
See also lines 28 to 29 in d812b63:
let entered = state.enter_borrowed(key_name, state.inner_attrs(), value_type);
processor::process_value(value, slf, &entered)?;
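The difference between the two approaches shows up in the path each value ends up at. The sketch below uses a hypothetical `Path` type, not Relay's ProcessingState, to compare entering by key only with entering by index and then by key.

```rust
/// Hypothetical, simplified stand-in for a processing-state path.
#[derive(Debug, Clone, Default)]
struct Path(Vec<String>);

impl Path {
    fn enter_key(&self, key: &str) -> Path {
        let mut segments = self.0.clone();
        segments.push(key.to_owned());
        Path(segments)
    }

    fn enter_index(&self, index: usize) -> Path {
        let mut segments = self.0.clone();
        segments.push(index.to_string());
        Path(segments)
    }

    fn render(&self) -> String {
        self.0.join(".")
    }
}

/// Paths when each pair is entered by key only, as for an object.
fn key_only_paths(root: &Path, keys: &[&str]) -> Vec<String> {
    keys.iter().map(|key| root.enter_key(key).render()).collect()
}

/// Paths when each pair is entered by index first and then by key.
fn index_then_key_paths(root: &Path, keys: &[&str]) -> Vec<String> {
    keys.iter()
        .enumerate()
        .map(|(index, key)| root.enter_index(index).enter_key(key).render())
        .collect()
}
```

With a root of `headers` and keys `a` and `b`, the first function yields `headers.a` and `headers.b`, matching the object form, while the second yields `headers.0.a` and `headers.1.b`, which corresponds to a list of single-key objects.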
Oh yeah, that's actually a good point. I implemented it incorrectly before. Good catch.
where
T: ProcessValue,
{
if is_pairlist(array) {
I prefer a design where the check already returns a prepared data structure to visit:
if let Ok(pairlist) = Pairlist::try_from(array) {
    pairlist.visit(self)
} else {
    // visit the children individually
    visit_children
}
We would just pipe through the array since this check is not doing anything with the data structure itself.
Same here, this was the original design, but I didn't manage to create a lightweight (i.e. borrowing) pairlist from the visitor because the visitor (process_string) gets a string reference without a lifetime parameter. I'm still open to going that route though, either by changing the Processor framework to allow such a thing, or by simply accepting the cost and creating an owned PairList tentatively.
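For comparison, the owned alternative mentioned here could look roughly like the following sketch, where the check and the construction happen in one step via `TryFrom`. The `Value` enum, the `PairList` layout, and the `NotAPairList` error are hypothetical; the clones are the cost being accepted.

```rust
use std::convert::TryFrom;

/// Minimal JSON-like value; a stand-in for Relay's annotated values.
#[derive(Debug, Clone, PartialEq)]
enum Value {
    String(String),
    Number(i64),
    Array(Vec<Value>),
}

/// Owned list of `(key, value)` pairs; building it clones keys and values.
#[derive(Debug, PartialEq)]
struct PairList(Vec<(String, Value)>);

/// Returned when the input is not a list of `[string, value]` pairs.
#[derive(Debug, PartialEq)]
struct NotAPairList;

impl TryFrom<&[Value]> for PairList {
    type Error = NotAPairList;

    fn try_from(items: &[Value]) -> Result<Self, Self::Error> {
        items
            .iter()
            .map(|item| match item {
                Value::Array(pair) => match pair.as_slice() {
                    // Exactly two elements, the first of which is a string.
                    [Value::String(key), value] => Ok((key.clone(), value.clone())),
                    _ => Err(NotAPairList),
                },
                _ => Err(NotAPairList),
            })
            .collect::<Result<Vec<_>, _>>()
            .map(PairList)
    }
}
```

Note that this sketch accepts an empty array as an empty pair list; whether that is desirable is a separate decision.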
relay-pii/src/processor.rs
Outdated
@@ -489,6 +597,30 @@ mod tests {
rv
}

fn extract_vars(event: Option<&Event>) -> &FrameVars {
I thought I'd seen a helper macro for this somewhere already 🤔
get_value!
Amazing
@@ -13,7 +13,7 @@ expression: data
"request": {
"headers": [
[
- "[Filtered]",
+ "AuthToken",
The first element is now kept since we treat it as a key.
cd1cc8b to 123f6be
relay-pii/src/convert.rs
Outdated
[
"passwd",
"my_password"
]
Can we split this up to test both the key detection and the value detection?
- [
- "passwd",
- "my_password"
- ]
+ [
+ "passwd",
+ "asdf123"
+ ],
+ [
+ "something_else",
+ "my_password"
+ ]
relay-pii/src/processor.rs
Outdated
{
self.is_pair = state.depth() == 0 && value.len() == 2;
if self.is_pair {
let value_type = ValueType::for_field(&value[0]);
- let value_type = ValueType::for_field(&value[0]);
+ let key_type = ValueType::for_field(&value[0]);
@@ -202,6 +244,69 @@ impl<'a> Processor for PiiProcessor<'a> {
}
}

#[derive(Default)]
struct PairListProcessor {
- struct PairListProcessor {
+ /// Checks whether an array is a pair with a string key.
+ struct PairCheck {
This PR adds support for treating an array of arrays of length two as an object. This was done because headers and other pieces of data are commonly encoded as
[["key", "value"], ["key", "value"]]
and our previous logic scrubbed each string value individually.
The algorithm checks whether a given array is composed of arrays of length two in which the first element is a string. If so, it enters the first element of each pair as if it were the key of an object, so that key-based rules apply to the second element.
Closes: #2567
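The algorithm in the description can be sketched in isolation as follows. This is a simplified illustration, not the code from this PR: the `Value` enum, the `is_sensitive_key` rule, and the choice to reject empty arrays are all assumptions made for the example.

```rust
/// Minimal JSON-like value; a stand-in for Relay's annotated values.
#[derive(Debug, Clone, PartialEq)]
enum Value {
    String(String),
    Number(i64),
    Array(Vec<Value>),
}

/// Hypothetical stand-in for a key-based rule such as @password:remove.
fn is_sensitive_key(key: &str) -> bool {
    let key = key.to_ascii_lowercase();
    ["passwd", "password", "authtoken"]
        .iter()
        .any(|needle| key.contains(*needle))
}

/// True if every element is a two-element array whose first element is a string.
/// Rejecting empty arrays is an assumption of this sketch.
fn is_pairlist(items: &[Value]) -> bool {
    !items.is_empty()
        && items.iter().all(|item| match item {
            Value::Array(pair) => pair.len() == 2 && matches!(&pair[0], Value::String(_)),
            _ => false,
        })
}

/// Treats the first element of each pair as a key: the key is kept and
/// the second element is filtered when the key looks sensitive.
fn scrub_pairlist(items: &mut [Value]) {
    if !is_pairlist(items) {
        return;
    }
    for item in items.iter_mut() {
        if let Value::Array(pair) = item {
            let sensitive = matches!(&pair[0], Value::String(key) if is_sensitive_key(key));
            if sensitive {
                pair[1] = Value::String("[Filtered]".to_owned());
            }
        }
    }
}
```

Applied to a header list containing ["AuthToken", "secret"], this keeps "AuthToken" and replaces only the second element, which mirrors the snapshot change discussed in the review.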