Handle binary values in YAML files #223
Looks good so far.
I like your plan. Maybe it would be easier to do regular binary strings first and then YAML binary strings, but it's up to you.
@@ -75,7 +75,7 @@ def _tag_dict_values(self, map_node):
"""
new_values = []
for key, value in map_node.value:
- if not value.tag.endswith(':str'):
+ if not value.tag.endswith(':str') and not value.tag.endswith(':binary'):
super nit: feel free to ignore.
I feel like
if not (
    value.tag.endswith(':str')
    or
    value.tag.endswith(':binary')
):
is a little prettier ⚡️ 💟
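A third way to write the same condition, offered only as a sketch: Python's `str.endswith` also accepts a tuple of suffixes and returns True if any of them matches, so the two checks can collapse into one call. The helper name `should_tag` is hypothetical and does not appear in the PR.

```python
def should_tag(tag: str) -> bool:
    """Return True for scalar tags the parser should annotate (str or binary).

    str.endswith accepts a tuple and returns True if any suffix matches, so
    this is equivalent to tag.endswith(':str') or tag.endswith(':binary').
    """
    return tag.endswith((':str', ':binary'))


for example_tag in (
    'tag:yaml.org,2002:str',
    'tag:yaml.org,2002:binary',
    'tag:yaml.org,2002:int',
):
    # The diff's condition would then read: if not should_tag(value.tag): ...
    print(example_tag, should_tag(example_tag))
```

Whether this reads better than the parenthesised `or` form is, as the reviewer says, a matter of taste.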
@@ -92,6 +92,11 @@ def _tag_dict_values(self, map_node):
str(value.__line__),
'tag:yaml.org,2002:int',
),
+ self._create_key_value_pair_for_mapping_node_value(
+ '__is_binary__',
I realize the other calls to this function don't do this, but maybe we can use keyword args to make things clearer; that would help with the other function calls in this file too.
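A sketch of what the keyword-argument style could look like. The `(key, value, tag)` signature, the stand-in `ScalarNode` tuple, and storing `__is_binary__` as a YAML bool are all assumptions for illustration; the PR's actual `_create_key_value_pair_for_mapping_node_value` may differ.

```python
from typing import NamedTuple, Tuple


class ScalarNode(NamedTuple):
    """Hypothetical stand-in for a YAML scalar node with a tag and a value."""
    tag: str
    value: str


def create_key_value_pair(key: str, value: str, tag: str) -> Tuple[ScalarNode, ScalarNode]:
    """Hypothetical helper: pair a string key node with a value node of `tag`."""
    return (
        ScalarNode(tag='tag:yaml.org,2002:str', value=key),
        ScalarNode(tag=tag, value=value),
    )


# With keyword arguments, each call site says which argument is which.
line_pair = create_key_value_pair(
    key='__line__',
    value=str(12),
    tag='tag:yaml.org,2002:int',
)
binary_pair = create_key_value_pair(
    key='__is_binary__',
    value='true',  # assumption: the flag is encoded as a YAML bool
    tag='tag:yaml.org,2002:bool',
)
```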
Force-pushed from 65e126c to 4107194.
This is step (1) in supporting binary in both YAML and non-YAML files. Instead of immediately converting the base64-encoded binary into a binary value in Python, we now interpret the binary as a normal string but annotate it as such with the `is_binary` flag. This is needed so that plugins can scan a different value from the value hashed into baselines.
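A minimal sketch of step (1), assuming a parser that sees each scalar's tag and raw text: a `!!binary` value is kept as its base64 text rather than decoded, and a flag records that it was binary. The dictionary keys mirror the `__value__`, `__line__` and `__is_binary__` fields mentioned in this PR; the function `annotate_scalar` itself is hypothetical.

```python
import base64
from typing import Dict, Union

BINARY_TAG_SUFFIX = ':binary'


def annotate_scalar(tag: str, raw_value: str, line: int) -> Dict[str, Union[str, int, bool]]:
    """Keep a scalar's raw text and record whether its YAML tag was !!binary."""
    return {
        '__value__': raw_value,  # base64 text, not decoded bytes
        '__line__': line,
        '__is_binary__': tag.endswith(BINARY_TAG_SUFFIX),
    }


node = annotate_scalar('tag:yaml.org,2002:binary', 'c2VjcmV0', line=3)

# A plugin that wants the underlying bytes can still decode on demand,
# while the baseline can keep hashing the original base64 text.
if node['__is_binary__']:
    decoded = base64.b64decode(node['__value__'])
    print(decoded)  # b'secret'
```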
This implements support for high-entropy secrets in binary values in YAML files. We encode the binary value into a hex- or base64-encoded string (depending on the plugin) and run the normal entropy check. If the string is deemed high-entropy, we re-encode the value as a YAML binary (using `yaml.dump`) and strip the `!!binary` tag. This YAML binary is considered the secret and is put into the baseline as normal. I had to update a test function so that it uses a custom hex high-entropy detector, since `HighEntropyStringsPlugin` is now an abstract class.
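The encode-then-check step above can be sketched with the standard library alone. This is an illustration under assumptions, not the plugin code: the charsets, the Shannon-entropy formula and the 3.0 limit are illustrative choices, and the follow-up step of re-encoding with `yaml.dump` and stripping `!!binary` is not shown.

```python
import base64
import math
import string

HEX_CHARSET = '0123456789abcdef'  # bytes.hex() emits lower-case digits
BASE64_CHARSET = string.ascii_letters + string.digits + '+/='


def shannon_entropy(data: str, charset: str) -> float:
    """Shannon entropy in bits per character, counting only `charset` characters."""
    if not data:
        return 0.0
    entropy = 0.0
    for ch in charset:
        p = data.count(ch) / len(data)
        if p > 0:
            entropy -= p * math.log2(p)
    return entropy


def binary_is_high_entropy(raw: bytes, encoding: str, limit: float) -> bool:
    """Encode raw bytes as a hex or base64 plugin would, then apply the string check."""
    if encoding == 'hex':
        text, charset = raw.hex(), HEX_CHARSET
    elif encoding == 'base64':
        text, charset = base64.b64encode(raw).decode('ascii'), BASE64_CHARSET
    else:
        raise ValueError(f'unknown encoding: {encoding}')
    return shannon_entropy(text, charset) > limit


print(binary_is_high_entropy(bytes(range(256)), 'hex', limit=3.0))  # True
print(binary_is_high_entropy(b'\x00' * 32, 'hex', limit=3.0))        # False
```

Scanning the encoded form, while keeping the YAML binary as the reported secret, is what lets the same bytes be checked by either plugin's charset.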
To test this, you would need to import an unused class into the module so that it's in `globals()`, and have the test know what class that is. That seems messy to me and not worth what it would be testing.
Force-pushed from 4107194 to 869033d.
This is done. The general strategy is:
The complexity of (3) is needed so that we can actually find the secret in the file afterwards using the existing
Re: ‘HighEntropyStringsPlugin is now an abstract class’ in the commit message, given as the reason for the new class in the test: I’m on my phone, but I don’t see that it’s abstract.
Never mind me lol, I see the ABCMeta metaclass. I guess it was already an abstract class before this PR, and I read the message as saying this PR made it abstract.
Lgtm
⛴
Note to self to check when I’m on desktop: I’m not sure how we were instantiating an abstract class in a test before; I should check that I understand that.
KevinHock++ I think the test was able to instantiate an abstract class because we didn’t actually mark any methods as abstract, but once I added some abstract methods, the test started failing, so I had to fix it.
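The behaviour described here is standard Python `abc` semantics: `ABCMeta` on its own does not block instantiation; only unimplemented `@abstractmethod`s do. The class names below are hypothetical stand-ins, not the real plugin classes.

```python
import abc


class NoAbstractMethods(metaclass=abc.ABCMeta):
    """Uses ABCMeta but marks nothing abstract, so it can still be instantiated."""

    def analyze_string(self, s: str) -> bool:
        return False


class WithAbstractMethod(metaclass=abc.ABCMeta):
    @abc.abstractmethod
    def encode(self, raw: bytes) -> str:
        """Subclasses must say how to turn raw bytes into a scannable string."""


class HexEncoder(WithAbstractMethod):
    """A concrete subclass, like the custom hex detector added to the test."""

    def encode(self, raw: bytes) -> str:
        return raw.hex()


NoAbstractMethods()  # works: nothing is marked abstract

try:
    WithAbstractMethod()  # fails once an abstract method exists
except TypeError as err:
    print('cannot instantiate:', err)

print(HexEncoder().encode(b'\x01\xff'))  # 01ff
```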
(WIP as of writing)
This is step (1) in supporting binary in both YAML and non-YAML files.
This makes it so that instead of immediately converting the base64-encoded binary into a binary value in Python, we just interpret the binary as a normal string, but annotate it as such with the `is_binary` flag. This is needed so that plugins can scan a different value from the value hashed into baselines.
Alternatives:
- `YamlFileParser` and put the desired form into `__value__`
- `__value__` and `__is_binary__`, have fields `__scan_value__` and `__plaintext_value__`
- `analyze_binary_content` (in addition to `analyze_string_content`), which knows how to do the conversion to the "desired format", but that depends on us knowing that the string is a binary as opposed to some other special sauce, so I think the `is_binary` information is important.

Closes #202
The next steps would be to make the high-entropy plugin actually use the `is_binary` information to do the conversion and output results correctly.