-
Notifications
You must be signed in to change notification settings - Fork 479
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support url-safe base64 secrets #245
Conversation
@@ -1,5 +1,5 @@ | |||
credentials: | |||
some_value_here: not_a_secret | |||
some_value_here: not_secret |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Was this necessary b/c the entropy calculation with the new chars alerted on not_a_secret
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I don't think we need to be too concerned though because we now have the wordlist filtering.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm more concerned that, we'll have large diffs in baseline's when people update detect-secrets.
This isn't as concerning as changing a secret type like we did in #26, (where all old secrets were removed and re-added), but it is a little, especially if it reduces TP's to some extent. (We'll see what the data says though, I can't really say how it'll effect signal.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why will we have large diffs? A lot of new secrets?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If all the e.g. not_a_secret
potential secrets disappear from existing baselines, then there is a possibility we will have large diffs, in the case of FP's that's great, in the case of TP's that would be a regression visible to users. (We can't really say there are minimal regressions without data though.)
This commit updates the base64 plugin to support url-safe plugins by just adding - and _ to the charset.
We already check for whitelists in `ignored_lines = parser.get_ignored_lines()` call above, so calling `analyze_string` wastes time with the duplicated check.
We already check for whitelists in the IniFileParser, so doing another whitelist check here is redundant.
This renaming more accurately reflects what the function does in all the plugins (at the moment) and more clearly distinguishes it from `analyze_string_content`
0539c43
to
3d0dc36
Compare
|
||
# NOTE: this doesn't handle key-values on a line properly. | ||
# NOTE: words that end in "id" will be treated as ids | ||
_ID_DETECTOR_REGEX = re.compile(r'[iI][dD][^A-Za-z0-9]') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We might be able to do _id
, we'll see what the data says though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess it depends on whether we want to ignore keys like BusinessId
. I think at Yelp this isn't likely but it's probably more likely in camelCase
language repos.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a good point, we do have a lot of python biases.
# py2+py3 compatible way of copying a list | ||
functions = list(DEFAULT_FALSE_POSITIVE_HEURISTICS) | ||
functions.append(is_potential_uuid) | ||
|
||
if is_false_positive(result, self.automaton, functions=functions): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What are your thoughts on passing additional_heuristics
instead? I'm not sure when we would want to call is_false_positive
without the defaults (main motivation is prettifying though)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was actually thinking about moving is_false_positive
to be a method in BasePlugin
and then make subclass re-implement it. This would allow us to override the filters used on a plugin-level (suggested in #250), but also set some reasonable defaults. In addition we can include the heuristics used in the configs for the plugins in baselines.
i.e. in code
class BasePlugin():
def __init__(self, false_positive_heuristics=None):
self.false_positive_heuristics = false_positive_heuristics if false_positive_heuristics else []
def is_false_positive(self, potential_secret):
return any(func(potential_secret) for func in self.false_positive_heuristics)
def get_config(self):
# include the fp heuristics used if applicable
class Plugin(BasePlugin):
def __init__(self, false_positive_heuristics=DEFAULT_HEURSTICS_FOR_PLUGIN): # I remember the default list in Python function constructor, I'll fix it in real code :)
super(Plugin, self).__init__(false_positive_heuristics)
def analyze_string_content(self, string):
for potential_secret in self.secret_generator(string):
if self.is_false_positive(string):
continue
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That sounds great to me 🎈
I'm only unsure of the In addition we can include the heuristics used in the configs for the plugins in baselines.
part, as I'm kind of okay with leaving that part blind to the user. (There are also the lesser possible objections someone could say that diffs in baselines should be minimal, and I'm not sure how we would say which heuristics each plugin used in a DRY way.)
0a34639
to
488334f
Compare
This updates the base64 plugin to support url-safe plugins by just adding
-
and_
to the charset. However, this won't be merged until I add some additional functionality to reduce noise.