feat: add one more keyword #174
Looks like we haven't released a new version in a while. Once the recent PRs are merged, would we be able to cut a new release?
Sure thing, been meaning to :)
@KevinHock, another thing: I sent you an email at your GitHub profile email, kevin.hock.opentoall@gmail.com. When you get a chance, would you please take a look? It's coming from my GitHub profile email, kyleezhu@gmail.com. Thanks.
Will do 👍
@@ -45,6 +45,7 @@
 'private_key',
 'secret',
 'secrete',
+'token',
I am kind of ambivalent about this. When we first wrote the keyword detector, `token` was a keyword; however, its signal-to-noise ratio wasn't as good as the other keywords'. It was too noisy.
Inflexibility is an anti-pattern, though, which is one of the reasons we added a keyword exclude regex: so the static FALSE_POSITIVES dictionary could be added to and customized for things specific to each user's codebase.
We should probably do the same thing for the keyword detector's keyword list, though it is tricky work, because we will have to write it to and read it from the baseline file. #146 and #151 are examples of issues stemming from work like this.
This would also mean other keyword additions, like #148, would not be blocked on us doing a ton of internal testing.
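A minimal sketch of the proposal above, assuming a Python detector like the one the diff touches. User-supplied keywords extend a default list, an exclude regex drops known-noisy lines, and both customizations are written to and read back from a JSON baseline. `DEFAULT_KEYWORDS`, `build_keyword_regex`, `scan_line`, `save_baseline`, `load_baseline`, and the `keyword_settings` baseline key are all hypothetical, not detect-secrets' actual API or baseline format. The sketch also assumes the exclude regex applies to whole lines.

```python
from __future__ import annotations

import json
import os
import re
import tempfile
from typing import Iterable

# Hypothetical defaults: only the tail of the real keyword list is visible in the diff.
DEFAULT_KEYWORDS = ['private_key', 'secret', 'secrete']


def build_keyword_regex(extra_keywords: Iterable[str] = ()) -> re.Pattern:
    """Compile one case-insensitive regex for `<keyword>... = '<value>'` (or `:`)."""
    keywords = sorted(set(DEFAULT_KEYWORDS) | set(extra_keywords))
    alternation = '|'.join(re.escape(k) for k in keywords)
    return re.compile(
        r'(?:' + alternation + r')\w*\s*[:=]\s*[\'"]([^\'"]+)[\'"]',
        re.IGNORECASE,
    )


def scan_line(line: str, keyword_regex: re.Pattern,
              exclude_regex: re.Pattern | None = None) -> list[str]:
    """Return the quoted values assigned to keywords, unless the line is excluded."""
    # Assumption: the exclude regex is matched against the whole line.
    if exclude_regex is not None and exclude_regex.search(line):
        return []
    return keyword_regex.findall(line)


def save_baseline(path: str, extra_keywords: list[str], exclude_pattern: str) -> None:
    """Write the user's keyword customizations into a (hypothetical) JSON baseline."""
    with open(path, 'w') as f:
        json.dump({'keyword_settings': {'extra_keywords': extra_keywords,
                                        'exclude_regex': exclude_pattern}}, f)


def load_baseline(path: str) -> tuple[re.Pattern, re.Pattern]:
    """Read the customizations back and rebuild both regexes from them."""
    with open(path) as f:
        settings = json.load(f)['keyword_settings']
    return (build_keyword_regex(settings['extra_keywords']),
            re.compile(settings['exclude_regex']))


if __name__ == '__main__':
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, 'baseline.json')
        save_baseline(path, ['token'], r'dummy|example')
        keyword_regex, exclude_regex = load_baseline(path)
        print(scan_line("api_token = 'abc123'", keyword_regex, exclude_regex))  # ['abc123']
        print(scan_line("token = 'dummy_value'", keyword_regex, exclude_regex))  # []
```

Storing the settings in the baseline lets a later scan rebuild the same regexes without code changes. Keeping that file format backward compatible is where the read/write trickiness mentioned above comes in.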
I'm wondering what kind of internal testing you carry out to accept or reject a new keyword. We haven't built up a large test codebase yet, which we probably should. I'm curious how big your test codebase is, and what your process is for evaluating the false positive ratio.
At least for the keyword detector, we made the regexes super loose so it would be noisy, found all the true positives stemming from it, and trimmed the regexes down as best we could to catch as few false positives as possible. A decent amount of manual work was involved.
It's still a little loud now. I plan on working on it a little more, but have been doing some other projects recently.
For testing, we ran it on all of our largest codebases. I'm not sure of the exact LoC, but it's probably a few hundred thousand or more.
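The loosen-then-trim process described above can be sketched as a small precision/recall check. Run a candidate regex over hand-labeled lines and count how many of its hits are real secrets. The two regexes, the `evaluate` helper, and the five labeled lines are hypothetical toy inputs. They are not the detector's actual patterns, and they say nothing about the codebases mentioned in this thread.

```python
import re

# Hypothetical hand-labeled sample: (line, is_real_secret). Toy data, not real results.
LABELED_LINES = [
    ("db_password = 'hunter2hunter2'", True),
    ("api_token = 'f3a9c1d2e4b5'", True),
    ("token = get_token()", False),
    ("# TODO: rotate the token", False),
    ("token = 'placeholder'", False),
]

# Step 1: a deliberately loose regex that flags any line mentioning a keyword.
LOOSE = re.compile(r'password|token', re.IGNORECASE)
# Step 2: a trimmed regex that only flags a keyword assigned a quoted literal.
TRIMMED = re.compile(r'(?:password|token)\w*\s*=\s*[\'"][^\'"]+[\'"]', re.IGNORECASE)


def evaluate(pattern: re.Pattern, labeled: list) -> tuple:
    """Return (precision, recall) of `pattern` over hand-labeled lines."""
    tp = fp = fn = 0
    for line, is_secret in labeled:
        flagged = pattern.search(line) is not None
        if flagged and is_secret:
            tp += 1  # caught a real secret
        elif flagged:
            fp += 1  # noise: flagged something harmless
        elif is_secret:
            fn += 1  # missed a real secret
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


for name, pattern in [('loose', LOOSE), ('trimmed', TRIMMED)]:
    precision, recall = evaluate(pattern, LABELED_LINES)
    print(f'{name}: precision={precision:.2f} recall={recall:.2f}')
# loose: precision=0.40 recall=1.00
# trimmed: precision=0.67 recall=1.00
```

On this toy sample, trimming removes two false positives without losing either real secret. On a real codebase, most of the manual effort goes into producing the labels.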
I will close this PR since
Add one more keyword, `token`, to the keyword list.