feat: add one more keyword #174
Closed
@@ -45,6 +45,7 @@
    'private_key',
    'secret',
    'secrete',
    'token',
)
FALSE_POSITIVES = {
    '""',
I am kind of ambivalent about this. When we first wrote the keyword detector, `token` was a keyword; however, its signal-to-noise ratio wasn't as good as that of the other keywords. It was too noisy.

Inflexibility is an anti-pattern though, which is one of the reasons we added a keyword exclude regex, so the static `FALSE_POSITIVES` dictionary could be added to and customized for things specific to each user's codebase.

We should probably do the same thing for keyword detector tokens, though it is tricky work, because we will have to write and read it from the baseline file. #146 and #151 are examples of issues stemming from work like this.

This would also mean that other keyword additions like #148 would not be blocked on us doing a ton of internal testing.
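To make the proposal above concrete, here is a minimal sketch of a keyword scanner whose keyword list and exclude regex are both configurable and round-trip through a JSON baseline. Everything in it is hypothetical: `KeywordConfig`, `scan_line`, the `keywords` and `keyword_exclude` baseline keys, and the simplified matching regex are illustrative assumptions, not this project's actual implementation or baseline format.

```python
import json
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical defaults, loosely modeled on the tuple shown in the diff.
DEFAULT_KEYWORDS = ('private_key', 'secret', 'secrete')
# Values that should never be reported (modeled here as a set; an assumption).
FALSE_POSITIVES = {'""'}


@dataclass
class KeywordConfig:
    """User-adjustable settings (hypothetical names)."""
    keywords: tuple = DEFAULT_KEYWORDS
    exclude_regex: Optional[str] = None

    def to_baseline(self) -> str:
        # Serialize the settings so a later scan can reuse them.
        return json.dumps({
            'keywords': list(self.keywords),
            'keyword_exclude': self.exclude_regex,
        })

    @classmethod
    def from_baseline(cls, raw: str) -> 'KeywordConfig':
        # Fall back to the defaults when a key is missing from the baseline.
        data = json.loads(raw)
        return cls(
            keywords=tuple(data.get('keywords', DEFAULT_KEYWORDS)),
            exclude_regex=data.get('keyword_exclude'),
        )


def scan_line(line: str, config: KeywordConfig) -> Optional[str]:
    """Return the first keyword that flags this line, or None."""
    # A line matching the user's exclude regex is skipped entirely.
    if config.exclude_regex and re.search(config.exclude_regex, line):
        return None
    for keyword in config.keywords:
        # Keyword, optional identifier suffix, then `=` or `:` and a value.
        match = re.search(
            r'{}\w*\s*[:=]\s*(\S+)'.format(re.escape(keyword)),
            line,
            re.IGNORECASE,
        )
        if match and match.group(1) not in FALSE_POSITIVES:
            return keyword
    return None
```

With this shape, a user who wants `token` can opt in with `KeywordConfig(keywords=DEFAULT_KEYWORDS + ('token',), exclude_regex=r'csrf_token')` and silence their own noisy cases, without the default list having to change for everyone.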
I'm wondering what kind of internal testing you guys carry out to accept or reject a new keyword. We haven't built up a large test code base yet, which we probably should. I'm curious how big your test code base is, and what your process is for evaluating the false-positive ratio.
At least for the keyword detector, we made the regexes super loose so it would be noisy, found all the true positives stemming from it, and then trimmed the regexes down as best we could to catch as few false positives as possible. A decent amount of manual work was involved with this.
It's still a little loud now. I plan on working on it a little more, but have been working on some other projects recently.
For testing, we ran it on all of our largest codebases. I'm not sure of the exact LoC, but probably a few hundred thousand or more.
p.s. Bumped ✊ the version today, btw 👍
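One way to put a number on the signal-to-noise comparison described in this thread is to hand-label a sample of lines and compute, per keyword, the fraction of flagged lines that were real secrets. The sketch below is only an illustration of that idea: `keyword_precision`, the simplified regex, and the sample lines are all made up for this example and are not the process or data the maintainers used.

```python
import re
from typing import Dict, Iterable, Tuple


def keyword_precision(
    labeled_lines: Iterable[Tuple[str, bool]],
    keywords: Iterable[str],
) -> Dict[str, float]:
    """Per keyword: fraction of flagged lines that were labeled as secrets."""
    keywords = list(keywords)
    # Deliberately loose pattern: keyword, optional suffix, `=` or `:`, a value.
    patterns = {
        k: re.compile(r'{}\w*\s*[:=]\s*\S+'.format(re.escape(k)), re.IGNORECASE)
        for k in keywords
    }
    hits = {k: 0 for k in keywords}
    true_positives = {k: 0 for k in keywords}
    for line, is_secret in labeled_lines:
        for k, pattern in patterns.items():
            if pattern.search(line):
                hits[k] += 1
                true_positives[k] += int(is_secret)
    # Keywords that never matched are omitted rather than reported as 0.
    return {k: true_positives[k] / hits[k] for k in keywords if hits[k]}


# Made-up, hand-labeled sample: (line, is it actually a secret?)
sample = [
    ('password = "hunter2"', True),
    ('token = next(tokens)', False),
    ('token: str = lexer.peek()', False),
    ('api_token = "9f8e7d"', True),
]
precision = keyword_precision(sample, ['password', 'token'])
```

A keyword whose precision on a representative sample sits well below the others is a candidate for a tighter regex, or for being opt-in rather than a default.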