feat: add one more keyword #174

Closed
wants to merge 1 commit into from

Conversation

killuazhu
Contributor

Add one more keyword, token, to the keyword list.

@killuazhu
Contributor Author

Looks like we haven't released a new version in a while. Once the recent PRs are merged, would we be able to cut a new release?

@KevinHock
Collaborator

Sure thing, been meaning to :)

@killuazhu
Contributor Author

@KevinHock another thing: I sent an email to your GitHub profile address, kevin.hock.opentoall@gmail.com. Once you get a chance, would you please take a look? It's coming from my GitHub profile address, kyleezhu@gmail.com. Thanks.

@KevinHock
Collaborator

Will do 👍

@@ -45,6 +45,7 @@
'private_key',
'secret',
'secrete',
'token',
Collaborator

I am kind of ambivalent about this. When we first wrote the keyword detector, token was a keyword, but its signal-to-noise ratio wasn't as good as that of the other keywords; it was too noisy.

Inflexibility is an anti-pattern though, which is one of the reasons we added a keyword exclude regex, so that the static FALSE_POSITIVES dictionary could be extended and customized for things specific to each user's codebase.

We should probably do the same thing for the keyword detector's keyword list, though it is tricky work, because we would have to write it to and read it from the baseline file. #146 and #151 are examples of issues stemming from work like this.

This would also mean that other keyword additions like #148 would not have to be blocked on us doing a ton of internal testing.
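
To make the idea concrete, here is a rough sketch of what a user-extensible keyword list combined with an exclude regex could look like. It is not the detect-secrets implementation: `DEFAULT_KEYWORDS`, `build_keyword_regex`, `find_keyword_secrets`, and the `extra_keywords` and `exclude_regex` parameters are all hypothetical names, the regex is deliberately simplistic, and reading or writing the baseline file is left out entirely.

```python
import re

# Hypothetical defaults for illustration; not the real detect-secrets list.
DEFAULT_KEYWORDS = ('private_key', 'secret')


def build_keyword_regex(extra_keywords=()):
    """Compile a regex for lines shaped like `<name containing keyword> = "<value>"`."""
    keywords = DEFAULT_KEYWORDS + tuple(extra_keywords)
    alternation = '|'.join(re.escape(keyword) for keyword in keywords)
    pattern = (
        r'(?P<name>\w*(?:' + alternation + r')\w*)'  # identifier containing a keyword
        r'\s*[:=]\s*'                                # assignment or mapping separator
        r'["\'](?P<value>[^"\']+)["\']'              # quoted, non-empty value
    )
    return re.compile(pattern, re.IGNORECASE)


def find_keyword_secrets(line, extra_keywords=(), exclude_regex=None):
    """Return (name, value) pairs found on the line, honoring an optional exclude regex."""
    # Assumption: the exclude regex is applied to the whole line, so a user
    # can silence patterns that are noisy in their own codebase.
    if exclude_regex and re.search(exclude_regex, line):
        return []
    regex = build_keyword_regex(extra_keywords)
    return [(m.group('name'), m.group('value')) for m in regex.finditer(line)]
```

With something like this, a user who wants token could opt in with `extra_keywords=['token']` and pair it with an `exclude_regex` such as `r'csrf'` to suppress the matches that are noisy for them, without the default list having to change.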

Contributor Author

I'm wondering what kind of internal testing you carry out to accept or reject a new keyword. We haven't built up a large test codebase yet, which we probably should. I'm curious how big your test codebase is, and what your process is for evaluating the false-positive ratio.

Collaborator

At least for the keyword detector, we made the regexes super loose so they would be noisy, found all the true positives stemming from them, and then trimmed the regexes down as best we could to catch as few false positives as possible. A decent amount of manual work was involved.

It's still a little loud now. I plan on working on it a little more, but I have been busy with some other projects recently.

For testing, we ran it on all of our largest codebases. I'm not sure of the exact LoC, but probably a few hundred thousand or more.
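
As a rough illustration of the evaluation step (not the tooling we actually used), the manually labeled hits from such a run could be summarized per keyword like this. `precision_by_keyword` and the shape of `labeled_hits` are assumptions made up for this sketch, and the sample data in the usage note is invented.

```python
from collections import Counter


def precision_by_keyword(labeled_hits):
    """Map each keyword to true_positives / total_hits.

    `labeled_hits` is an iterable of (keyword, is_true_positive) pairs,
    produced by manually reviewing what the loose regexes flagged.
    """
    totals = Counter()
    true_positives = Counter()
    for keyword, is_true_positive in labeled_hits:
        totals[keyword] += 1
        if is_true_positive:
            true_positives[keyword] += 1
    # Every keyword in `totals` has at least one hit, so no division by zero.
    return {keyword: true_positives[keyword] / totals[keyword] for keyword in totals}
```

For example, with invented labels of two true hits for secret and one true hit out of four for token, this returns 1.0 for secret and 0.25 for token, which is the kind of gap that would mark a keyword as too noisy.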

p.s. Bumped ✊ the version today, btw 👍

@killuazhu
Contributor Author

I will close this PR since token is considered a noisy keyword.

@killuazhu killuazhu closed this Aug 13, 2019
killuazhu pushed a commit to IBM/detect-secrets that referenced this pull request May 28, 2020
killuazhu pushed a commit to IBM/detect-secrets that referenced this pull request Jul 9, 2020
killuazhu pushed a commit to IBM/detect-secrets that referenced this pull request Sep 17, 2020