feat: add one more keyword #174

killuazhu · 2019-05-13T15:20:42Z

Add one more keyword token to the keyword list.

killuazhu · 2019-05-13T15:24:05Z

Looks like we haven't released a new version in a while. Once the recent PRs are merged, would we be able to cut a new release?

KevinHock · 2019-05-13T16:25:20Z

Sure thing, been meaning to :)

killuazhu · 2019-05-13T17:58:26Z

@KevinHock another thing: I sent an email to your GitHub profile address, kevin.hock.opentoall@gmail.com. Once you get a chance, would you please take a look? It's coming from my GitHub profile address, kyleezhu@gmail.com. Thanks.

KevinHock · 2019-05-13T18:12:11Z

Will do 👍

KevinHock · 2019-05-13T18:27:31Z

detect_secrets/plugins/keyword.py

@@ -45,6 +45,7 @@
    'private_key',
    'secret',
    'secrete',
+    'token',


I am kind of ambivalent about this. When we first wrote the keyword detector, token was a keyword, but its signal-to-noise ratio wasn't as good as that of the other keywords; it was too noisy.

Inflexibility is an anti-pattern though, which is one of the reasons we added a keyword exclude regex, so that the static FALSE_POSITIVES dictionary could be extended and customized for things specific to each user's codebase.

We should probably do the same thing for the keyword detector's keywords, though it is tricky work because we will have to write them to and read them from the baseline file. #146 and #151 are examples of issues stemming from work like this.
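As a rough illustration of the idea above, here is a minimal sketch of a keyword matcher whose keyword list and exclude regex are both user-configurable and round-trip through JSON, the way a baseline file might store them. Everything here is hypothetical: `DEFAULT_KEYWORDS`, `build_keyword_regex`, `scan_line`, `dump_settings`, `load_settings`, and the `keyword_extra` / `keyword_exclude` field names are assumptions made for the example, not the actual detect-secrets API or baseline format.

```python
import json
import re

# Hypothetical defaults for illustration; not the real detect-secrets list.
DEFAULT_KEYWORDS = ('password', 'private_key', 'secret')


def build_keyword_regex(extra_keywords=()):
    """Compile a case-insensitive regex matching `<keyword>... = "<value>"`."""
    keywords = sorted(set(DEFAULT_KEYWORDS) | set(extra_keywords))
    alternation = '|'.join(re.escape(k) for k in keywords)
    return re.compile(
        r'(?:{})\w*\s*[:=]\s*["\']([^"\']+)["\']'.format(alternation),
        re.IGNORECASE,
    )


def scan_line(line, keyword_regex, exclude_regex=None):
    """Return the quoted value if the line looks like a secret, else None."""
    if exclude_regex is not None and exclude_regex.search(line):
        return None  # the user-supplied exclusion wins over any keyword match
    match = keyword_regex.search(line)
    return match.group(1) if match else None


def dump_settings(extra_keywords, exclude_pattern):
    """Serialize the custom settings as a baseline-like JSON string."""
    return json.dumps({
        'keyword_extra': sorted(extra_keywords),
        'keyword_exclude': exclude_pattern,
    })


def load_settings(text):
    """Rebuild the compiled regexes from the serialized settings."""
    data = json.loads(text)
    exclude = data.get('keyword_exclude')
    return (
        build_keyword_regex(data.get('keyword_extra', ())),
        re.compile(exclude) if exclude else None,
    )
```

With something like this, a user who wants `token` could opt in locally (and exclude, say, lines matching `test_`) without the default list having to change for everyone.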

This would also mean that other keyword additions like #148 would not be blocked on us doing a ton of internal testing.

I'm wondering what kind of internal testing you carry out to accept or reject a new keyword. We haven't built up a large test codebase yet, which we probably should. I'm curious how big your test codebase is, and what your process is for evaluating the false-positive ratio.

At least for the keyword detector, we made the regexes super loose so it would be noisy, found all the true positives stemming from it, and trimmed the regexes down as best we could to catch as few false positives as possible. A decent amount of manual work was involved with this.

It's still a little loud now. I plan on working on it a little more, but have been doing some other projects recently.

For testing, we ran it on all of our largest codebases. I'm not sure of the exact LoC, but probably a few hundred thousand or more.
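For anyone wanting to put a number on "too noisy", one simple approach is to tally manually reviewed findings per keyword and compare precision. The sketch below assumes findings have already been labeled by hand; `precision_by_keyword` is a hypothetical helper, and the `reviewed` labels are made up for illustration, not measurements from any real codebase.

```python
from collections import Counter


def precision_by_keyword(findings):
    """Map each keyword to the fraction of its findings that were real secrets.

    `findings` is an iterable of (keyword, is_true_positive) pairs
    produced by manual review.
    """
    hits = Counter()
    totals = Counter()
    for keyword, is_true_positive in findings:
        totals[keyword] += 1
        if is_true_positive:
            hits[keyword] += 1
    return {keyword: hits[keyword] / totals[keyword] for keyword in totals}


# Made-up labels for illustration only.
reviewed = [
    ('secret', True),
    ('secret', True),
    ('secret', False),
    ('token', True),
    ('token', False),
    ('token', False),
    ('token', False),
]

for keyword, precision in sorted(precision_by_keyword(reviewed).items()):
    print('{}: {:.2f}'.format(keyword, precision))
```

A keyword whose precision sits well below the others on a representative codebase is a candidate for opt-in rather than default inclusion.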

p.s. Bumped ✊ the version today, btw 👍

killuazhu · 2019-08-13T19:36:23Z

I will close this PR since token is considered a noisy keyword.

feat: add one more keyword

af01061

KevinHock reviewed May 13, 2019

KevinHock force-pushed the master branch from 81e2a44 to 6a3f206 Compare July 23, 2019 23:51

killuazhu closed this Aug 13, 2019

killuazhu pushed a commit to IBM/detect-secrets that referenced this pull request May 28, 2020

bump to 0.12.5-ibm.6 (Yelp#174)

7618b81

killuazhu pushed a commit to IBM/detect-secrets that referenced this pull request Jul 9, 2020

bump to 0.12.5-ibm.6 (Yelp#174)

12645b2

killuazhu pushed a commit to IBM/detect-secrets that referenced this pull request Sep 17, 2020

bump to 0.12.5-ibm.6 (Yelp#174)

6c56301
