feat: add one more keyword #174

Closed
wants to merge 1 commit into from
1 change: 1 addition & 0 deletions detect_secrets/plugins/keyword.py
@@ -45,6 +45,7 @@
'private_key',
'secret',
'secrete',
+'token',
Collaborator

I'm somewhat ambivalent about this. When we first wrote the keyword detector, token was a keyword, but its signal-to-noise ratio wasn't as good as the other keywords'; it was too noisy.

Inflexibility is an anti-pattern, though, which is one of the reasons we added a keyword exclude regex: it lets users extend the static FALSE_POSITIVES set with things specific to their own codebase.
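A minimal Python sketch of how a keyword detector along these lines can combine a keyword denylist, a static `FALSE_POSITIVES` set, and a user-supplied exclude regex. It is illustrative only: `DENYLIST`, `KEYWORD_RE`, and `find_keyword_secrets` are hypothetical names, the keyword list is trimmed, and it assumes the exclude regex is searched against the whole line. The real plugin's regexes are more elaborate and vary by file type.

```python
import re
from typing import List, Optional

# Hypothetical, trimmed-down keyword list (the real plugin's list differs).
DENYLIST = ('api_?key', 'password', 'secret', 'token')

# Illustrative subset of values that look like assignments but are not secrets.
FALSE_POSITIVES = {'""', "''", 'false', 'none', 'null', 'true'}

# An identifier containing a keyword, then ':' or '=', then a non-space value.
KEYWORD_RE = re.compile(
    r'(?P<key>\w*(?:' + '|'.join(DENYLIST) + r')\w*)\s*[:=]\s*(?P<value>\S+)',
    re.IGNORECASE,
)


def find_keyword_secrets(line: str, keyword_exclude: Optional[str] = None) -> List[str]:
    """Return candidate secret values on `line` (hypothetical helper)."""
    # Assumption: the user's exclude regex is matched against the whole line.
    if keyword_exclude and re.search(keyword_exclude, line):
        return []
    candidates = []
    for match in KEYWORD_RE.finditer(line):
        raw = match.group('value').rstrip(',;')
        # Check well-known non-secret values before stripping quotes,
        # so '""' (an empty string literal) is recognized.
        if raw.lower() in FALSE_POSITIVES:
            continue
        candidates.append(raw.strip('\'"'))
    return candidates


print(find_keyword_secrets('tokenizer = load_tokenizer()'))  # ['load_tokenizer()']
print(find_keyword_secrets('tokenizer = load_tokenizer()', keyword_exclude=r'tokenizer'))  # []
```

With `token` in the list, a harmless line such as `tokenizer = load_tokenizer()` comes back as a candidate, which is the kind of noise described above; a user could silence it for their own codebase with an exclude regex instead of the keyword being dropped for everyone.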

We should probably do the same for the keyword detector's keywords, though it's tricky work, because we would have to write them to and read them from the baseline file. #146 and #151 are examples of issues stemming from work like this.

This would also mean that other keyword additions, like #148, would no longer be blocked on us doing a ton of internal testing.
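A minimal Python sketch of what persisting user-added keywords in the baseline could look like, loosely modeled on the baseline's `plugins_used` entries. The `extra_keywords` field, `DEFAULT_KEYWORDS`, and both helper functions are hypothetical and not part of detect-secrets. The sketch shows two properties such a change would likely need: a baseline written before the field existed still loads (falling back to the defaults), and the list is serialized deterministically so re-running a scan does not churn the baseline.

```python
import json
from typing import Dict, List, Optional

# Illustrative defaults; the plugin's real built-in list is longer.
DEFAULT_KEYWORDS = ['api_?key', 'password', 'secret']


def keyword_plugin_entry(extra_keywords: List[str],
                         keyword_exclude: Optional[str] = None) -> Dict:
    """Build a baseline 'plugins_used' entry (hypothetical schema)."""
    entry: Dict = {'name': 'KeywordDetector'}
    if keyword_exclude:
        entry['keyword_exclude'] = keyword_exclude
    if extra_keywords:
        # Hypothetical field. Deduplicated and sorted so that re-running a
        # scan does not reorder the list and produce a noisy baseline diff.
        entry['extra_keywords'] = sorted(set(extra_keywords))
    return entry


def keywords_from_baseline(baseline: Dict) -> List[str]:
    """Defaults plus any user-added keywords recorded in the baseline."""
    for plugin in baseline.get('plugins_used', []):
        if plugin.get('name') == 'KeywordDetector':
            # Older baselines lack the field and fall back to the defaults.
            extra = plugin.get('extra_keywords', [])
            return DEFAULT_KEYWORDS + [k for k in extra if k not in DEFAULT_KEYWORDS]
    # No KeywordDetector entry at all: assume the defaults.
    return list(DEFAULT_KEYWORDS)


baseline = {'plugins_used': [keyword_plugin_entry(['token'], keyword_exclude='tokenizer')]}
restored = json.loads(json.dumps(baseline))
print(keywords_from_baseline(restored))  # ['api_?key', 'password', 'secret', 'token']
```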

Contributor Author

I'm wondering what kind of internal testing you guys carry out to accept or reject a new keyword. We haven't built up a large test codebase yet, though we probably should. How big is your test codebase, and what's your process for evaluating the false-positive rate?

Collaborator

At least for the keyword detector, we made the regexes super loose so it would be noisy, found all the true positives it produced, and then trimmed the regexes down as best we could to catch as few false positives as possible. It took a decent amount of manual work.

It's still a little noisy; I plan on working on it a bit more, but I've been busy with other projects recently.

For testing, we ran it on all of our largest codebases. I'm not sure of the exact LoC, but it's probably a few hundred thousand or more.
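The loosen-then-trim process described here can be supported by a small script. The Python sketch below is hypothetical and not the maintainers' tooling: it runs a deliberately loose pattern per keyword over a codebase, collects every hit for manual review, and turns the reviewers' true/false labels into a per-keyword precision figure, one way to put a number on the signal-to-noise ratio mentioned earlier. `iter_sources`, `scan_loosely`, and `precision_by_keyword` are illustrative names, and the file suffixes are an assumption.

```python
import re
from collections import defaultdict
from pathlib import Path
from typing import Dict, Iterable, Iterator, List, Tuple

Hit = Tuple[str, int, str]  # (file name, line number, stripped line)


def iter_sources(root: Path,
                 suffixes: Tuple[str, ...] = ('.py', '.yaml', '.json')) -> Iterator[Tuple[str, str]]:
    """Yield (path, text) for every file under `root` with a matching suffix."""
    for path in sorted(root.rglob('*')):
        if path.is_file() and path.suffix in suffixes:
            yield str(path), path.read_text(encoding='utf-8', errors='ignore')


def scan_loosely(sources: Iterable[Tuple[str, str]],
                 keywords: Iterable[str]) -> Dict[str, List[Hit]]:
    """Deliberately loose pass: any identifier containing a keyword and
    followed by ':' or '=' counts as a hit to be reviewed by hand."""
    patterns = {kw: re.compile(r'\w*' + kw + r'\w*\s*[:=]', re.IGNORECASE)
                for kw in keywords}
    hits: Dict[str, List[Hit]] = defaultdict(list)
    for name, text in sources:
        for lineno, line in enumerate(text.splitlines(), start=1):
            for kw, pattern in patterns.items():
                if pattern.search(line):
                    hits[kw].append((name, lineno, line.strip()))
    return hits


def precision_by_keyword(labels: Dict[str, List[bool]]) -> Dict[str, float]:
    """Fraction of hand-labeled hits per keyword that were real secrets."""
    return {kw: sum(flags) / len(flags) for kw, flags in labels.items() if flags}


sample = [('app.py', 'token = "abc123"\ntokenizer = load()\npassword = hunter2\n')]
hits = scan_loosely(sample, ['token', 'password'])
# hits['token'] holds lines 1 and 2; hits['password'] holds line 3.
# A reviewer then labels each hit by hand (here, line 2 is not a secret).
print(precision_by_keyword({'token': [True, False], 'password': [True]}))
# {'token': 0.5, 'password': 1.0}
```

On a real codebase, `scan_loosely(iter_sources(Path('.')), keywords)` would feed the manual review; keywords whose precision stays low after trimming are the ones worth leaving out of the default list.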

P.S. Bumped the version today, btw 👍

)
FALSE_POSITIVES = {
'""',
8 changes: 8 additions & 0 deletions tests/plugins/keyword_test.py
@@ -18,6 +18,7 @@
"quotes_required": [
'theapikey: ""', # Nothing in the quotes
'theapikey: "somefakekey"', # 'fake' in the secret
+'token: "somefaketoken"', # 'fake' in the secret
],
"quotes_not_required": [
'theapikeyforfoo:hopenobodyfindsthisone', # Characters between apikey and :
@@ -30,11 +31,13 @@
'"theapikey": "{{h}o)p${e]nob(ody[finds>-_$#thisone}}"',
'apikey: "{{h}o)p${e]nob(ody[finds>-_$#thisone}}"',
"apikey: '{{h}o)p${e]nob(ody[finds>-_$#thisone}}'",
+"token: '{{h}o)p${e]nob(ody[finds>-_$#thisone}}'",
],
"quotes_not_required": [
'apikey: {{h}o)p${e]nob(ody[finds>-_$#thisone}}',
'apikey:{{h}o)p${e]nob(ody[finds>-_$#thisone}}',
'theapikey:{{h}o)p${e]nob(ody[finds>-_$#thisone}}',
+'token:{{h}o)p${e]nob(ody[finds>-_$#thisone}}',
],
},
}
@@ -60,6 +63,7 @@
'some_dict["secret"] = "{{h}o)p${e]nob(ody[finds>-_$#thisone}}"',
'the_password= "{{h}o)p${e]nob(ody[finds>-_$#thisone}}"\n',
'the_password=\'{{h}o)p${e]nob(ody[finds>-_$#thisone}}\'\n',
+'apitoken=\'{{h}o)p${e]nob(ody[finds>-_$#thisone}}\'\n',
],
"quotes_not_required": [
"some_dict['secret'] = {{h}o)p${e]nob(ody[finds>-_$#thisone}}",
@@ -69,6 +73,7 @@
'my_password = {{h}o)p${e]nob(ody[finds>-_$#thisone}}',
'my_password ={{h}o)p${e]nob(ody[finds>-_$#thisone}}',
'the_password={{h}o)p${e]nob(ody[finds>-_$#thisone}}\n',
+'my_token={{h}o)p${e]nob(ody[finds>-_$#thisone}}\n',
],
},
}
@@ -98,6 +103,7 @@
"quotes_required": [
'theapikey := ""', # Nothing in the quotes
'theapikey := "somefakekey"', # 'fake' in the secret
+'token := "somefakekey"', # 'fake' in the secret
],
"quotes_not_required": [
'theapikeyforfoo := hopenobodyfindsthisone', # Characters between apikey and :=
@@ -115,12 +121,14 @@
"apikey:= '{{h}o)p${e]nob(ody[finds>-_$#thisone}}'",
"apikey:='{{h}o)p${e]nob(ody[finds>-_$#thisone}}'",
"apikey:= '{{h}o)p${e]nob(ody[finds>-_$#thisone}}'",
+"token:= '{{h}o)p${e]nob(ody[finds>-_$#thisone}}'",
],
"quotes_not_required": [
"apikey := {{h}o)p${e]nob(ody[finds>-_$#thisone}}",
"apikey :={{h}o)p${e]nob(ody[finds>-_$#thisone}}",
"apikey:= {{h}o)p${e]nob(ody[finds>-_$#thisone}}",
"apikey:={{h}o)p${e]nob(ody[finds>-_$#thisone}}",
+"thetoken:={{h}o)p${e]nob(ody[finds>-_$#thisone}}",
],
},
}