-
Notifications
You must be signed in to change notification settings - Fork 479
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Preparations for v1.0 Launch #355
Merged
Merged
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Add a new detector which searches for NPM auth tokens
…udit.bidirectional_iterator
… adding DI to filter logic
domanchi
force-pushed
the
pre-v1-launch
branch
from
November 9, 2020 16:53
be49a6f
to
2d300cb
Compare
Co-authored-by: Dariusz Porowski <Dariusz.Porowski@microsoft.com>
Co-authored-by: Dariusz Porowski <Dariusz.Porowski@microsoft.com>
Co-authored-by: Dariusz Porowski <Dariusz.Porowski@microsoft.com>
Square OAuth detector
AWS Regex to flag aws-related variable names
…ilters Adding --exclude-secrets flag to explicitly ignore secret values
Adding more default filters
Integrating with bump2version
All individual commits have been reviewed, and I have verified that this passes tests locally (since we currently don't have a working CI for this). We're finally going live! |
daparm
added a commit
to daparm/code-with-engineering-playbook
that referenced
this pull request
Mar 16, 2021
Minor change in detect-secrets: The Parameter "scan --update <baseline>" has changed to "scan --baseline <baseline>" in the v1.0 Reference: Yelp/detect-secrets#355 At header "User-Facing Changes"
TessFerrandez
pushed a commit
to microsoft/code-with-engineering-playbook
that referenced
this pull request
Mar 19, 2021
Minor change in detect-secrets: The Parameter "scan --update <baseline>" has changed to "scan --baseline <baseline>" in the v1.0 Reference: Yelp/detect-secrets#355 At header "User-Facing Changes"
jimmyhlee94
pushed a commit
to jimmyhlee94/detect-secrets
that referenced
this pull request
Aug 19, 2021
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Preface
I've been getting several questions about when this branch will be launched. I too am excited for this change, however, these are the features that need to be worked on before we can deploy this:
KeywordDetector
false positive heuristics (Migrating Keyword False Positives #377)--disabled-plugins
to have a more compatible style like--plugin
bump2version
There are also some features that I think will be fantastic to add, but can wait till v1.1. These include:
gibberish-detector
to reduce false positives withHighEntropyStrings
--slim
functionality to create baselines with the objective of minimal pre-commit modifications (removing line numbers, andgenerated_at
key)KeywordDetector
so that keywords can be surfaced through filters, rather than built into the plugin itselfWe're almost there!
Disclaimer
This PR is ridiculously long. Don't even bother. Seriously. The best way of checking this is to checkout my branch, play around with the tool, and read the source straight (rather than looking at this change log).
I really did try with keeping a nice git commit history. However, after some time, I realized that I was going to do fundamental architecture changes, and it will just be faster if each commit did not have to be backwards compatible by themselves. Due to this, the git history is more like mini-milestones towards this change for a better future.
My plan is that when we think the new version is ready for production, we'll just merge this PR. While you're reading the code, if you have any comments on it, create new issues and assign it to me -- I'll fix up the changes and submit a smaller, cleaner PR that bases off this new branch (essentially, a "second" master).
This PR therefore, is an attempt at providing more context, and detailing the changes I made at a high level, to make reading the changes a little more digestible.
Summary
This branch is a re-imagined architecture of the initial tool. After several years of using this tool in a variety of different workflows, as well as reading about several different pain points felt by the community, I redesigned this tool to be modular-by-design, and easily extensible.
On the successful merge of this branch, we will release version
1.0.0
. Needless to say, this is a breaking change.This new design introduces several new elements, including:
transformers
) that convert special files into a compatible format for line-based scanningI will make sure to reflect this in the
CHANGELOG
upon successful merge.NOTE: There are still minor features to implement, but this should be in a pretty stable state to start playing with it.
Overview of Changes
New Features
NpmDetector
andAzureStorageKeyDetector
.User-Facing Changes
I tried not to write additional features during this PR, since I wanted to swap out the underlying architecture, before continuing to add more things. However, these are the following changes I made to the interface of this tool:
Removed individual plugin disablement flags (e.g.
--no-base64-string-scan
). This list just became unnecessarily long, especially with the community adding many more plugins than initially expected (which is fantastic news!). Hopefully, the community continues to contribute, and it won't be at the expense of usability.Added
--disabled-plugins
flag, so that you can list the names you want to disable. e.g.Before:
detect-secrets scan --no-base64-string-scan --no-slack-scan
After:
detect-secrets scan --disabled-plugins Base64HighEntropyString,SlackDetector
Added
--list-all-plugins
flag, so that you can quickly list all plugins used in a scan.Changed
--use-all-plugins
to--force-use-all-plugins
, as I felt the latter more accurately described what was being done.Changed
detect-secrets scan --update <baseline>
todetect-secrets scan --baseline <baseline>
to keep it in sync with the pre-commit hook usage.Changed
detect-secrets audit --display-results
todetect-secrets audit --stats
. Also,--stats
can be used with theaudit --diff
mode.Changed
--custom-plugins
to--plugin
Stopped support for py35. I really wanted f-strings, and looks like it's already hit EOL (source)?
Baseline Changes
The details of the baseline changes can be found under
detect_secrets.core.upgrades.*
.exclude
,word_list
andcustom_plugin_paths
(one-off keys)filters_used
to list all filters used to eliminate false positivesbase64_limit
andhex_limit
to justlimit
Developer Velocity
Documentation
OMG, so much better documentation! Check out
docs/*
for yourself.Organization
Yay to not having crazy long files, and actually separating them out into easy-to-understand pieces (ironic for this PR, I know).
Decoupled
SecretsCollection
and baseline logicPerhaps one of the most fundamental changes about this design is the introduction of the global
Settings
object, indetect_secrets.settings
. This module is responsible for one thing: the serialization (and deserialization) of the engine's settings (aka. which plugins and filters are used). In pulling this out into it's own globally accessible module, we can avoid the ever-growing pass-through parameter list that had polluted this codebase (e.g. source)After the configuration logic was separated, we could drastically simplify our plugin initialization code (found in
detect_secrets.core.plugins.initialize
). Be sure to check out the wild differences between the former version (invocation, definition) and the current version (invocation, definition). This allows us to move it out of setting them up inusage.py
, and allows other developers to much more easily configure their plugins as part of scripts, as compared to being forced to rewrite parts ofmain.py
.This brings me to a huge paradigm shift:
SecretsCollection
is merely a container of secrets. Previously, the code was written in a way that the baseline was essentially a serializedSecretsCollection
, which made it difficult to reason about. Now, the baseline is a combination ofSettings
and secrets (the interface of which isSecretsCollection
) -- and this allows us to do things likedetect_secrets.core.baseline.load
.Serialized, customizable heuristics to filter false positives
Scanning for secrets should not be a complicated task, however, the former code had evolved to making it so. Part of the reason is that as we've developed more ways of filtering out false positives, we've had to implement that logic in several different parts of the code (like, why the heck does a file parser need to know the regex used to exclude files?)
A better solution leverages dependency injection (DI) to supply parameters to filters on demand. For example,
detect_secrets.filters.heuristic.is_sequential_string
takes in the secret value, and returnsTrue
if the secret should be skipped. By aggregating all such configured filters (seedetect_secrets.core.scan.get_filters
), we can pass in any and all values that the filters may be looking for to make a decision (source).Check out the currently implemented "out-of-the-box" filters in
detect_secrets.filters.*
File Transformers
The thing about filters is that in its current design, they work best with line-based secrets. However, as you know, HighEntropyStrings is the special snowflake that handles YAML and config files differently. How should we resolve this discrepancy?
After playing around with several ideas, I decided that we could have a custom parser for the file, and convert the file into "proxy" lines -- that is, the lines are not an exact match to the original content, but they do allow us to run our line-based plugins on them. With this new change, all plugins are now line-based, and special files will be transformed into lines. Not only does that allow them to take advantage of line-based filters, but now all plugins don't need to re-implement their own parser for the special files!
This was a complicated change -- check out
detect_secrets.transformers.*
for more details.Upgrade Infrastructure
With these changes, I realize that the baseline format cannot be depended upon to remain static. Rather, we needed a formalized method to upgrade baselines, that does a better job than
merge_baseline
. Thus, I created a framework for formalizing the changes made to baseline formats across versions, inspired by Django/Pyramid (I don't remember which).detect_secrets.core.baseline.upgrade(...)
will take your baseline version, and run it through all the necessary changes so that its format is compatible with the latest version. In doing so, it makes it easier for users to upgrade, as well as building in support for audits between versions.Cleaner Tests
A good coding style heuristic is to structure your code so that it's easily testable. And from the looks of things, our core tests were really ugly and hard to test. I gutted a lot of tests, and opted for cleaner, straight-forward yet equally effective tests (which was easier to do, because of the better designed architecture).
Here's to hoping that these improvements make it much easier to work on this codebase in the future, and allow the community to continue working to make this tool useful.