Preparations for v1.0 Launch #355

domanchi · 2020-11-08T23:40:46Z

Preface

I've been getting several questions about when this branch will be launched. I too am excited for this change; however, these are the features that need to be worked on before we can deploy it:

Migrating KeywordDetector false positive heuristics (Migrating Keyword False Positives #377)
Ensuring consistent serialized baselines (Ensuring consistent serialized baselines #381)
Improved adhoc string scanning (improved adhoc string scanning #386)
Adding support for specifying your own filters
Adding support for disabling filters
Make it easier to "add all plugins" when using it as a package
Changing --disabled-plugins to have a more compatible style like --plugin
Integrate with bump2version

There are also some features that I think will be fantastic to add, but can wait till v1.1. These include:

Adding integration with gibberish-detector to reduce false positives with HighEntropyStrings
Adding --slim functionality to create baselines with the objective of minimal pre-commit modifications (removing line numbers, and generated_at key)
Refactor KeywordDetector so that keywords can be surfaced through filters, rather than built into the plugin itself

We're almost there!

Disclaimer

This PR is ridiculously long. Don't even bother. Seriously. The best way of checking this is to check out my branch, play around with the tool, and read the source directly (rather than looking at this change log).

I really did try to keep a nice git commit history. However, after some time, I realized that I was going to make fundamental architecture changes, and that it would just be faster if each commit did not have to be backwards compatible by itself. Due to this, the git history reads more like a series of mini-milestones towards this change for a better future.

My plan is that when we think the new version is ready for production, we'll just merge this PR. While you're reading the code, if you have any comments on it, create new issues and assign them to me -- I'll fix up the changes and submit a smaller, cleaner PR that bases off this new branch (essentially, a "second" master).

This PR description is therefore an attempt at providing more context and detailing the changes I made at a high level, to make reading them a little more digestible.

Summary

This branch is a re-imagined architecture of the initial tool. After several years of using this tool in a variety of different workflows, as well as reading about several different pain points felt by the community, I redesigned this tool to be modular-by-design, and easily extensible.

On the successful merge of this branch, we will release version 1.0.0. Needless to say, this is a breaking change.

This new design introduces several new elements, including:

Global settings object, to eliminate all the pass-through parameters that needed to be initialized for false positive identification
Modular filters, to provide a consistent (clean) approach to tuning out false positives
Dependency injection framework, for better-designed code
File scanners (e.g. config files, YAML files) are now file parsers (i.e. transformers) that convert special files into a compatible format for line-based scanning

I will make sure to reflect this in the CHANGELOG upon successful merge.

NOTE: There are still minor features to implement, but this should be in a pretty stable state to start playing with it.

Overview of Changes

New Features

Added NpmDetector and AzureStorageKeyDetector.

User-Facing Changes

I tried not to write additional features during this PR, since I wanted to swap out the underlying architecture before continuing to add more things. However, these are the changes I made to the interface of this tool:

Removed individual plugin disablement flags (e.g. --no-base64-string-scan). This list just became unnecessarily long, especially with the community adding many more plugins than initially expected (which is fantastic news!). Hopefully, the community continues to contribute, and it won't be at the expense of usability.
Added --disabled-plugins flag, so that you can list the names you want to disable. e.g.

Before: detect-secrets scan --no-base64-string-scan --no-slack-scan
After: detect-secrets scan --disabled-plugins Base64HighEntropyString,SlackDetector
Added --list-all-plugins flag, so that you can quickly list all plugins used in a scan.
Changed --use-all-plugins to --force-use-all-plugins, as I felt the latter more accurately described what was being done.
Changed detect-secrets scan --update <baseline> to detect-secrets scan --baseline <baseline> to keep it in sync with the pre-commit hook usage.
Changed detect-secrets audit --display-results to detect-secrets audit --stats. Also, --stats can be used with the audit --diff mode.
Changed --custom-plugins to --plugin
Stopped support for py35. I really wanted f-strings, and looks like it's already hit EOL (source)?

Baseline Changes

The details of the baseline changes can be found under detect_secrets.core.upgrades.*.

Removal of exclude, word_list and custom_plugin_paths (one-off keys)
Addition of filters_used to list all filters used to eliminate false positives
Renamed base64_limit and hex_limit to just limit

Developer Velocity

Documentation

OMG, so much better documentation! Check out docs/* for yourself.

Organization

/detect_secrets               # This is where the main code lives
    /audit                    # Powers `detect-secrets audit`
    /core                     # Powers the detect-secrets engine
        /upgrades             # For version bumps that modify the baseline, instructions
                              # on how to apply automated upgrades are found here

    /filters                  # Functions that allow for filtering of false positives
    /plugins                  # All plugins live here, modularized.
    /transformers             # Converters that transform special file formats into line proxies
    /util                     # utility functionality
    main.py                   # Entrypoint for console use
    pre_commit_hook.py        # Entrypoint for pre-commit hook
    settings.py               # Global settings object

Yay to not having crazy long files, and actually separating them out into easy-to-understand pieces (ironic for this PR, I know).

Decoupled `SecretsCollection` and baseline logic

Perhaps one of the most fundamental changes in this design is the introduction of the global Settings object, in detect_secrets.settings. This module is responsible for one thing: the serialization (and deserialization) of the engine's settings (i.e. which plugins and filters are used). By pulling this out into its own globally accessible module, we can avoid the ever-growing pass-through parameter list that had polluted this codebase (e.g. source).

After the configuration logic was separated, we could drastically simplify our plugin initialization code (found in detect_secrets.core.plugins.initialize). Be sure to check out the wild differences between the former version (invocation, definition) and the current version (invocation, definition). This lets us move plugin setup out of usage.py, and lets other developers configure their plugins from scripts much more easily, rather than being forced to rewrite parts of main.py.

This brings me to a huge paradigm shift: SecretsCollection is merely a container of secrets. Previously, the code was written in a way that the baseline was essentially a serialized SecretsCollection, which made it difficult to reason about. Now, the baseline is a combination of Settings and secrets (the interface of which is SecretsCollection) -- and this allows us to do things like detect_secrets.core.baseline.load.
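To make the split concrete, here is a minimal, self-contained sketch of the idea. It is not the code in this branch: the `Settings`, `SecretsContainer` and `format_baseline` names are hypothetical stand-ins, and the `plugins_used` and `results` keys are assumptions made for illustration (only `filters_used` is named above).

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class Settings:
    """Hypothetical stand-in: knows which plugins and filters are configured."""
    plugins: Dict[str, Dict[str, Any]] = field(default_factory=dict)
    filters: Dict[str, Dict[str, Any]] = field(default_factory=dict)

    def json(self) -> Dict[str, Any]:
        # Sorting keeps the serialized output stable between runs.
        return {
            'plugins_used': [
                {'name': name, **config}
                for name, config in sorted(self.plugins.items())
            ],
            'filters_used': [
                {'path': path, **config}
                for path, config in sorted(self.filters.items())
            ],
        }


@dataclass
class SecretsContainer:
    """Hypothetical stand-in: only holds findings, nothing about configuration."""
    data: Dict[str, List[Dict[str, Any]]] = field(default_factory=dict)

    def add(self, filename: str, secret_type: str, line_number: int) -> None:
        self.data.setdefault(filename, []).append(
            {'type': secret_type, 'line_number': line_number},
        )

    def json(self) -> Dict[str, Any]:
        return self.data


def format_baseline(settings: Settings, secrets: SecretsContainer) -> Dict[str, Any]:
    """A baseline is just the two pieces serialized side by side."""
    return {**settings.json(), 'results': secrets.json()}
```

Because the container no longer carries configuration, either half can be loaded, replaced, or tested without the other.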

Serialized, customizable heuristics to filter false positives

Scanning for secrets should not be a complicated task; however, the former code had evolved to make it so. Part of the reason is that as we've developed more ways of filtering out false positives, we've had to implement that logic in several different parts of the code (like, why the heck does a file parser need to know the regex used to exclude files?).

A better solution leverages dependency injection (DI) to supply parameters to filters on demand. For example, detect_secrets.filters.heuristic.is_sequential_string takes in the secret value, and returns True if the secret should be skipped. By aggregating all such configured filters (see detect_secrets.core.scan.get_filters), we can pass in any and all values that the filters may be looking for to make a decision (source).
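The injection idea can be sketched in a few lines with `inspect.signature`. This is an illustration under assumptions, not the framework in this branch: both filters are toy functions, and `should_skip` is a hypothetical helper name.

```python
import inspect
from typing import Any, Callable, Iterable


def is_ascending_digits(secret: str) -> bool:
    """Toy filter: True for strictly ascending digit runs such as '123456'."""
    return (
        len(secret) > 1
        and secret.isdecimal()
        and all(int(b) - int(a) == 1 for a, b in zip(secret, secret[1:]))
    )


def is_lock_file(filename: str) -> bool:
    """Toy filter: True for files we never want to report on."""
    return filename.endswith('.lock')


def should_skip(filters: Iterable[Callable[..., bool]], **context: Any) -> bool:
    """Call each filter with only the context values its signature asks for."""
    for func in filters:
        wanted = inspect.signature(func).parameters
        if not set(wanted) <= set(context):
            # This filter needs a value we do not have at this stage; skip it.
            continue
        if func(**{name: context[name] for name in wanted}):
            return True
    return False


FILTERS = [is_ascending_digits, is_lock_file]
print(should_skip(FILTERS, secret='123456', filename='app.py'))
print(should_skip(FILTERS, secret='hunter2', filename='app.py'))
```

The caller never needs to know which filter wants which value. It offers everything it has, and each filter declares its needs through its parameter names.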

Check out the currently implemented "out-of-the-box" filters in detect_secrets.filters.*

File Transformers

The thing about filters is that, in their current design, they work best with line-based secrets. However, as you know, HighEntropyStrings is the special snowflake that handles YAML and config files differently. How should we resolve this discrepancy?

After playing around with several ideas, I decided that we could have a custom parser for the file, and convert the file into "proxy" lines -- that is, the lines are not an exact match to the original content, but they do allow us to run our line-based plugins on them. With this new change, all plugins are now line-based, and special files will be transformed into lines. Not only does that allow them to take advantage of line-based filters, but now all plugins don't need to re-implement their own parser for the special files!
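As a rough illustration of "proxy" lines, the sketch below turns an INI-style file into one proxy line per original line, so that line numbers still line up. The function names and the exact proxy format are assumptions for this example, not the transformers shipped in this branch.

```python
import re
from typing import List, Tuple

# Matches `key = value` or `key: value`; skips comments and [section] headers.
_KEY_VALUE = re.compile(r'^\s*([^=:\s#;\[][^=:]*?)\s*[=:]\s*(.*?)\s*$')


def ini_to_proxy_lines(content: str) -> List[str]:
    """Return one proxy line per input line, normalized to `key = "value"`."""
    proxy = []
    for line in content.splitlines():
        match = _KEY_VALUE.match(line)
        if not match:
            # Keep a placeholder so later line numbers are unchanged.
            proxy.append('')
            continue
        key, value = match.groups()
        value = value.strip('"\'')
        proxy.append(f'{key} = "{value}"')
    return proxy


def find_quoted_assignments(lines: List[str]) -> List[Tuple[int, str]]:
    """Toy line-based plugin: report (line number, key) for quoted values."""
    pattern = re.compile(r'^(\w+) = "(.+)"$')
    found = []
    for number, line in enumerate(lines, start=1):
        match = pattern.match(line)
        if match:
            found.append((number, match.group(1)))
    return found


CONTENT = '[database]\n; comment\npassword = hunter2\nhost: "localhost"\n'
print(find_quoted_assignments(ini_to_proxy_lines(CONTENT)))
```

The toy plugin only ever sees plain lines, so it needs no knowledge of the INI format, which is the point of moving parsing into a transformer.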

This was a complicated change -- check out detect_secrets.transformers.* for more details.

Upgrade Infrastructure

With these changes, I realized that the baseline format cannot be depended upon to remain static. Rather, we need a formalized method to upgrade baselines, one that does a better job than merge_baseline. Thus, I created a framework for formalizing the changes made to baseline formats across versions, inspired by Django/Pyramid (I don't remember which).

detect_secrets.core.baseline.upgrade(...) will take your baseline version, and run it through all the necessary changes so that its format is compatible with the latest version. In doing so, it makes it easier for users to upgrade, as well as building in support for audits between versions.
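A version-keyed chain of upgrade steps could look roughly like the sketch below. It is an illustration only: the `UPGRADES` registry, the helper names, and the `version` and `plugins_used` keys are assumptions, and the step simply drops the removed one-off keys, whereas a real upgrade would presumably carry their values over.

```python
import copy
from typing import Any, Callable, Dict, List, Tuple

Baseline = Dict[str, Any]
Version = Tuple[int, ...]


def _upgrade_to_1_0(baseline: Baseline) -> None:
    """Apply the format changes listed under "Baseline Changes" in place."""
    for key in ('exclude', 'word_list', 'custom_plugin_paths'):
        baseline.pop(key, None)
    baseline.setdefault('filters_used', [])
    for plugin in baseline.get('plugins_used', []):
        for old_key in ('base64_limit', 'hex_limit'):
            if old_key in plugin:
                plugin['limit'] = plugin.pop(old_key)


# Ordered oldest to newest; each entry is (version introduced, step).
UPGRADES: List[Tuple[Version, Callable[[Baseline], None]]] = [
    ((1, 0), _upgrade_to_1_0),
]


def parse_version(version: str) -> Version:
    """Compare on major.minor only, e.g. '0.14.3' becomes (0, 14)."""
    return tuple(int(part) for part in version.split('.')[:2])


def upgrade(baseline: Baseline, target_version: str) -> Baseline:
    """Return a copy with every step between the two versions applied."""
    current = parse_version(baseline['version'])
    target = parse_version(target_version)
    result = copy.deepcopy(baseline)
    for version, step in UPGRADES:
        if current < version <= target:
            step(result)
    result['version'] = target_version
    return result
```

Keeping each format change in its own step means a baseline several versions old can be brought forward by replaying the steps in order, and an already-current baseline passes through unchanged.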

Cleaner Tests

A good coding style heuristic is to structure your code so that it's easily testable. And from the looks of things, our core tests were really ugly, and the code behind them hard to test. I gutted a lot of tests, and opted for cleaner, straightforward, yet equally effective tests (which was easier to do because of the better-designed architecture).

Here's to hoping that these improvements make it much easier to work on this codebase in the future, and allow the community to continue working to make this tool useful.

domanchi · 2021-02-25T00:16:56Z

All individual commits have been reviewed, and I have verified that this passes tests locally (since we currently don't have a working CI for this).

We're finally going live!

Minor change in detect-secrets: the parameter "scan --update <baseline>" has changed to "scan --baseline <baseline>" in v1.0. Reference: Yelp/detect-secrets#355, at header "User-Facing Changes".

ninoseki and others added 19 commits September 26, 2020 14:48

Add npm detector

b041151

Add a new detector which searches for NPM auth tokens

moving detect_secrets.core.color to detect_secrets.util.color

38abdb9

moving detect_secrets.core.bidirectional_iterator to detect_secrets.a…

6783cb8

…udit.bidirectional_iterator

moving detect_secrets.core.code_snippet to detect_secrets.audit.code_…

d96fcb3

…snippet

moving audit exceptions out to new file

2447e2f

adding type hints for PotentialSecret

750d60f

introducing global settings object

4843ab8

refactored detect_secrets.core.usage

3219650

introducing the concept of extensible filters

42d06ea

simplifying SecretsCollection by decoupling from baseline renditions;…

16a1915

… adding DI to filter logic

refactoring baseline

4eead7e

refactoring pre-commit hook

c8f7685

refactoring scan, and introducing transformers

cb04f37

refactoring plugins for ease of use

30ecb2d

standardizing base64-limit and hex-limit

3b678ab

adding test to assert secret types are unique

35af285

adding upgrade infrastructure

58975dc

refactoring detect_secrets.main scanning functionality

982f5ba

refactor audit functionality

9aaae64

domanchi requested a review from KevinHock November 8, 2020 23:40

tests pass

2d300cb

domanchi force-pushed the pre-v1-launch branch from be49a6f to 2d300cb Compare November 9, 2020 16:53

Aaron Loo and others added 8 commits November 9, 2020 09:51

adding verification

4b613a2

injecting context into verification filter

54248a5

re-enabling custom plugins

a39891f

Azure Storage Key Detector pligin

d850354

Co-authored-by: Dariusz Porowski <Dariusz.Porowski@microsoft.com>

AzureStorageKeyDetector tests

4d7821b

Co-authored-by: Dariusz Porowski <Dariusz.Porowski@microsoft.com>

Feedback fixes

a9c13f5

Co-authored-by: Dariusz Porowski <Dariusz.Porowski@microsoft.com>

fix: update analyze_line() call

f92bb41

E501 fix

fa4eb9c

Co-authored-by: Dariusz Porowski <Dariusz.Porowski@microsoft.com>

Aaron Loo and others added 22 commits January 26, 2021 09:44

Merge branch 'pre-v1-slim-mode' into pre-v1-launch

2e39012

adding test case for successful use of module_path for custom filter

273d2e1

fixing tests

3f2b47b

Merge branch 'pre-v1-custom-filter-support' into pre-v1-launch

cf8fc45

tox -e mypy passes

7ee6306

bumping required coverage from 90% to 95%

6642685

integrating with bump2version

14b709d

CVE-2020-29651 fix

537a702

Renaming of Square OAuth plugin

de0ea4a

Merge pull request #398 from pablosantiagolopez/feature/v1-oauth-plugin

14365fc

Square OAuth detector

Merge pull request #402 from pablosantiagolopez/security/v1-cve-2020-…

c303b5c

…29651 CVE-2020-29651 fix

Merge branch 'pre-v1-launch' into feature/v1-multiple-filters

fb16538

Filters documentation

cbb42d5

Upgrade script compatibility

03ad10d

AWS group selection and test optimization

ec4008d

AWS plugin comments

1cdc00f

Merge pull request #397 from pablosantiagolopez/feature/v1-aws-plugin

8726299

AWS Regex to flag aws-related variable names

Merge pull request #391 from pablosantiagolopez/feature/v1-multiple-f…

a3165d1

…ilters Adding --exclude-secrets flag to explicitly ignore secret values

Merge github.com:Yelp/detect-secrets into pre-v1-launch

5210cd0

adding more default filters

f9010c1

Merge pull request #405 from Yelp/adding-more-default-filters

33fd15e

Adding more default filters

Merge pull request #395 from Yelp/pre-v1-bump2version

4e71e00

Integrating with bump2version

domanchi marked this pull request as ready for review February 25, 2021 00:16

domanchi merged commit f7b8b03 into master Feb 25, 2021

daparm mentioned this pull request Mar 16, 2021

Update detect-secrets.md microsoft/code-with-engineering-playbook#578

Merged

sandeep-ps mentioned this pull request Mar 17, 2021

[BUG] Pre-commit hook not working with the latest version of detect secrets. rokwire/events-manager#609

Closed

jimmyhlee94 pushed a commit to jimmyhlee94/detect-secrets that referenced this pull request Aug 19, 2021

Bump to version 0.13.1+ibm.19.dss to fix vulnerabilities (Yelp#355)

3338ba7
