
Scanners

Marco Rosa edited this page Aug 1, 2022 · 4 revisions

Scanners live in the credentialdigger/scanners folder. Every scanner must extend the BaseScanner class: it is initialized with the rules to be used, and it must override the scan method.
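As a rough illustration of this contract, the sketch below defines a stand-in base class and one subclass. It does not reproduce the project's actual BaseScanner: the rule shape (dicts with `regex` and `category` keys), the `StringScanner` name, and the format of the findings are all assumptions made for the example.

```python
import re
from abc import ABC, abstractmethod


class BaseScanner(ABC):
    """Hypothetical stand-in for the project's BaseScanner class."""

    def __init__(self, rules):
        # Assumed rule shape: a list of dicts with 'regex' and 'category' keys.
        self.rules = rules

    @abstractmethod
    def scan(self, target):
        """Return a list of findings for the given target."""


class StringScanner(BaseScanner):
    """Illustrative scanner that checks a plain string line by line."""

    def __init__(self, rules):
        super().__init__(rules)
        # Compile each rule once so that scan() can reuse the patterns.
        self._compiled = [
            (re.compile(rule["regex"]), rule["category"]) for rule in rules
        ]

    def scan(self, target):
        findings = []
        for line_number, line in enumerate(target.splitlines(), start=1):
            for pattern, category in self._compiled:
                if pattern.search(line):
                    findings.append({
                        "line_number": line_number,
                        "category": category,
                        "snippet": line.strip(),
                    })
        return findings
```

A subclass that does not override `scan` cannot be instantiated, which mirrors the requirement stated above.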

The project comes with a git scanner and a file scanner, both based on Intel's Hyperscan.

Git Scanner

The git scanner clones a repository and analyzes it commit by commit, across all branches. The GitScanner walks the commit history and computes the diff of each commit. To improve performance, it only considers newly added lines of modified files: it ignores deleted lines, and it skips files that have only been renamed or deleted. It also ignores binary files.
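The "added lines only" idea can be sketched on raw diff text. The helper below is not the GitScanner's code: the `added_lines` name is hypothetical, and it assumes git-style unified diff text in which each file section begins with a `diff ` line.

```python
import re

# Matches a hunk header such as "@@ -1,3 +1,4 @@" and captures the first
# line number on the new side of the hunk.
HUNK_HEADER = re.compile(r"^@@ -\d+(?:,\d+)? \+(\d+)(?:,\d+)? @@")


def added_lines(diff_text):
    """Return (new_line_number, text) for every line added by the diff."""
    results = []
    new_line_number = None  # None while outside any hunk
    for line in diff_text.splitlines():
        if line.startswith("diff "):
            # A new file section starts: skip its headers until the next hunk.
            new_line_number = None
            continue
        header = HUNK_HEADER.match(line)
        if header:
            new_line_number = int(header.group(1))
            continue
        if new_line_number is None:
            continue  # file headers such as "--- a/x" and "+++ b/x"
        if line.startswith("+"):
            results.append((new_line_number, line[1:]))
            new_line_number += 1
        elif line.startswith("-") or line.startswith("\\"):
            # Deleted lines and "\ No newline at end of file" markers do not
            # exist on the new side, so they are ignored.
            continue
        else:
            new_line_number += 1  # unchanged context line
    return results
```

Only the returned lines would then be handed to the regex matching step; deleted and context lines never reach it.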

The scanner considers each commit only once. The same commit can appear in several branches (e.g., after a merge), and scanning it multiple times would waste computational power and time.
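One simple way to get this behaviour is to remember the hashes already seen. The sketch below assumes the branches are available as a mapping from branch name to a list of commit hashes; that input shape and the `unique_commits` name are illustrative, not taken from the project.

```python
def unique_commits(branches):
    """Return every commit hash once, in first-seen order.

    `branches` is assumed to map a branch name to its list of commit hashes.
    """
    seen = set()
    ordered = []
    for commits in branches.values():
        for commit_hash in commits:
            if commit_hash in seen:
                continue  # already scheduled through another branch
            seen.add(commit_hash)
            ordered.append(commit_hash)
    return ordered
```

For example, two branches that share the commits `a1` and `b2` contribute those commits only once to the returned list.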

Each diff is scanned with a set of regexes (defined by the user) using Hyperscan.
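To give an idea of matching several rules in one pass, the sketch below combines the rules into a single pattern with Python's standard `re` module. This is a swapped-in stand-in, not Hyperscan and not the project's code: the rule names, the example regexes, and the helper names are assumptions. Unlike a true multi-pattern engine, the alternation reports only the first rule that matches at a given position.

```python
import re


def build_matcher(rules):
    """Combine named rules into one compiled pattern.

    `rules` is assumed to map a rule name (a valid Python identifier)
    to a regex string.
    """
    parts = [f"(?P<{name}>{regex})" for name, regex in rules.items()]
    return re.compile("|".join(parts))


def scan_lines(lines, matcher):
    """Return (line_number, rule_name, matched_text) for every match."""
    findings = []
    for line_number, line in enumerate(lines, start=1):
        for match in matcher.finditer(line):
            # lastgroup is the name of the rule whose named group matched.
            findings.append((line_number, match.lastgroup, match.group()))
    return findings
```

A possible call, with two made-up rules:

```python
rules = {
    "aws_key": r"AKIA[0-9A-Z]{16}",
    "password": r"password\s*=\s*\S+",
}
matcher = build_matcher(rules)
findings = scan_lines(["x = 1", "password = hunter2"], matcher)
```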

File Scanner

The file scanner analyzes the current content of a file or directory, whether or not it belongs to a git repository.

It opens each file in the directory and in all its subdirectories, and scans it for credentials. It only considers text files and ignores binary files. An ignore list of files and directories can also be provided, with wildcard support. A maximum depth relative to the root directory can be specified as well, in which case the scanner does not descend into subfolders beyond that depth.
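The traversal described above can be sketched with the standard library. This is not the file scanner's implementation: the `files_to_scan` and `is_binary` names, the parameter names, the convention that `max_depth=None` means unlimited, and the null-byte heuristic for detecting binary files are all assumptions.

```python
import fnmatch
import os


def is_binary(path, sample_size=1024):
    """Heuristic: treat a file as binary if its first bytes contain a null byte."""
    with open(path, "rb") as handle:
        return b"\0" in handle.read(sample_size)


def files_to_scan(root, ignore_list=(), max_depth=None):
    """Return the text files under `root`, as paths relative to `root`.

    Names matching a pattern in `ignore_list` are skipped (wildcards allowed).
    With max_depth=0 only files directly in `root` are returned; None means
    no depth limit.
    """
    def ignored(name):
        return any(fnmatch.fnmatch(name, pattern) for pattern in ignore_list)

    root = os.path.abspath(root)
    selected = []
    for current_dir, dirnames, filenames in os.walk(root):
        relative = os.path.relpath(current_dir, root)
        depth = 0 if relative == "." else relative.count(os.sep) + 1
        # Editing dirnames in place tells os.walk which subfolders to visit.
        dirnames[:] = [name for name in dirnames if not ignored(name)]
        if max_depth is not None and depth >= max_depth:
            dirnames[:] = []  # do not descend any further
        for name in filenames:
            path = os.path.join(current_dir, name)
            if ignored(name) or is_binary(path):
                continue
            selected.append(os.path.relpath(path, root))
    return sorted(selected)
```

Pruning `dirnames` in place avoids walking ignored or too-deep folders at all, rather than walking them and discarding the results afterwards.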

GitFileScanner

TODO

GitPRScanner

TODO