Recursively crawl GitHub/Bitbucket/GitLab/Git repositories/companies in search of unsafely stored secrets

mariolima/repocrawler

RepoCrawler

Crawl GitHub/Bitbucket/GitLab/Git repositories in search of unsafely stored secrets. Written entirely in Go.

Overview

Inspired by tools such as GitGot, Trufflehog, GitRob, Git-All-Secrets and many others, I decided to develop a tool that takes the best of them and implements optimized crawling for most Git-compatible services.

This tool crawls repositories on various Git services using a variety of methods, all of which ultimately search for secrets that should have stayed private: API tokens, session tokens, passwords, and private keys.
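
To make "searching for secrets" concrete, here is a minimal sketch of pattern-based detection over a blob of text. The rule set, the type names and the `findSecrets` function are assumptions made for illustration; they are not taken from RepoCrawler's source, and a real rule set would be much larger.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// rule pairs a human-readable name with a pattern.
// These rules are illustrative assumptions, not RepoCrawler's actual rules.
type rule struct {
	name string
	re   *regexp.Regexp
}

var rules = []rule{
	{"AWS access key ID", regexp.MustCompile(`\bAKIA[0-9A-Z]{16}\b`)},
	{"Private key header", regexp.MustCompile(`-----BEGIN (?:RSA |EC |DSA |OPENSSH )?PRIVATE KEY-----`)},
	{"Hard-coded password", regexp.MustCompile(`(?i)password\s*[:=]\s*["'][^"']{4,}["']`)},
}

// match records which rule fired, on which line (1-based), and the matched text.
type match struct {
	rule string
	line int
	text string
}

// findSecrets scans text line by line and returns the first hit of every rule on each line.
func findSecrets(text string) []match {
	var out []match
	for i, line := range strings.Split(text, "\n") {
		for _, r := range rules {
			if m := r.re.FindString(line); m != "" {
				out = append(out, match{rule: r.name, line: i + 1, text: m})
			}
		}
	}
	return out
}

func main() {
	sample := "region = \"eu-west-1\"\naws_key = \"AKIAIOSFODNN7EXAMPLE\"\npassword = \"hunter22\"\n"
	for _, m := range findSecrets(sample) {
		fmt.Printf("line %d: %s: %s\n", m.line, m.rule, m.text)
	}
}
```

Regular expressions alone tend to produce false positives, which is why tools in this space often combine them with other signals such as entropy checks.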

Available Crawling methods

  • DeepCrawl (Inspired by Trufflehog)
    • Given a Git repository:
      • Enumerate all commits and analyse diff contents for submitted secrets
      • Enumerate all users that participated in the repo (commits) -> Look for public repositories belonging to these users -> Enumerate all commits and analyse diff contents for submitted secrets
                      +-------------------------------------+
                      | Users that participated in the repo |
                      +------------------+------------------+
                                         |
                       +-----------------+
                    +--v---+         +---v--+
                    |User 1|         |User 2|
            +-------------------+    +------+
            |                   |
      +-----v------+     +------v-----+
      |Repository 1|     |Repository 2|
      +-----+------+     +------+-----+
            |                   |
       +----v----+         +----v----+
       |DeepCrawl|         |DeepCrawl|
       +---------+         +---------+
      
  • GitHub
    • Given a string search parameter (inspired by GitGot's functionality):
      • Search all of GitHub for code containing the submitted string, using the /search/code API call
      • Loop through the results and search for secrets in the matched files
    • Given a GitHub repository / User / Organization, do a DeepCrawl
  • Bitbucket
    • Given a Bitbucket repository / User / Group, do a DeepCrawl
  • GitLab
    • Given a GitLab repository / User / Organization, do a DeepCrawl
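
The DeepCrawl fan-out described above (repository -> contributors -> their public repositories -> DeepCrawl each) can be sketched as follows. This is only an outline under stated assumptions: the `host` interface, the in-memory `memHost` and `deepCrawlTargets` are hypothetical names, the real tool talks to the GitHub/Bitbucket/GitLab APIs instead, and the sketch follows contributors one level deep only.

```go
package main

import "fmt"

// host is a hypothetical abstraction over a Git service (GitHub, Bitbucket, ...).
type host interface {
	Contributors(repo string) []string // users with commits in repo
	PublicRepos(user string) []string  // public repositories listed for user
}

// memHost is an in-memory stand-in used only for this example.
type memHost struct {
	contributors map[string][]string
	repos        map[string][]string
}

func (h memHost) Contributors(repo string) []string { return h.contributors[repo] }
func (h memHost) PublicRepos(user string) []string  { return h.repos[user] }

// deepCrawlTargets returns the start repository followed by every public
// repository of its contributors, each listed once, in discovery order.
func deepCrawlTargets(h host, start string) []string {
	seen := map[string]bool{start: true}
	targets := []string{start}
	for _, user := range h.Contributors(start) {
		for _, repo := range h.PublicRepos(user) {
			if !seen[repo] {
				seen[repo] = true
				targets = append(targets, repo)
			}
		}
	}
	return targets
}

func main() {
	h := memHost{
		contributors: map[string][]string{"org/app": {"user1", "user2"}},
		repos: map[string][]string{
			"user1": {"user1/repo1", "shared/tools"},
			"user2": {"shared/tools", "user2/repo2"}, // listed under both users: crawled once
		},
	}
	for _, repo := range deepCrawlTargets(h, "org/app") {
		fmt.Println("DeepCrawl:", repo)
	}
}
```

Deduplicating with a `seen` set matters in practice, since contributors to the same project frequently share repositories and each crawl means enumerating every commit.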

Installation

From source

go get github.com/mariolima/repocrawler
cd ~/go/src/github.com/mariolima/repocrawler/cmd/crawler-cli
go build .
export LOG_LEVEL=info
export GITHUB_ACCESS_TOKEN=TOKEN
./crawler-cli -h
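
As a rough illustration of how a CLI might pick up the two environment variables exported above, here is a small sketch. The `config` type and `loadConfig` function are hypothetical, and both the `info` default and the choice to treat a missing token as an error are assumptions, not documented behaviour of crawler-cli.

```go
package main

import (
	"errors"
	"fmt"
	"os"
)

// config holds the two settings shown in the install steps.
type config struct {
	LogLevel    string
	GitHubToken string
}

// loadConfig reads settings through getenv, so it can be exercised
// without touching the real process environment.
func loadConfig(getenv func(string) string) (config, error) {
	cfg := config{
		LogLevel:    getenv("LOG_LEVEL"),
		GitHubToken: getenv("GITHUB_ACCESS_TOKEN"),
	}
	if cfg.LogLevel == "" {
		cfg.LogLevel = "info" // assumed default, not taken from the tool
	}
	if cfg.GitHubToken == "" {
		// Assumption: this sketch treats the token as mandatory.
		return cfg, errors.New("GITHUB_ACCESS_TOKEN is not set")
	}
	return cfg, nil
}

func main() {
	cfg, err := loadConfig(os.Getenv)
	if err != nil {
		fmt.Println("config error:", err)
		return
	}
	fmt.Println("log level:", cfg.LogLevel)
}
```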

Using Docker

git clone https://github.com/mariolima/repocrawler
cd repocrawler
docker build . -t repocrawler
docker run -it -e 'GITHUB_ACCESS_TOKEN=TOKEN' -e 'SLACK_WEBHOOK=YOURWEBHOOK' repocrawler -h

Structure

Packages used and why

Package | Why
logrusorgru/aurora | Color highlights in CLI matches. There are others, but this one works best.
sirupsen/logrus | Logging for debugging. Easy to set up and supports multiple verbosity levels.
src-d/go-git.v4 | Cloning and reading data from Git services. Heavy, since most of its functions go unused, but there is no viable alternative.
bndr/gopencils | Bitbucket/GitLab API requests. Why not just use net? Requests with multiple parameters/queries are easier to code, and no Go API bindings for Bitbucket/GitLab support the calls needed.
google/go-github | API calls to GitHub (get repositories, users, ...).
pkg/profile | Debugging and benchmarking with pprof.
x/oauth2 | Token authentication for go-github requests. Definitely overkill.

FAQ

Credits