Fork PhishingArmyCore with modifications to capture raw URLs by source. #1

Draft
wants to merge 4 commits into
base: master
from


ericpollmann (Collaborator) commented Mar 30, 2024

Fork of PhishingArmyCore with modifications to capture raw URLs by source. This is a first step toward a list of malicious URLs gathered from multiple sources (here, the five sources PhishingArmy uses: phishtank, openphish, certpl, phishuntio and urlscanio), with each URL attributed to its source.
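A minimal sketch of the per-source idea, assuming the fetched URLs are already grouped in a dict keyed by source name. The helpers `write_raw_urls`, `host_of` and `build_blocklist` are hypothetical and not taken from `phishing.py`; only the `raw_url_<source>.txt` naming follows the output files listed in the run below. Whitelist matching here is an exact hostname lookup, which is an assumption; the actual script may match domains differently.

```python
# Hypothetical sketch, not code from phishing.py: keep raw URLs per source
# and derive a whitelist-filtered domain blocklist from them.
from __future__ import annotations

import tempfile
from pathlib import Path
from urllib.parse import urlsplit


def write_raw_urls(urls_by_source: dict[str, list[str]], out_dir: Path) -> dict[str, Path]:
    """Write one raw_url_<source>.txt file per source, one URL per line."""
    out_dir.mkdir(parents=True, exist_ok=True)
    written: dict[str, Path] = {}
    for source, urls in urls_by_source.items():
        path = out_dir / f"raw_url_{source}.txt"
        # Deduplicate and sort so repeated runs produce stable, diffable files.
        path.write_text("".join(f"{u}\n" for u in sorted(set(urls))), encoding="utf-8")
        written[source] = path
    return written


def host_of(url: str) -> str | None:
    """Return the lowercased hostname of a URL, or None if none can be parsed."""
    url = url.strip()
    # urlsplit only fills in the host when a scheme is present, so add one if missing.
    if "://" not in url:
        url = "http://" + url
    try:
        return urlsplit(url).hostname  # .hostname is already lowercased
    except ValueError:  # e.g. malformed IPv6 literals
        return None


def build_blocklist(urls_by_source: dict[str, list[str]], whitelist: set[str]) -> list[str]:
    """Collect unique hostnames from every source, dropping whitelisted ones.

    Exact-match whitelisting is an assumption of this sketch.
    """
    domains: set[str] = set()
    for urls in urls_by_source.values():
        for url in urls:
            host = host_of(url)
            if host and host not in whitelist:
                domains.add(host)
    return sorted(domains)


if __name__ == "__main__":
    sample = {
        "openphish": ["https://login.example-bank.test/verify", "http://evil.test/a"],
        "phishtank": ["http://EVIL.test/b", "https://www.example.com/safe"],
    }
    with tempfile.TemporaryDirectory() as tmp:
        files = write_raw_urls(sample, Path(tmp))
        print(sorted(p.name for p in files.values()))
        # prints ['raw_url_openphish.txt', 'raw_url_phishtank.txt']
    print(build_blocklist(sample, whitelist={"www.example.com"}))
    # prints ['evil.test', 'login.example-bank.test']
```

Keeping the raw files separate from the domain list is what preserves attribution: the blocklist drops both the URL path and the source, while each `raw_url_*.txt` file keeps the full URL and names its source.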

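This runs in about 16.5 minutes, most of which is spent in the certpl step (about 13.5 minutes) and the phishtank step (about 3 minutes). Since `time` reports 98% CPU utilization (974.72 s of user time over a 16:28 wall clock), those steps appear to be dominated by local processing of the downloaded lists rather than by the downloads themselves.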

% time ./phishing.py
./phishing.py  974.72s user 1.10s system 98% cpu 16:28.02 total
% cat phishing.log
2024-03-29 16:50:57,351 -       main():253 - INFO - Loading 891340 domains in white_list
2024-03-29 16:50:57,351 -       main():256 - INFO - Getting phishtank list
2024-03-29 16:53:45,222 -       main():260 - INFO - Getting openphish list
2024-03-29 16:53:48,157 -       main():267 - INFO - Getting certpl list
2024-03-29 17:07:22,673 -       main():271 - INFO - Getting phishuntio list
2024-03-29 17:07:25,472 -       main():275 - INFO - Getting urlscanio list
2024-03-29 17:07:29,345 -       main():279 - INFO - Sorting lists
2024-03-29 17:07:29,451 -       main():283 - INFO - Generated the Blocklist containing 187741 domains
2024-03-29 17:07:29,451 -       main():284 - INFO - Generated the Extended Blocklist containing 189864 domains
2024-03-29 17:07:29,489 -       main():326 - INFO - Done
% ls -la out
total 33368
drwxr-xr-x@  9 eric  staff      288 Mar 29 17:07 .
drwxr-xr-x@ 14 eric  staff      448 Mar 29 16:31 ..
-rw-r--r--@  1 eric  staff  4033848 Mar 29 17:07 phishing_army_blocklist.txt
-rw-r--r--@  1 eric  staff  4067977 Mar 29 17:07 phishing_army_blocklist_extended.txt
-rw-r--r--@  1 eric  staff  4244952 Mar 29 17:07 raw_url_certpl.txt
-rw-r--r--@  1 eric  staff    22358 Mar 29 17:07 raw_url_openphish.txt
-rw-r--r--@  1 eric  staff  3647630 Mar 29 17:07 raw_url_phishtank.txt
-rw-r--r--@  1 eric  staff    43682 Mar 29 17:07 raw_url_phishuntio.txt
-rw-r--r--@  1 eric  staff     6116 Mar 29 17:07 raw_url_urlscanio.txt
% wc -l out/*
  187752 out/phishing_army_blocklist.txt
  189877 out/phishing_army_blocklist_extended.txt
  190818 out/raw_url_certpl.txt
     500 out/raw_url_openphish.txt
   53509 out/raw_url_phishtank.txt
     437 out/raw_url_phishuntio.txt
     200 out/raw_url_urlscanio.txt
  623093 total
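The per-step times quoted above can be read directly off `phishing.log`. A minimal sketch, assuming every log line has the `timestamp - caller - LEVEL - message` layout shown in the transcript; `step_durations` is a hypothetical helper, and each step is approximated as the gap until the next log line.

```python
# Hypothetical helper, not part of phishing.py: time each step from log timestamps.
from __future__ import annotations

import re
from datetime import datetime

# Matches lines such as
# "2024-03-29 16:50:57,351 -       main():256 - INFO - Getting phishtank list"
LINE_RE = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) - .*? - \w+ - (.*)$")


def step_durations(lines: list[str]) -> list[tuple[str, float]]:
    """Pair each log message with the seconds until the next parsed log line."""
    events = []
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m:  # skip anything that is not a log record
            ts = datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S,%f")
            events.append((ts, m.group(2)))
    # The last message has no successor, so it gets no duration.
    return [
        (msg, (next_ts - ts).total_seconds())
        for (ts, msg), (next_ts, _) in zip(events, events[1:])
    ]


if __name__ == "__main__":
    log = """\
2024-03-29 16:50:57,351 -       main():256 - INFO - Getting phishtank list
2024-03-29 16:53:45,222 -       main():260 - INFO - Getting openphish list
2024-03-29 16:53:48,157 -       main():267 - INFO - Getting certpl list
2024-03-29 17:07:22,673 -       main():271 - INFO - Getting phishuntio list"""
    for msg, secs in step_durations(log.splitlines()):
        print(f"{secs / 60:6.2f} min  {msg}")
```

On these four lines it reports 2.80 minutes for the phishtank step and 13.58 minutes for the certpl step, consistent with the rounded figures in the description.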

ericpollmann marked this pull request as draft on August 21, 2024 at 20:24.