A Web-Crawling Email Collector
This Python script visits a web page and searches it for the words in a list you provide. If any of those words are found, it stores the email addresses on that page in a database. It then repeats the process for every link on the page.
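The per-page step described above can be sketched as follows. This is a minimal sketch, not the script's actual code: it uses the standard library's html.parser in place of bs4 so it runs without extra packages, and the function name scan_page, the case-insensitive substring match, and the email regex are all assumptions. The source also does not say whether links are followed from pages that contain none of the words, so this sketch returns the links either way.

```python
import re
from html.parser import HTMLParser

# Loose pattern for address-like strings; not a full RFC 5322 validator (assumption).
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


class _PageParser(HTMLParser):
    """Collects text content, link targets, and mailto: addresses from one page."""

    def __init__(self):
        super().__init__()
        self.text_parts = []
        self.links = []
        self.mailto = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if not href:
            return
        if href.lower().startswith("mailto:"):
            # Drop the "mailto:" prefix and any ?subject=... query part.
            self.mailto.append(href[len("mailto:"):].split("?")[0])
        else:
            self.links.append(href)

    def handle_data(self, data):
        self.text_parts.append(data)


def scan_page(html, words):
    """Return (emails, links); emails is empty unless some word occurs in the page text."""
    parser = _PageParser()
    parser.feed(html)
    parser.close()  # flush any buffered trailing text
    text = " ".join(parser.text_parts)
    lowered = text.lower()
    if not any(word.lower() in lowered for word in words):
        return set(), parser.links
    emails = set(EMAIL_RE.findall(text)) | set(parser.mailto)
    return emails, parser.links
```

Relative links such as "/people" are returned as-is; resolving them against the page URL is left to the crawling step.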
It does not crawl forever: it only follows links a few levels deep.
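A depth limit like this is commonly implemented as a breadth-first search that records how many link hops each page is from a starting page. The sketch below assumes that approach; crawl and get_page are hypothetical names, get_page stands in for "download the page and scan it", and the default max_depth=2 is an arbitrary choice, since the number of layers the script actually uses is not stated.

```python
from collections import deque
from urllib.parse import urldefrag, urljoin


def crawl(start_urls, get_page, max_depth=2):
    """Breadth-first crawl that stops following links max_depth hops from the start pages.

    get_page(url) is a hypothetical hook returning (emails, links) for one page.
    Returns a dict mapping each email to the set of pages it was found on.
    """
    starts = list(dict.fromkeys(start_urls))  # de-duplicate, keep order
    found = {}
    seen = set(starts)
    queue = deque((url, 0) for url in starts)
    while queue:
        url, depth = queue.popleft()
        try:
            emails, links = get_page(url)
        except Exception:
            continue  # skip pages that fail to load or parse
        for email in emails:
            found.setdefault(email, set()).add(url)
        if depth >= max_depth:
            continue  # deep enough: do not follow this page's links
        for link in links:
            # Resolve relative links and strip #fragments so pages are not revisited.
            absolute, _fragment = urldefrag(urljoin(url, link))
            if absolute.startswith(("http://", "https://")) and absolute not in seen:
                seen.add(absolute)
                queue.append((absolute, depth + 1))
    return found
```

Passing get_page in as a parameter keeps the traversal separate from the network code, so the depth logic can be checked against a fake set of pages.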
Possible uses:
- finding professors who research a particular field
- marketing
- etc.
How to use:
- connect to Wi-Fi (the crawl downloads many pages, so you do not want to run it on mobile data).
- clone or download the repository
- install the requirements:
pip install -r requirements.txt
- go into the WEC folder:
cd WEC
- write the list of all websites to search through in a .txt file.
- write the list of all words to search for in a .txt file.
- on the command line:
python3 main.py <websites file path> <words file path>
- wait: this can take a while, so leaving it running overnight is recommended.
- when it finishes, it will have created an SQLite database, which you can open with tools such as SQLite Browser.
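The input and output ends of the steps above might look like the sketch below: one entry per line in each .txt file, and one row per (email, page) pair in SQLite. The function names and the emails table with email and source_url columns are hypothetical; the script's real schema may differ, so check it in SQLite Browser before querying.

```python
import sqlite3
from pathlib import Path


def load_lines(path):
    """Read one entry per line from a .txt file, skipping blank lines and duplicates."""
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    return list(dict.fromkeys(line.strip() for line in lines if line.strip()))


def save_results(conn, found):
    """Store {email: {source_url, ...}} in a hypothetical emails table; repeats are ignored."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS emails ("
        " email TEXT NOT NULL,"
        " source_url TEXT NOT NULL,"
        " UNIQUE (email, source_url))"
    )
    rows = [(email, url) for email, urls in found.items() for url in sorted(urls)]
    conn.executemany(
        "INSERT OR IGNORE INTO emails (email, source_url) VALUES (?, ?)", rows
    )
    conn.commit()


def emails_with_sources(conn):
    """List each stored address next to the page it was collected from."""
    return conn.execute(
        "SELECT email, source_url FROM emails ORDER BY email, source_url"
    ).fetchall()
```

For example, load_lines(sys.argv[1]) and load_lines(sys.argv[2]) would read the two files passed on the command line, and sqlite3.connect("results.db") (a hypothetical file name) would open a database to pass to save_results. Keeping the source page next to each address makes it easy to check where an email came from before using it.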
Requirements:
- Python 3
- bs4
- requests
- validate_email
- SQLite Browser (optional, for viewing the results)
A list of institutes to search can be taken from the Nature Index, among other sources.
To do:
- write tests
- make institute selection part of the script
- fine-tune the algorithm a little
- GUI
- machine learning
Disclaimer: the developers of this code are not responsible if you send an email to someone who was not supposed to receive it. Before sending any email, it is recommended that you check the page the address was collected from.