WEC (Web crawling Email Collector)

A web-crawling email collector.

This Python script goes through each web page it is given and looks for the words provided in a list. If those words are found, it stores the email addresses on that page in a database. It then repeats the same process for all links on the page.

No, it does not run forever; it only follows links a few layers deep.
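The crawl described above can be sketched roughly as follows. This is a minimal illustration and not the code in main.py: the names crawl, fetch, LinkParser and EMAIL_RE, the default max_depth of 2, the "any word matches" rule, and the choice to follow links from every visited page are all assumptions made for the example. It uses only the standard library.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen

# Simple pattern for illustration; real-world address formats are more varied.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


class LinkParser(HTMLParser):
    """Collects the href of every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def fetch(url):
    """Download a page and return its text ('' on any failure)."""
    try:
        with urlopen(url, timeout=10) as response:
            return response.read().decode("utf-8", errors="replace")
    except (OSError, ValueError):
        return ""


def crawl(start_url, words, max_depth=2, fetch_page=fetch):
    """Return {email: url_where_found} for pages that contain any of the words.

    max_depth is a hypothetical parameter standing in for "a few layers".
    """
    found = {}
    seen = set()
    frontier = [(start_url, 0)]  # breadth-first queue of (url, depth)
    while frontier:
        url, depth = frontier.pop(0)
        if url in seen or depth > max_depth:
            continue
        seen.add(url)
        html = fetch_page(url)
        text = html.lower()
        # Assumption: one matching word is enough to keep the page's emails.
        if any(word.lower() in text for word in words):
            for email in EMAIL_RE.findall(html):
                found.setdefault(email, url)
        # Assumption: links are followed whether or not the page matched.
        parser = LinkParser()
        parser.feed(html)
        for href in parser.links:
            link = urljoin(url, href)
            if link.startswith(("http://", "https://")):
                frontier.append((link, depth + 1))
    return found
```

Recording the page on which each address was found makes it easier to check the source page before sending anything. The fetch_page argument exists only so the sketch can be tried against canned pages without touching the network.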

Use cases

looking for professors who do research in a particular field
marketing
etc...

How to use

Connect to Wi-Fi (you do not want to do this on mobile data).
Clone or download the repository.
Install the requirements (pip install -r requirements.txt).
Go into the WEC folder (cd WEC).
Write the list of all websites to search through in a .txt file.
Write the list of all words to search for in a .txt file.
On the command line:
- python3 main.py <websites file path> <words file path>
- wait (this can take a while; leaving it to run overnight is recommended)
When done, it will have created an SQLite database, which can be opened with various tools, e.g. an SQLite browser.
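As a rough illustration of the inputs and output in the steps above, the sketch below reads a .txt file with one entry per line and stores and reads back addresses with Python's built-in sqlite3 module. The one-entry-per-line layout, the table name emails, its columns, and the helper names load_lines, save_emails and read_emails are all assumptions made for this example; check the database the script actually creates for its real schema.

```python
import sqlite3
from pathlib import Path


def load_lines(path):
    """Read a text file and return its non-empty lines, stripped.

    Assumes one website (or one search word) per line.
    """
    text = Path(path).read_text(encoding="utf-8")
    return [line.strip() for line in text.splitlines() if line.strip()]


def save_emails(db_path, found):
    """Store {email: source_url} pairs; the table layout is hypothetical."""
    connection = sqlite3.connect(db_path)
    with connection:  # commits on success, rolls back on error
        connection.execute(
            "CREATE TABLE IF NOT EXISTS emails "
            "(email TEXT PRIMARY KEY, source_url TEXT)"
        )
        # INSERT OR IGNORE keeps the first source seen for each address.
        connection.executemany(
            "INSERT OR IGNORE INTO emails VALUES (?, ?)", found.items()
        )
    connection.close()


def read_emails(db_path):
    """Return all (email, source_url) rows, sorted by email."""
    connection = sqlite3.connect(db_path)
    rows = connection.execute(
        "SELECT email, source_url FROM emails ORDER BY email"
    ).fetchall()
    connection.close()
    return rows
```

For example, load_lines("websites.txt") would return the list of sites to crawl, and read_emails on the resulting database file would list each stored address next to the page it came from.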

Requirements

List of institutes

A list of institutes can be compiled from the Nature Index and many other sources.

Future prospects

write tests
make institute selection a part of the script
fine-tune the algorithm a little
G.U.I.
M.L.

Note

The developers of the code are not responsible if you send an email that you were not supposed to send, or send it to someone who was not supposed to receive it. Therefore, it is recommended that you look at the page from which an email address was collected before sending an email to it.

Abhishek-Deshmukh/WEC
