-
Notifications
You must be signed in to change notification settings - Fork 1
/
README
39 lines (31 loc) · 1.45 KB
/
README
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
Tested on:
Linux Mint 12/13, python 2.7, BeautifulSoup 4
How to install:
sudo apt-get install python-qt4
sudo easy_install request
sudo apt-get install python-mysqldb
sudo apt-get install mysql-server
sudo easy_install requests
sudo easy_install beautifulsoup4
sudo easy_install pyvirtualdisplay
sudo apt-get install xvfb
depending on you configuration you may need some additional packages
Setup:
Create a ‘Site Samples Directory’ at the same level as the chains.py file.
Install mysql, modify the createDB.sql file as desired and source it in the mysql prompt.
At the mysql prompt, source the mywot.sql file.
Make sure to modify batch.py, chains.py and mywot/mywot.py according to how you modified
How to run:
“-batch” tells the script to run batch mode
“-skipcomments” tells the script to not collect comments from MYWOT
The chains.py takes files in the following format (example):
http://bit.ly/jU0kgr nonspam 1 1309265860 1309265860
./chains.py -batch datasets/URLs_spam.txt 0 10000 -skipcomments | tee -a spamRun.txt
Would follow chains starting at the first URL in URLs_spam.txt up until URL 10000 without scraping comments.
batch.py is essentially the same interface however it expects a file in one of the following formats:
http://bit.ly/jU0kgr
http://bit.ly/jU0kgr 1234
where 1234 is the popularity of a given URL.
If using popularity with batch.py you must provide the “-hasPop” flag.
Located at:
https://github.com/dmritard96/mywot-coverage