Skip to content

rheinwerk-verlag/patu

 
 

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

57 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Patu

A small spider, useful for checking a site for 404s and 500s. Patu requires httplib2 and lxml:

pip install -U httplib2 lxml

Quick Usage

To see available options:

patu.py --help

To spider an entire site using 5 workers, only showing errors:

patu.py --spiders=5 www.example.com

To spider, stopping after the first level of links:

patu.py --depth=1 www.example.com

To get a list of every linked page on a site:

patu.py --generate www.example.com > urls.txt

Instead of spidering for URLs, use a file instead and show all responses:

patu.py --input=urls.txt --verbose www.example.com

Format of URLs File

The output produced by --generate is formatted like so:

FIRST_URL<TAB>None
LINK1<TAB>REFERER
LINK2<TAB>REFERER

--input can take a file of that format, or one URL per line with no referer. --input=- reads from stdin.

Testing

Patu uses Nose for testing. To install Nose and test:

pip install -U nose
nosetests

About

Patu is a small spider

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published