Photon Library

Photon is available as a library for both Python 2 & Python 3.

To install Photon as a library, run:

pip install photon --user

Documentation

Most basic example

import photon
result = photon.crawl('http://example.com')

The crawl function returns a dict by default, but you can pass the format='json' argument to get JSON output. This applies to both the crawl and results functions. A sample JSON output can be found here.
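
The relationship between the two output formats can be illustrated with the standard json module. This is only a sketch: the keys in sample_result are hypothetical and are not taken from Photon's actual output.

```python
import json

# Hypothetical stand-in for a dict returned by photon.crawl();
# the real keys and value types may differ.
sample_result = {
    "internal": ["http://example.com/about"],
    "external": ["https://other.example/"],
}

# Serialize the dict to a JSON string.
as_json = json.dumps(sample_result, indent=2, sort_keys=True)

# Parsing the string gives back an equal dict.
round_tripped = json.loads(as_json)
```

The dict form is convenient inside a Python program, while the JSON string is easier to write to a file or pass to another tool.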

To make crawling as flexible as possible, the following optional arguments are available:

Argument	Type	Default
level	int	2
threads	int	2
timeout	float	6
delay	float	0
regex	str	None
exclude	str	None
seeds	list	None
user_agent	list	random
cookies	dict	None
keys	boolean	False
only_urls	boolean	False

Please go through the Photon wiki for a detailed explanation of each option.

The results remain stored after a crawling session until you clear them. You can view them at any time as follows:

import photon
photon.crawl('http://example.com')
print(photon.results())

Why is there a separate function for it?
Because it is useful in asynchronous programming: you can view the results even while the crawl is still in progress.
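
The idea of reading results while a crawl is still running can be sketched with a background thread. The fake_crawl and fake_results functions below are hypothetical stand-ins that only mimic a crawler filling a shared store; they are not Photon's implementation, and whether Photon's own functions are safe to call from several threads is not confirmed here.

```python
import threading
import time

# Hypothetical stand-ins for photon.crawl() and photon.results();
# they only mimic a crawler that fills a shared store over time.
_store = []
_lock = threading.Lock()

def fake_crawl(url, pages=5):
    for i in range(pages):
        time.sleep(0.01)  # pretend to fetch a page
        with _lock:
            _store.append("%s/page%d" % (url, i))

def fake_results():
    with _lock:
        return list(_store)  # copy, so callers cannot mutate the store

# Start the crawl in the background.
worker = threading.Thread(target=fake_crawl, args=("http://example.com",))
worker.start()

# Poll the partial results while the crawl is still running.
snapshots = []
while worker.is_alive():
    snapshots.append(len(fake_results()))
    time.sleep(0.005)

worker.join()
final = fake_results()
```

Each entry in snapshots is the number of results visible at that moment, so the list never decreases as the crawl progresses.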

If you are crawling different websites, you can easily clear the previous result by calling the clear() function as follows:

import photon
websites = ['https://google.com', 'https://github.com']
for website in websites:
    print(photon.crawl(website))
    photon.clear()

A more advanced example

import photon
result = photon.crawl('http://example.com', level=3, threads=10, keys=True, exclude='/blog/20(18|17)')
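
An exclude pattern meant to skip only the 2017 and 2018 blog posts needs a group, (18|17), rather than a character class, [18|17]. The sketch below compares the two using the standard re module. It assumes the pattern is matched anywhere in the URL, which is not confirmed here, and the URLs and the kept helper are made up for illustration.

```python
import re

urls = [
    "http://example.com/blog/2017/post",
    "http://example.com/blog/2018/post",
    "http://example.com/blog/2019/post",
    "http://example.com/about",
]

def kept(pattern, urls):
    # Assumption: a URL is excluded when the pattern matches anywhere in it.
    return [u for u in urls if not re.search(pattern, u)]

# Group with alternation: matches /blog/2017 and /blog/2018 only.
kept_group = kept(r"/blog/20(18|17)", urls)

# Character class: [18|17] matches ONE character out of 1, 8, | or 7,
# so /blog/201 already matches and the 2019 post is excluded as well.
kept_class = kept(r"/blog/20[18|17]", urls)
```

With the group, the 2019 post and the about page are kept; with the character class, only the about page survives.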
