
Photon Library

Somdev Sangwan edited this page Aug 24, 2018 · 3 revisions

Photon is available as a library for both Python 2 & Python 3.

To install Photon as a library, run:

pip install photon --user

Documentation

Most basic example

import photon
result = photon.crawl('http://example.com')

The crawl function returns a dict by default, but you can pass the format='json' argument to get JSON output instead. This applies to both the crawl and results functions. A sample JSON output can be found here.

To make crawling as flexible as possible, the following optional arguments are available:

Argument     Type      Default
level        int       2
threads      int       2
timeout      float     6
delay        float     0
regex        str       None
exclude      str       None
seeds        list      None
user_agent   list      random
cookies      dict      None
keys         boolean   False
only_urls    boolean   False

Please go through the Photon wiki for a detailed explanation of each option.

The results remain stored after a crawling session until you clear them. You can view them at any time as follows:

import photon
photon.crawl('http://example.com')
print(photon.results())

Why is there a separate function for it?
Because it is useful in asynchronous programs: you can view the results while the crawl is still in progress.
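
The idea of reading results during a crawl can be sketched with a background thread. This is only a minimal sketch: the helper name poll_while_crawling and its interval parameter are hypothetical, and it assumes that the results function is safe to call from another thread while crawl is running, which this page does not confirm. The crawl and results functions are passed in as arguments so the helper itself does not depend on Photon.

```python
import threading
import time


def poll_while_crawling(crawl, results, url, interval=0.5):
    """Run crawl(url) in a worker thread and collect snapshots of results().

    Hypothetical helper: `crawl` and `results` are any callables with the
    same shape as photon.crawl and photon.results.
    """
    worker = threading.Thread(target=crawl, args=(url,))
    worker.start()

    snapshots = []
    # Take a snapshot every `interval` seconds while the crawl is running.
    while worker.is_alive():
        snapshots.append(results())
        time.sleep(interval)

    worker.join()
    # One last snapshot holds the final state after the crawl has finished.
    snapshots.append(results())
    return snapshots
```

With Photon installed, the call would presumably look like poll_while_crawling(photon.crawl, photon.results, 'http://example.com').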

If you are crawling several websites, you can clear the previous results by calling the clear() function after each one:

import photon
websites = ['https://google.com', 'https://github.com']
for website in websites:
    print(photon.crawl(website))
    photon.clear()
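
If you want to keep each site's result instead of printing it, the same loop can be wrapped in a small helper. This is a sketch under assumptions: the name crawl_each is hypothetical, and because this page does not say whether crawl returns a fresh object or the same one that clear() empties, the sketch copies each result before clearing.

```python
import copy


def crawl_each(websites, crawl, clear):
    """Crawl every site in turn and return a {website: result} dict.

    Hypothetical helper: `crawl` and `clear` are any callables with the
    same shape as photon.crawl and photon.clear.
    """
    collected = {}
    for website in websites:
        try:
            # Copy the result in case crawl() hands back the very object
            # that clear() empties (an assumption, not confirmed here).
            collected[website] = copy.deepcopy(crawl(website))
        finally:
            # Reset the stored results even if the crawl raised an error.
            clear()
    return collected
```

With Photon installed, this would presumably be called as crawl_each(websites, photon.crawl, photon.clear).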

A more advanced example

import photon
# exclude takes a regex: skip URLs matching /blog/2017 or /blog/2018
result = photon.crawl('http://example.com', level=3, threads=10, keys=True, exclude='/blog/20(17|18)')
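
A note on writing exclude patterns: in a regular expression, square brackets define a character class that matches a single character, so [18|17] matches one of 1, 8, | or 7 rather than the strings 18 or 17. To match whole alternatives, use a group such as (17|18). The sketch below compares the two forms using only Python's re module. The sample URLs and the keep helper are hypothetical, and it assumes the pattern is applied with a search anywhere in the URL, which may differ from how Photon applies it internally.

```python
import re

# Hypothetical URLs used only for illustration.
urls = [
    'http://example.com/blog/2017/post',
    'http://example.com/blog/2018/post',
    'http://example.com/blog/2019/post',
    'http://example.com/about',
]


def keep(urls, pattern):
    """Return the URLs that do NOT match the exclusion pattern."""
    compiled = re.compile(pattern)
    return [url for url in urls if not compiled.search(url)]


# Character class: matches '/blog/20' followed by ONE of 1, 8, | or 7,
# so /blog/2019 is excluded as well.
char_class = keep(urls, r'/blog/20[18|17]')

# Group with alternation: matches '/blog/20' followed by 17 or 18 only.
grouped = keep(urls, r'/blog/20(17|18)')

print(char_class)  # ['http://example.com/about']
print(grouped)     # ['http://example.com/blog/2019/post', 'http://example.com/about']
```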