docker-compose up
Environment variables
PYOFF_URL - the site page that needs to be crawled and downloaded
PYOFF_DEPTH - the depth of the crawl. Default is 0 (only the page's resources)
PYOFF_DESTINATION - location where the site files are downloaded
LOGLEVEL - defaults to "INFO"
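As an illustration of how these variables might be consumed, here is a minimal sketch that reads them into a small config object. The `Config` class and `load_config` helper are hypothetical names, not part of the project, and the fallback of `"."` for PYOFF_DESTINATION is an assumption, since no default is documented for it.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Config:
    """Hypothetical container for the documented environment variables."""
    url: str
    depth: int
    destination: str
    loglevel: str


def load_config(env=None):
    """Build a Config from a mapping (defaults to os.environ)."""
    env = os.environ if env is None else env
    return Config(
        url=env["PYOFF_URL"],                           # required, no default documented
        depth=int(env.get("PYOFF_DEPTH", "0")),         # documented default: 0
        destination=env.get("PYOFF_DESTINATION", "."),  # assumed default, not documented
        loglevel=env.get("LOGLEVEL", "INFO"),           # documented default: "INFO"
    )
```

Passing a plain dict instead of `os.environ` keeps the helper easy to exercise without touching the real environment.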
download(url)
downloads resources (HTML, stylesheets, media, etc.) and enqueues them for processing.
Subscribes to q_urls, with url(url: str, depth: int = 0).
Produces to q_resources, with resource(url: str, depth: int, mimeType: str, contents: str).
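The message shapes above can be sketched with plain Python types. This is only an approximation: the standard-library `queue.Queue` stands in for whatever queue backend the project actually uses, and the `Url` and `Resource` tuples and the injected `fetch` callable are hypothetical names introduced for illustration.

```python
import queue
from typing import Callable, NamedTuple, Tuple


class Url(NamedTuple):
    """Message consumed from q_urls."""
    url: str
    depth: int = 0


class Resource(NamedTuple):
    """Message produced to q_resources."""
    url: str
    depth: int
    mimeType: str
    contents: str


def download(msg: Url,
             fetch: Callable[[str], Tuple[str, str]],
             q_resources: queue.Queue) -> None:
    """Fetch one URL and enqueue the result for the parse step.

    `fetch` is assumed to return (mime_type, contents) for a URL; injecting
    it keeps this sketch independent of any particular HTTP client.
    """
    mime_type, contents = fetch(msg.url)
    q_resources.put(Resource(msg.url, msg.depth, mime_type, contents))
```

With a stub such as `lambda u: ("text/html", "<html></html>")` passed as `fetch`, the function can be exercised without network access.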
parse(resource)
processes the contents of the resource and decides what to do next.
- HTML documents are scanned for links to the same domain; the URLs are enqueued to be downloaded
- Others are enqueued to be written to the filesystem.
Subscribes to q_resources.
Produces to q_files, with file(name: str, content: str).
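A rough sketch of the routing described above, using only the standard library. The helper names are hypothetical, and two details are assumptions rather than documented behavior: links are taken from `href` and `src` attributes, and the file name is derived from the URL path. How the crawl depth limits the enqueued URLs is not shown here.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class _LinkCollector(HTMLParser):
    """Collects raw href/src attribute values in document order."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value:
                self.links.append(value)


def same_domain_links(base_url, html):
    """Return absolute, de-duplicated links that share base_url's host."""
    collector = _LinkCollector()
    collector.feed(html)
    base_host = urlparse(base_url).netloc
    result = []
    for link in collector.links:
        absolute = urljoin(base_url, link)  # resolve relative links
        if urlparse(absolute).netloc == base_host and absolute not in result:
            result.append(absolute)
    return result


def parse(url, mime_type, contents):
    """Return (urls_to_download, files_to_write) for one resource."""
    if mime_type == "text/html":
        return same_domain_links(url, contents), []
    # Assumption: the file name is the URL path without the leading slash.
    name = urlparse(url).path.lstrip("/")
    return [], [(name, contents)]
```

Returning two lists instead of writing to queues directly keeps the decision logic separate from the queue backend.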
write(resource)
writes the resource to the filesystem.
Subscribes to q_files.
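The write step could look roughly like the sketch below, which takes the fields of a file(name, content) message plus a destination directory (for example the value of PYOFF_DESTINATION). The signature and the path-escape check are assumptions added for illustration, not the project's documented behavior.

```python
import os
import tempfile


def write(destination, name, content):
    """Write content to destination/name and return the absolute path."""
    root = os.path.abspath(destination)
    target = os.path.abspath(os.path.join(root, name))
    # Refuse names such as "../x" that would land outside the destination.
    if os.path.commonpath([root, target]) != root:
        raise ValueError(f"refusing to write outside {root}: {name}")
    # Create intermediate directories (no-op if they already exist).
    os.makedirs(os.path.dirname(target), exist_ok=True)
    with open(target, "w", encoding="utf-8") as fh:
        fh.write(content)
    return target


if __name__ == "__main__":
    # Demo against a throwaway directory.
    with tempfile.TemporaryDirectory() as tmp:
        print(write(tmp, "css/site.css", "body{}"))
```

Since the message carries `content` as a string, the sketch writes text; binary media would need a different encoding decision.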