# Let's go scraping!

## Setup

This example requires a few additional libraries. You can install them using pip:

```sh
pip install -r requirements.txt
```
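The `requirements.txt` ships with the example; the listing below is only a guess at what it contains (jinja2 is confirmed by the report step described later, while the HTTP and HTML-parsing packages are assumptions):

```
# Hypothetical listing; consult the requirements.txt shipped with the example.
requests         # assumed: fetching pages over HTTP
beautifulsoup4   # assumed: parsing HTML to extract links and text
jinja2           # used by the report step (templates/report.html)
```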

## Running the example

```sh
redun run workflow.py main
```

By default, this scrapes web pages starting from https://www.python.org/ with a depth of 2 link traversals. All of the HTML files encountered are stored in `crawl/`. Word frequencies across all pages are then calculated, and a CSV of the word counts is stored in `computed/word_counts.txt`.
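The word-counting step can be pictured as a redun task along these lines (a minimal sketch under assumed names; the real `workflow.py` may organize this differently):

```python
import re
from collections import Counter
from typing import List

from redun import task, File

redun_namespace = "scraping_sketch"  # hypothetical namespace


@task()
def count_words(pages: List[File], out_path: str = "computed/word_counts.txt") -> File:
    # Tally word frequencies across all scraped HTML pages.
    counts: Counter = Counter()
    for page in pages:
        with page.open() as infile:
            counts.update(re.findall(r"[a-zA-Z]+", infile.read().lower()))

    # Write the counts as a CSV, most frequent words first.
    out = File(out_path)
    with out.open("w") as outfile:
        outfile.write("word,count\n")
        for word, count in counts.most_common():
            outfile.write(f"{word},{count}\n")
    return out
```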

Lastly, an HTML report summarizing the scraping and analysis is generated in `reports/report.html`, using a jinja2 template stored in `templates/report.html`.
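Rendering with jinja2 typically looks like the following sketch. The `make_report()` task name and its `File` template argument come from this README; the output path default and the template variables are illustrative assumptions:

```python
from jinja2 import Template

from redun import task, File


@task()
def make_report(template: File, word_counts: File,
                out_path: str = "reports/report.html") -> File:
    # Read the jinja2 template source. Because `template` is a File argument,
    # redun hashes its contents and reruns this task when the template changes.
    with template.open() as infile:
        template_source = infile.read()

    # Render the report. The variable names here are assumptions; see
    # templates/report.html for the variables the real template expects.
    html = Template(template_source).render(word_counts_path=word_counts.path)

    out = File(out_path)
    with out.open("w") as outfile:
        outfile.write(html)
    return out
```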

## Exercises for the reader

Feel free to try other URLs and scraping depths using the task arguments:

```sh
redun run workflow.py main --url URL --depth DEPTH
```
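redun derives these command-line options from the parameters of the `main` task, so the defaults described above presumably come from a signature along these lines (a sketch; only `url` and `depth` are named in this README):

```python
from redun import task


@task()
def main(url: str = "https://www.python.org/", depth: int = 2):
    # redun exposes each parameter as a CLI option, e.g.:
    #   redun run workflow.py main --url https://docs.python.org/ --depth 3
    ...
```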

Also feel free to alter the report template `templates/report.html`. It is passed to the task `make_report()` as a `File` argument, so rerunning the workflow automatically reacts to changes in the template: redun hashes the file as part of the task's inputs and reruns the task when the hash changes.