
Python 3 script to scrape the Hacker News feed and filter by points, number of comments, and keyword exclusions


victoriastuart/hacker_news_scraper


hacker_news_scraper

A Python 3 script that scrapes the Hacker News feed and filters the posts by

  • number of points, and/or
  • number of comments, and/or
  • a keywords list that excludes posts {dead | flagged | youtube | wikipedia | ...}
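
The filtering described above can be sketched roughly as follows. This is a minimal illustration, not the code in "hn.py": the `Post` class, the `keep()` function, the threshold values, and the reading of "and/or" as "each filter is optional" are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Post:
    """Hypothetical record for one scraped Hacker News entry."""
    title: str
    url: str
    points: int
    comments: int


def keep(post, min_points=None, min_comments=None, exclude=()):
    """Return True if the post passes every filter that is enabled.

    A threshold of None disables that filter (assumed meaning of "and/or").
    """
    if min_points is not None and post.points < min_points:
        return False
    if min_comments is not None and post.comments < min_comments:
        return False
    # Case-insensitive keyword exclusion over the title and URL.
    haystack = f"{post.title} {post.url}".lower()
    return not any(word.lower() in haystack for word in exclude)


# Illustrative data only; not real Hacker News posts.
posts = [
    Post("Show HN: a small tool", "https://example.com/tool", 150, 80),
    Post("Some video", "https://youtube.com/watch?v=x", 300, 200),
    Post("Quiet post", "https://example.org/quiet", 5, 1),
]

kept = [
    p.title
    for p in posts
    if keep(p, min_points=100, min_comments=50, exclude=("youtube", "wikipedia"))
]
print(kept)
```

Treating each threshold as optional lets the same function cover "points only", "comments only", or both, without separate code paths.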

Run it via a ~/.bashrc alias or crontab (see the notes near the top of the script).

Sample output: hn.txt

Updates

  • I provided a script, hn-regex_test.py, for testing regular expressions against the "hn.txt" output file.

  • I added a dictionary and a method, multiple_replace(), to "hn.py" for postprocessing various annoyances, e.g., the BeautifulSoup "smart quotes" that get added to the "hn.txt" output file.

  • I scheduled the following in /etc/crontab, which lets me read the output (and save daily snapshots of it) in my mail client (Claws Mail, where the URLs are active):

# At 6:05 am [https://crontab.guru/#5_6_*_*_*]:
5    6    *    *    *    victoria    nice -n 19    mutt -e "set content_type=text/text" -s 'HackerNews' mail@VictoriasJourney.com -i /mnt/Vancouver/programming/python/scripts/output/hn.txt

mutt arguments:

-s : subject

-i : include file as message body
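
The dictionary-driven postprocessing mentioned under Updates could look something like the sketch below. Only the name multiple_replace() comes from the notes above; the signature, the `REPLACEMENTS` table, and its contents are assumptions for illustration and may differ from what "hn.py" does.

```python
import re

# Hypothetical replacement table: typographic ("smart") quotes to ASCII quotes.
REPLACEMENTS = {
    "\u2018": "'",   # left single quotation mark
    "\u2019": "'",   # right single quotation mark
    "\u201c": '"',   # left double quotation mark
    "\u201d": '"',   # right double quotation mark
}


def multiple_replace(text, replacements):
    """Apply every replacement in a single pass over the text."""
    if not replacements:
        return text
    # Longest keys first, so a longer key wins over a shorter one it contains.
    keys = sorted(replacements, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(k) for k in keys))
    return pattern.sub(lambda match: replacements[match.group(0)], text)


cleaned = multiple_replace("\u201cHello\u201d it\u2019s", REPLACEMENTS)
print(cleaned)
```

A single compiled pattern avoids chaining many str.replace() calls and prevents one replacement from accidentally rewriting the output of another.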
