Script that downloads media files from a list of subreddits.

crawsome/Reddit_Image_Scraper

Reddit Image Scraper

Description

Reliably scrape multiple subreddits and users for multiple file formats.

Original

https://github.com/D3vd/Reddit_Image_Scraper

New Features

This version supersedes the original template and adds many new features.

  • Auto-blacklisting of low-quality images
  • Auto-blacklisting of dead links
  • User-defined query timeout (how long to wait between queries)
  • User-defined API error timeout (this seems to help overall speed)
  • User-defined query quantity (how many queries per category, per sub)
  • User-defined minimum file size (smaller files are blacklisted and deleted after downloading)
  • De-duplication of downloaded files (the same file is never downloaded twice)
  • Sorts files into their respective folders
  • Logging of progress and of every file downloaded
  • Logging of the time taken per sub, per category

And best of all, it's very easy to set up.
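The README does not say how the script implements blacklisting or de-duplication, so the following is only a minimal sketch of one way the minimum-file-size and de-duplication features listed above could work. It assumes duplicates are detected by hashing file contents; the names `MIN_FILE_SIZE`, `should_download`, and `keep_or_blacklist` are hypothetical and are not taken from the project.

```python
import hashlib
from pathlib import Path

# Hypothetical threshold in bytes; the real script reads its own setting.
MIN_FILE_SIZE = 1024


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a file's contents."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        # Read in chunks so large media files are not loaded into memory at once.
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def should_download(url: str, downloaded_urls: set, blacklist: set) -> bool:
    """Skip URLs that were already fetched or were blacklisted earlier."""
    return url not in downloaded_urls and url not in blacklist


def keep_or_blacklist(path: Path, url: str, seen: set, blacklist: set) -> bool:
    """Keep a freshly downloaded file, or delete it.

    Files below MIN_FILE_SIZE are deleted and their URL is blacklisted.
    Files whose content hash was seen before are deleted as duplicates.
    Returns True if the file was kept.
    """
    if path.stat().st_size < MIN_FILE_SIZE:
        path.unlink()
        blacklist.add(url)
        return False
    digest = file_digest(path)
    if digest in seen:
        path.unlink()
        return False
    seen.add(digest)
    return True
```

In a sketch like this, `seen` and `blacklist` would need to be saved to disk between runs so that the checks also hold across scheduled executions.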

Prerequisites / Packages Used

Make sure these libraries are installed before running the program.

First time running

Run it once

  1. Run the program once. It will create the source files you need to get started.

Get an API key by "Creating an app"

  1. Go to this link
  2. Press the Create an app button at the bottom.
  3. Give your app a name and a description.
  4. Choose 'Script' in the app type section.

Back in the program

  1. Put the client ID and secret in config.ini.
  2. Add some subreddits to your subs.txt.
  3. Run python3 reddit_image_scraper.py.
  4. Check the ./result directory for your images!
  5. Check the ./logs folder for history / troubleshooting on your recent runs.
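To illustrate how the two input files from the steps above might be consumed, here is a small sketch using Python's standard `configparser`. The section name `reddit`, the keys `client_id` and `client_secret`, and the one-subreddit-per-line layout of subs.txt are all assumptions; check the files the program generates on its first run for the actual names and layout.

```python
import configparser
from pathlib import Path


def load_credentials(config_path="config.ini"):
    """Read the client ID and secret from an INI file.

    The section name "reddit" and both key names are assumptions,
    not the project's documented layout.
    """
    parser = configparser.ConfigParser()
    # ConfigParser.read returns the list of files it managed to parse.
    if not parser.read(config_path):
        raise FileNotFoundError(f"could not read {config_path}")
    section = parser["reddit"]
    return section["client_id"], section["client_secret"]


def load_subreddits(subs_path="subs.txt"):
    """Return subreddit names, assuming one name per line.

    Blank lines and lines starting with '#' are skipped.
    """
    names = []
    for line in Path(subs_path).read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            names.append(line)
    return names
```

Keeping credentials in a separate file like this makes it easy to exclude them from version control.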

Warnings

A few best practices (more to be added):

  • Don't run more than one instance at a time. Your API key will get rate-limited, and both instances may run even slower.
  • DO NOT SHARE your API keys or upload them anywhere public, including GitHub! Treat them like a username and password.
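One way to guard against accidentally starting a second instance is a lock file. The sketch below is an illustration only and is not part of the project; the helper `single_instance` and the file name `scraper.lock` are hypothetical.

```python
import os
from contextlib import contextmanager


@contextmanager
def single_instance(lock_path="scraper.lock"):
    """Hold an exclusive lock file for the duration of the with-block.

    Raises RuntimeError if the lock file already exists.
    """
    try:
        # O_EXCL makes the call fail if the file is already there,
        # so only one process can create it.
        fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        raise RuntimeError(
            f"another run appears to be active ({lock_path} exists)"
        ) from None
    try:
        # Record the owning process ID to help with manual troubleshooting.
        os.write(fd, str(os.getpid()).encode())
        yield
    finally:
        os.close(fd)
        os.remove(lock_path)
```

A caller would wrap the scraping work in `with single_instance(): ...`. If a run is killed before it can clean up, the stale lock file has to be deleted by hand.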

Automating the script

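An example crontab entry, if you want to run the script on a schedule:

Runs once a day at 00:00 in the system's time zone (00:00 UTC only if the machine's clock is set to UTC).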

00 00 * * * cd /path/to/script/Reddit_Image_Scraper-master && python3 Reddit_image_scraper.py
