
A customizable, Python-based script for scraping links to videos hosted on any website. Built on Scrapy and BeautifulSoup.


preeteshjain/vidpy


vidpy

A customizable Python script for scraping links to videos hosted on any website, implemented with Scrapy and BeautifulSoup.

Why?

I created this script for a friend who hadn't yet watched all the videos in the Maths section of gre.magoosh.com and was worried because his subscription was about to end. Downloading every video by hand took far too many steps, so I spent about 2 hours building this script to scrape the links to all the videos (which were hosted directly on a CDN, in this case CloudFront), so that he could download them all in one go.

Demo:

Vidpy Demo Video

Requirements:

  • A valid subscription is necessary to download videos from the site.
  • Scrapy and BeautifulSoup (installable from PyPI as scrapy and beautifulsoup4) must be installed for the script to work.

Executing the script:

See Scrapy's documentation to learn how to execute spiders and crawlers.

Features:

  • You can adapt the code to scrape other categories on the site. As written, it only scrapes the videos in the Mathematics section of gre.magoosh.com.
  • The same approach (crawl the relevant pages, then extract only the elements you need) can be applied to other sites and to other kinds of data.
  • Only the data you need is extracted and everything else is ignored. This saves time and reduces the overall bandwidth needed to run the script.
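To illustrate the "extract only what you need" idea, here is a minimal BeautifulSoup sketch. It assumes videos appear as <video src> or <source src> tags and that the wanted files are served from a CloudFront host (a name ending in .cloudfront.net); the function extract_video_links and its parameters are hypothetical and not part of vidpy.

```python
from urllib.parse import urljoin, urlparse

from bs4 import BeautifulSoup


def extract_video_links(html, base_url, allowed_host_suffix=".cloudfront.net"):
    """Return absolute, de-duplicated video URLs whose host ends with allowed_host_suffix.

    Hypothetical helper: assumes videos are marked up as <video src=...> or
    <source src=...>; the target site's real markup may differ.
    """
    soup = BeautifulSoup(html, "html.parser")
    links = []
    # Look only at video-related tags; links, images and scripts are ignored.
    for tag in soup.find_all(["video", "source"]):
        src = tag.get("src")
        if not src:
            continue
        url = urljoin(base_url, src)  # resolve relative paths against the page URL
        host = urlparse(url).hostname or ""
        if host.endswith(allowed_host_suffix) and url not in links:
            links.append(url)
    return links


# Usage: only the CloudFront-hosted video survives the filter.
page = """
<html><body>
  <nav><a href="/pricing">Pricing</a></nav>
  <video><source src="https://d123.cloudfront.net/math/lesson1.mp4" type="video/mp4"></video>
  <video src="https://example.com/ads/promo.mp4"></video>
  <img src="/logo.png">
</body></html>
"""
print(extract_video_links(page, "https://example.com/lessons/1"))
# ['https://d123.cloudfront.net/math/lesson1.mp4']
```

Filtering by host rather than by file extension is just one design choice; on a site that serves videos from its own domain, a check on the path (for example, a .mp4 suffix) may be more appropriate.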

Note to contributors: Please update the documentation whenever necessary.
