The scraper uses Scrapy item pipelines to store the scraped data in an SQLite database. The website being scraped is this one. Roughly 1500 items are scraped, each containing the chords for a popular pop song. The data is then stored in Firebase and will later be used in a rebuilt version of the website, built with React.js and Material UI and using service workers so it can run as a PWA with a fully functional offline user experience.
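An SQLite item pipeline might look roughly like the sketch below. This is illustrative only: the table name, field names, and database filename are assumptions, not taken from the project.

```python
import sqlite3

# Hypothetical pipeline sketch; the real project's schema and field names may differ.
class SQLitePipeline:
    """Stores scraped song items in an SQLite database."""

    def open_spider(self, spider):
        self.conn = sqlite3.connect("songs.db")
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS songs (title TEXT, artist TEXT, chords TEXT)"
        )

    def close_spider(self, spider):
        self.conn.commit()
        self.conn.close()

    def process_item(self, item, spider):
        # Items behave like dicts in Scrapy, so plain indexing works here.
        self.conn.execute(
            "INSERT INTO songs VALUES (?, ?, ?)",
            (item["title"], item["artist"], item["chords"]),
        )
        return item
```

A pipeline like this would be enabled via the `ITEM_PIPELINES` setting in the project's `settings.py`.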
To restore the environment:

- Download and install Anaconda
- Open the Anaconda terminal (or Anaconda PowerShell)
- Navigate to the root project directory
- Restore the environment and its dependencies:

  ```
  conda env create -f environment.yml
  ```

- Activate the environment:

  ```
  conda activate scrapyEnv
  ```

- Run the crawler, which generates a .csv file containing the data scraped from akordite.com:

  ```
  scrapy crawl akordite_crawler
  ```
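After the crawl finishes, a quick sanity check on the generated .csv can be done with the standard library. The filename and column names below are assumptions for illustration:

```python
import csv

def load_songs(csv_path):
    """Read the CSV produced by the crawl into a list of row dicts."""
    with open(csv_path, newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))

# Example: confirm the expected item count (~1500) after a full crawl.
# songs = load_songs("akordite.csv")
# print(len(songs))
```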
Use the provided launch.json debug configuration.
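If the launch.json ever needs to be recreated, a typical VS Code configuration for debugging a Scrapy spider looks like the following sketch (the spider name is taken from the command above; everything else is a common default, not confirmed from the project):

```json
{
  "version": "0.2.0",
  "configurations": [
    {
      "name": "Scrapy: akordite_crawler",
      "type": "python",
      "request": "launch",
      "module": "scrapy.cmdline",
      "args": ["crawl", "akordite_crawler"],
      "console": "integratedTerminal"
    }
  ]
}
```

Launching Scrapy through `scrapy.cmdline` lets the debugger attach to the spider process so breakpoints in spider and pipeline code are hit.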