GitHub - santteegt/directv-scraper: Scraping the Latin America DirecTV programming guide by implementing a spider job using Scrapy.

Directv Programming Guide Scraper

Scraping the Latin America DirecTV programming guide by implementing a spider job using Scrapy.

Software Requirements

Python 2/3
pip
Scrapy
Docker

Setup Instructions

~$ pip install scrapy scrapyd scrapyd-client

Spider Configuration

TV_CHANNEL_RAGE: sets the range of channels to scrape for programming information. The default value is (130, 600). You can change it in the directv_spider.py file.
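As a rough illustration of how such a range setting could drive a crawl, the sketch below expands a (start, end) tuple into individual channel numbers. It is not the code in directv_spider.py: the helper name `channel_numbers` is hypothetical, and treating the upper bound as exclusive (as Python's `range` does) is an assumption.

```python
# Hypothetical sketch, not taken from directv_spider.py.
TV_CHANNEL_RAGE = (130, 600)  # default value quoted in this README


def channel_numbers(channel_range=TV_CHANNEL_RAGE):
    """Return the channel numbers covered by a (start, end) tuple.

    Assumption: the end of the range is exclusive, like range().
    """
    start, end = channel_range
    return list(range(start, end))


channels = channel_numbers()
print(len(channels), channels[0], channels[-1])
```

Narrowing the tuple, for example to (130, 140), is a quick way to try the spider on a handful of channels before running the full range.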

Running Locally

~$ scrapy crawl directv -o directv.jl
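The `.jl` extension makes Scrapy write JSON Lines, that is, one JSON object per line. A minimal sketch for loading that output back into Python follows; the helper name `read_json_lines` is hypothetical, and no assumption is made about which fields each item contains.

```python
import json


def read_json_lines(path):
    """Yield one dict per non-empty line of a JSON Lines file."""
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:  # skip blank lines
                yield json.loads(line)
```

After a crawl, `items = list(read_json_lines("directv.jl"))` would give a list of scraped items to inspect or clean.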

Deployment

~ $ docker-compose up -d scrapyd
~ $ scrapyd-deploy -p directvscraper # deploys the `eggified` project

You can list the default server configuration with scrapyd-deploy -l, and the spider projects deployed on that target with scrapyd-deploy -L default.
To schedule the spider, run the following:

~ $ curl http://localhost:6800/schedule.json -d project=directvscraper -d spider=directv
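The same request can be sent from Python using only the standard library. This is a sketch under the assumption that Scrapyd is reachable at http://localhost:6800 as in the curl command; the function names are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

SCRAPYD_URL = "http://localhost:6800"  # assumed local Scrapyd instance


def build_schedule_request(project="directvscraper", spider="directv",
                           base_url=SCRAPYD_URL):
    """Build the form-encoded POST request that the curl command sends."""
    data = urlencode({"project": project, "spider": spider}).encode("ascii")
    return Request(f"{base_url}/schedule.json", data=data, method="POST")


def schedule_spider(**kwargs):
    """Send the request and return Scrapyd's decoded JSON reply."""
    with urlopen(build_schedule_request(**kwargs)) as response:
        return json.load(response)
```

On success, Scrapyd's reply includes a `jobid` field, which is the value expected by the `job` parameter of cancel.json.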

Alternatively, you can use the Directv Programming Guide - Data Cleaning.ipynb notebook to schedule the spider and then clean the scraped data.
To check the progress of the spider job, visit http://localhost:6800/jobs
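Job status can also be read programmatically. The sketch below relies on Scrapyd's listjobs.json endpoint, which this README does not mention, and groups job ids by state; the function names are hypothetical and the server address is assumed to match the one above.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def summarize_jobs(listjobs_reply):
    """Map each job id in a listjobs.json reply to its state."""
    states = {}
    for state in ("pending", "running", "finished"):
        for job in listjobs_reply.get(state, []):
            states[job["id"]] = state
    return states


def fetch_jobs(project="directvscraper", base_url="http://localhost:6800"):
    """Query Scrapyd for the jobs of one project (needs a running server)."""
    query = urlencode({"project": project})
    with urlopen(f"{base_url}/listjobs.json?{query}") as response:
        return json.load(response)
```

Calling `summarize_jobs(fetch_jobs())` returns a dictionary from job id to "pending", "running", or "finished".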
To cancel the job, you can run the following:

~ $ curl http://localhost:6800/cancel.json -d project=directvscraper -d job=<JOB_ID>

Debugging

To debug a specific page, open it in the Scrapy shell:

~ $ scrapy shell <URL>

More about Scrapyd

More information about the available HTTP services can be found in the Scrapyd API documentation.

