Skip to content

a repository to hold the contents for our group CS 242 project

Notifications You must be signed in to change notification settings

cmoussa1/cs242-project

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

40 Commits
 
 
 
 
 
 
 
 

Repository files navigation

Scrapy Notes

installation requirements/prerequisites

First, check to see that you have Python (>= 3.6) and Scrapy installed on your system. To install Scrapy:

pip3 install scrapy

instructions on how to set up a Scrapy project

Inside of the directory of your choice, run:

scrapy startproject my_project_name

There will be a number of folders and files created!!

  • scrapy.cfg: the project configuration file.
  • projectname/: this directory contains your project’s Python modules.
  • projectname/items.py: define the data structure for scraped data here.
  • projectname/pipelines.py: process the scraped data (e.g., cleaning, storing to a database).
  • projectname/settings.py: configure settings like user agent, concurrent requests, etc.
  • projectname/spiders/: this directory will contain your spiders.

Inside the spiders/ director, you can create a spider file using the scrapy genspider command:

scrapy genspider my_new_spider www.baseball-reference.com

About

a repository to hold the contents for our group CS 242 project

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published