magic7721/Crawler

Tutorial

  • in settings.py, set the WEBSITE value to 'NIPS', 'ICML', 'ICLR' or 'CVPR' (corresponding to the databases currently in the folder)

  • run the function sqlite_query() in /spiders/__init__.py

  • it works like an sqlite shell: query it with ordinary SQL, as you would in sqlite
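
To illustrate what "query it like sqlite" means, here is a minimal sketch using Python's standard sqlite3 module. The table name papers, its columns, and the helper run_query are hypothetical; this README does not describe the real schema or how sqlite_query() is implemented.

```python
import sqlite3

def run_query(conn, sql, params=()):
    """Run one SQL statement and return all result rows as a list of tuples."""
    cur = conn.execute(sql, params)
    return cur.fetchall()

# In-memory stand-in for one of the conference databases (hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE papers (title TEXT, year INTEGER)")
conn.executemany(
    "INSERT INTO papers VALUES (?, ?)",
    [("Paper A", 2021), ("Paper B", 2022), ("Paper C", 2022)],
)
conn.commit()

# Parameterized SELECT, the same kind of statement you would type into sqlite.
rows = run_query(conn, "SELECT title FROM papers WHERE year = ? ORDER BY title", (2022,))
print(rows)
```

To try this against a real database file, replace ":memory:" with the path to that file and adjust the table and column names to whatever schema.py actually creates.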

Building a new database from a website

  • in settings.py, change the WEBSITE and YEAR values (defaults: WEBSITE is 'ICLR' (iclr.cc), YEAR is [2021,2022])

  • run the function build_database() in /spiders/__init__.py

  • WARNING: this will DELETE an existing database if the names match
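
Because a rebuild deletes a database with a matching name, it may be worth copying the existing file first. The sketch below shows the two settings named above and a small backup step. The helper backup_database and the '<WEBSITE>.db' file name are assumptions made for illustration, not part of this project.

```python
import shutil
import tempfile
from pathlib import Path

# Values as described above; in the project they live in settings.py.
WEBSITE = "ICLR"
YEAR = [2021, 2022]

def backup_database(db_path):
    """Copy db_path to '<name>.bak' if it exists; return the backup path or None."""
    db_path = Path(db_path)
    if not db_path.exists():
        return None
    backup = db_path.with_name(db_path.name + ".bak")
    shutil.copy2(db_path, backup)
    return backup

# Demo in a temporary directory; the '<WEBSITE>.db' naming is an assumption.
with tempfile.TemporaryDirectory() as tmp:
    db = Path(tmp) / f"{WEBSITE}.db"
    db.write_bytes(b"existing data")
    saved = backup_database(db)
    backup_name = saved.name
    backup_ok = saved.read_bytes() == b"existing data"
    missing = backup_database(Path(tmp) / "absent.db")
```

Calling backup_database() on the real database path before build_database() would leave a copy next to the original, so an accidental rebuild can be undone by renaming the .bak file.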

Explaining files

  • website_crawler.py: web crawler built on the Scrapy library; scrapes the conference site for data

  • search_api.py: looks up extra information through the arXiv and Wikipedia APIs

  • schema.py: builds the relational tables

  • GUI.py: graphical interface

  • test.py: miscellaneous experiments

Using scrapy in a terminal

  • for the "scrapy crawl" command, open the terminal in /spiders

  • for any other scrapy command, open the terminal in the root folder
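
The working-directory rule above can also be captured in a small Python wrapper. This is only a sketch: the helper names, the spider name example_spider, and the assumption that spiders sits directly under the project root are all hypothetical.

```python
import subprocess
from pathlib import Path

def scrapy_invocation(args, project_root):
    """Return (command, working_directory) for a scrapy call, following the rule above."""
    root = Path(project_root)
    # "scrapy crawl" runs from the spiders folder; every other subcommand from the root.
    cwd = root / "spiders" if args and args[0] == "crawl" else root
    return ["scrapy", *args], cwd

def run_scrapy(args, project_root):
    """Run scrapy in the chosen directory (requires scrapy to be installed)."""
    cmd, cwd = scrapy_invocation(args, project_root)
    return subprocess.run(cmd, cwd=cwd, check=True)

# Building the invocations does not need scrapy; only run_scrapy() does.
crawl_cmd, crawl_cwd = scrapy_invocation(["crawl", "example_spider"], "Crawler")
list_cmd, list_cwd = scrapy_invocation(["list"], "Crawler")
```

Keeping the directory choice in one function means the rule is written down once instead of remembered each time a terminal is opened.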
