youenn98 / syllabus-scraper Public

forked from wasedatime/syllabus-scraper

A new web scraper for scraping the Waseda University syllabus.

syllabus-scraper

A new web scraper for scraping the Waseda University syllabus.

Usage

my_crawler = SyllabusCrawler(configs...)
results = my_crawler.execute()
print(list(results))

Configuration

dept

The name of the school you want to scrape.

task

To be defined...

engine

The syllabus-scraper engine you want to use:

`thread-only`: the default engine. It uses traditional worker threads to scrape each course.

`hybrid`: uses threads together with coroutines. Scraping the courses on a single page is assigned to one thread, and a coroutine is created for each course on that page. Use with caution!

worker

Number of worker threads. The default value is 8.
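
The difference between the two engines can be sketched with the standard library alone. This is a minimal illustration, not the code in this repository: `PAGES`, `scrape_course`, `scrape_course_async`, `run_thread_only`, and `run_hybrid` are hypothetical names, and the network requests are replaced by placeholders.

```python
import asyncio
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in data: two syllabus pages, each listing course IDs.
PAGES = {1: ["c1", "c2", "c3"], 2: ["c4", "c5"]}


def scrape_course(course_id):
    """Placeholder for a blocking request that fetches one course."""
    return {"id": course_id}


async def scrape_course_async(course_id):
    """Placeholder for a non-blocking request that fetches one course."""
    await asyncio.sleep(0)  # yield control, as real network I/O would
    return {"id": course_id}


def run_thread_only(pages, worker=8):
    """thread-only style: every course is handed to a worker thread."""
    course_ids = [cid for ids in pages.values() for cid in ids]
    with ThreadPoolExecutor(max_workers=worker) as pool:
        return list(pool.map(scrape_course, course_ids))


def scrape_page_hybrid(course_ids):
    """hybrid style: one thread owns a page and runs one coroutine per course."""
    async def gather_page():
        return await asyncio.gather(
            *(scrape_course_async(cid) for cid in course_ids)
        )
    # Each worker thread gets its own event loop for the page it handles.
    return asyncio.run(gather_page())


def run_hybrid(pages, worker=8):
    """Distribute pages over threads, then flatten the per-page results."""
    with ThreadPoolExecutor(max_workers=worker) as pool:
        per_page = pool.map(scrape_page_hybrid, pages.values())
        return [course for page in per_page for course in page]
```

In the hybrid sketch, the number of simultaneous requests is roughly the number of workers times the number of courses per page, which is one plausible reason to use that engine with caution.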

Benchmarks

| Engine | Number of courses | Number of workers | Execution time (s) |
| --- | --- | --- | --- |
| `thread-only` | 454 | 1 | 178 |
| `thread-only` | 454 | 4 | 60 |
| `thread-only` | 454 | 8 | 32 |
| `thread-only` | 454 | 32 | 14 |
| `thread-only` | 100 | 32 | 5 |
| `hybrid` | 100 | 1 | 4 |
| `hybrid` | 200 | 2 | 6 |
| `hybrid` | 454 | 5 | ??? (Connection refused) |
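
To make the scaling in the `thread-only` rows easier to read, the short sketch below derives throughput, speedup, and per-worker efficiency from the 454-course measurements in the table above. The helper name `summarize` and the choice of the 1-worker run as the baseline are assumptions made for this illustration.

```python
# (workers, seconds) for the thread-only engine on 454 courses,
# copied from the benchmark table.
THREAD_ONLY_454 = [(1, 178), (4, 60), (8, 32), (32, 14)]
COURSES = 454


def summarize(runs, courses):
    """Derive throughput, speedup, and efficiency relative to the first run."""
    baseline_seconds = runs[0][1]
    rows = []
    for workers, seconds in runs:
        speedup = baseline_seconds / seconds
        rows.append({
            "workers": workers,
            "courses_per_s": round(courses / seconds, 1),
            "speedup": round(speedup, 1),
            "efficiency": round(speedup / workers, 2),
        })
    return rows


for row in summarize(THREAD_ONLY_454, COURSES):
    print(row)
```

With these numbers, 32 workers give a speedup of about 12.7 over a single worker, so each added thread contributes less as the worker count grows.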
