WikiCrawler

WikiCrawler.py: Pretty inefficient Web Crawler for http://wowwiki.wikia.com/
save_file() makes a copy of the global set, and saves that to the file 'items.txt'
recurse_page(link) takes a string and will recurse through the basepage/wiki/link
FixData.py: Crappy hack that just removes items from 'items.txt' that have a '#' or ':'
Also makes it lowercase, and writes to 'fixed_items.txt'
SplitData.py: Takes all the items, removes the newline character, then splits them by '_'
Keeps the phrases with the underscores, but also adds the seperated ones too
Writes to 'split_items.txt'

The node.js Web Server is currently using 'split_items.txt'

Let me know if you have any ideas / improvements. Feel free to fork or merge.

Note: Use Python3 and run 'pip install -r requirements.txt' from the main directory

TODO: Comment code, add stuff, fix bad stuff

Name		Name	Last commit message	Last commit date
Latest commit History 5 Commits
.gitignore		.gitignore
FixData.py		FixData.py
Merge.py		Merge.py
README.md		README.md
SplitData.py		SplitData.py
WikiCrawler.py		WikiCrawler.py
fixed_items.txt		fixed_items.txt
items.txt		items.txt
merge.txt		merge.txt
requirements.txt		requirements.txt
split_items.txt		split_items.txt
wow.txt		wow.txt

Provide feedback