Experiments (visualization) were run under JupyterLab v4.4.0.
# for data analysis
python >= 3.x
numpy
pandas
# for crawler
analysis.ipynb
Genre crawling for the MovieQA movies is complete, and the results are summarized in the file imdb_crawled_whole.json.
- Genre information for each movie follows the IMDb classification.
- Movie lists come from MovieQA (IMDb keys starting with "tt"): id_ds_splits.json
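
As a rough illustration of how the crawled genre data might be summarized, here is a minimal sketch. It assumes, without confirmation from the repository, that imdb_crawled_whole.json maps each IMDb key to a list of genre names; the function names `count_genres` and `load_genre_counts` are hypothetical and the actual schema may differ.

```python
import json
from collections import Counter


def count_genres(genres_by_movie):
    """Count how often each genre occurs across movies.

    ASSUMPTION: genres_by_movie looks like {"tt0000001": ["Drama", ...], ...}.
    The real layout of imdb_crawled_whole.json may be different.
    """
    counts = Counter()
    for imdb_key, genres in genres_by_movie.items():
        # Skip anything that is not an IMDb key (those start with "tt").
        if not imdb_key.startswith("tt"):
            continue
        counts.update(genres)
    return counts


def load_genre_counts(path="imdb_crawled_whole.json"):
    """Read the crawled file and return genre counts (schema assumed as above)."""
    with open(path, encoding="utf-8") as f:
        return count_genres(json.load(f))
```

Calling `load_genre_counts().most_common(10)` would then list the ten most frequent genres, provided the file follows the assumed layout.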
crawler_main.ipynb
The crawler is implemented with bs4 (BeautifulSoup) and the Python requests library.
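
The following is a minimal sketch of how a bs4 + requests genre crawler could look; it is not the code in crawler_main.ipynb. It assumes that a title page is served at the URL pattern shown and that the page embeds a JSON-LD `<script>` block with a `genre` field. The names `IMDB_TITLE_URL`, `parse_genres`, and `fetch_genres` are hypothetical, and the page layout may change.

```python
import json

import requests
from bs4 import BeautifulSoup

# ASSUMPTION: title pages are reachable under this URL pattern.
IMDB_TITLE_URL = "https://www.imdb.com/title/{imdb_key}/"


def parse_genres(html):
    """Extract genre names from a page's JSON-LD block (assumed layout)."""
    soup = BeautifulSoup(html, "html.parser")
    tag = soup.find("script", type="application/ld+json")
    if tag is None or not tag.string:
        return []
    try:
        data = json.loads(tag.string)
    except json.JSONDecodeError:
        return []
    if not isinstance(data, dict):
        return []
    genres = data.get("genre", [])
    # JSON-LD allows either a single string or a list of strings here.
    if isinstance(genres, str):
        return [genres]
    return list(genres)


def fetch_genres(imdb_key, timeout=10):
    """Download one title page and return its genres."""
    url = IMDB_TITLE_URL.format(imdb_key=imdb_key)
    # Some sites reject requests without a User-Agent header.
    resp = requests.get(url, headers={"User-Agent": "Mozilla/5.0"}, timeout=timeout)
    resp.raise_for_status()
    return parse_genres(resp.text)
```

Keeping the parsing step separate from the network call makes it possible to check `parse_genres` against saved HTML without hitting the site.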
Thanks to sigran0 for all the help.
- https://beomi.github.io/gb-crawling/posts/2017-01-20-HowToMakeWebCrawler.html
- http://docs.python-requests.org/en/master/user/quickstart/#make-a-request
- https://github.com/sigran0/LyricScrapper/blob/master/Scrapper/MelonLyricScrapper.py