Scrapers that collect art images and metadata from WikiArt and Wikimedia Commons. The purpose is to help contribute an art dataset to academia for non-commercial machine learning research, for example in image captioning, image generation, or image classification. The art images, together with metadata on genre, style, and title, support a diverse scope of machine learning research.
Download the ChromeDriver used by Selenium (the release matching your installed Chrome version) and move it to the repo's root.
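As a rough illustration of how a driver placed in the repo's root can be picked up, here is a minimal sketch. It assumes Selenium 4 and a binary named `chromedriver` (on Windows it would be `chromedriver.exe`). The helper names `chromedriver_path` and `make_driver` are hypothetical and are not taken from the scraper scripts, which may start the browser differently.

```python
from pathlib import Path


def chromedriver_path(repo_root="."):
    """Return the assumed driver location: a file named chromedriver in the repo root."""
    return Path(repo_root) / "chromedriver"


def make_driver(repo_root="."):
    """Start headless Chrome with the ChromeDriver binary found in the repo root."""
    driver_file = chromedriver_path(repo_root)
    if not driver_file.exists():
        raise FileNotFoundError(f"ChromeDriver not found at {driver_file}")

    # Imported after the file check so a missing driver is reported
    # before anything else.
    from selenium import webdriver
    from selenium.webdriver.chrome.service import Service

    options = webdriver.ChromeOptions()
    options.add_argument("--headless")  # run without opening a browser window
    service = Service(executable_path=str(driver_file))
    return webdriver.Chrome(service=service, options=options)
```

Calling `make_driver()` from the repo's root returns a browser handle; call `driver.quit()` on it once scraping is done so the Chrome process is released.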
notebook_wikiart_scraper.ipynb
- Python notebook that carries out the crawling. The notebook format is useful for debugging and development
python wikiart_scraper.py
- Runs the crawler with multiprocessing, which retrieves images and metadata faster
- bs4 (BeautifulSoup)
- urllib
- selenium
- regex
- tqdm
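
The multiprocessing mode of the crawler can be pictured with the sketch below: a pool of worker processes downloads image URLs in parallel. This is only an illustration under stated assumptions, not the code in `wikiart_scraper.py`. The function names (`image_filename`, `download_image`, `download_all`), the `images` output directory, and the pool size of 4 are all hypothetical, and only the standard library is used.

```python
import multiprocessing as mp
import os
import urllib.parse
import urllib.request


def image_filename(url):
    """Derive a local file name from the last path segment of an image URL."""
    path = urllib.parse.urlparse(url).path
    return os.path.basename(urllib.parse.unquote(path)) or "unnamed"


def download_image(url, out_dir="images"):
    """Download one image; return (url, saved_path), or (url, None) on failure."""
    os.makedirs(out_dir, exist_ok=True)
    target = os.path.join(out_dir, image_filename(url))
    try:
        with urllib.request.urlopen(url, timeout=30) as response:
            data = response.read()
    except (OSError, ValueError):
        # URLError and HTTPError are subclasses of OSError;
        # ValueError covers malformed URLs.
        return url, None
    with open(target, "wb") as handle:
        handle.write(data)
    return url, target


def download_all(urls, processes=4):
    """Fan the URLs out over a pool of worker processes."""
    with mp.Pool(processes=processes) as pool:
        # imap_unordered yields each result as soon as a worker finishes it,
        # so the output order may differ from the input order.
        return list(pool.imap_unordered(download_image, urls))
```

On platforms that start worker processes by re-importing the main module (Windows and macOS by default), call `download_all(urls)` from inside an `if __name__ == "__main__":` guard. Failed downloads come back as `(url, None)`, so they can be collected and retried afterwards.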