Scrape Art

Scrapers that collect art images and metadata from WikiArt and Wikimedia Commons. The goal is to contribute an art dataset to academia for non-commercial machine learning research, for example in image captioning, image generation, or image classification. Metadata such as genre, style, and title, together with the images themselves, supports a diverse range of machine learning research.

How to run

Download the Selenium ChromeDriver and move it to the repo's root.
notebook_wikiart_scraper.ipynb - Python notebook that carries out the crawling. The notebook format is useful for debugging and development.
python wikiart_scraper.py - Runs the crawler with multiprocessing, which retrieves images and metadata faster.
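
As a rough illustration of why the multiprocessing run is faster, the sketch below fans a list of artwork page URLs out to a pool of worker processes. It is not the code in wikiart_scraper.py. The names record_from_url and crawl are hypothetical, the URL layout /<language>/<artist-slug>/<artwork-slug> is an assumption, and the per-page step is a placeholder that only splits the URL instead of fetching and parsing the page.

```python
from multiprocessing import Pool
from urllib.parse import urlparse


def record_from_url(url):
    """Build a placeholder metadata record from an artwork page URL.

    Hypothetical stand-in for the real per-page scraping step: it performs
    no network access and only splits the URL path.
    """
    parts = [p for p in urlparse(url).path.split("/") if p]
    # Assumed path layout: /<language>/<artist-slug>/<artwork-slug>
    if len(parts) < 3:
        raise ValueError(f"unexpected artwork URL: {url}")
    return {"url": url, "artist": parts[-2], "artwork": parts[-1]}


def crawl(urls, workers=4):
    """Process artwork URLs in parallel worker processes."""
    with Pool(processes=workers) as pool:
        # imap_unordered yields each record as soon as a worker finishes it,
        # so the result order may differ from the input order.
        return list(pool.imap_unordered(record_from_url, urls))
```

When trying this out, call crawl(urls) from under an `if __name__ == "__main__":` guard, since platforms that start workers by spawning a fresh interpreter re-import the script in each worker. A real per-page step would spend most of its time waiting on the network, which is where splitting the work across workers pays off.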

Requirements

bs4 (BeautifulSoup)
urllib
selenium
regex
tqdm
