A program to get all StockX product URLs and then scrape the information and images from each URL. This is currently a work in progress.
These scripts work in two stages, with the second stage split into 3 steps:
- Getting all URLs by brand, make, and year
  - Currently, I split up the brands rather than looping through all of them, due to time constraints. This outputs 3 files: one each for Nike, Adidas, and Jordan.
- From the list of URLs, scrape all information in 3 steps. I broke out the page and sales history components:
  - Currently, the scripts loop through the brands individually (each of the 3 files from above) rather than all combined.
  - Step 1: Use `02_product_scraper_no_history.py` to collect page-level information for each product URL.
  - Step 2: Use `03_product_scraper_history_only.py` to collect each product's sales history, found in the sales history pop-up tab.
  - Step 3: Use `04_image_scraper.py` to collect all images from the product pages (unfortunately, this step was added after the initial scrape).
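
The per-brand flow described above can be sketched roughly as follows. This is only an illustration, not the code in the scripts: `load_urls`, `scrape_brand`, and the `scrape_page` callback are hypothetical names, and the file formats (one URL per line in, one JSON record per line out) are assumptions.

```python
import json
from pathlib import Path


def load_urls(path):
    """Read product URLs from a text file (assumed: one URL per line).

    Blank lines and duplicate URLs are dropped; first-seen order is kept.
    """
    seen = {}
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        url = line.strip()
        if url:
            seen.setdefault(url, None)
    return list(seen)


def scrape_brand(url_file, out_file, scrape_page):
    """Run scrape_page(url) -> dict for every URL in one brand's file.

    Each result is written as one JSON line. Returns the number of
    products written.
    """
    count = 0
    with open(out_file, "w", encoding="utf-8") as out:
        for url in load_urls(url_file):
            record = {"url": url, **scrape_page(url)}
            out.write(json.dumps(record) + "\n")
            count += 1
    return count
```

Calling `scrape_brand` once per brand file mirrors the "loop through brands individually" approach; the same function could be pointed at a combined file later.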
Currently, scraping all URLs and product information is time-consuming. There is also an occasional "Verify" button that comes up; for now I click it manually to bypass it and keep the scrape moving.
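
One way to make the manual "Verify" click less disruptive is to have the scraping loop pause while the button is present and resume once it has been clicked. The sketch below is an assumption about how that could look, not what the scripts currently do: `wait_for_manual_verify` and `VERIFY_XPATH` are hypothetical names, and the XPath is a guess at how the button might be located. It relies only on Selenium's `find_elements(by, value)` method, where the string `"xpath"` is the value of `By.XPATH`.

```python
import time

# Assumed locator for the "Verify" button; adjust to the real page markup.
VERIFY_XPATH = "//button[contains(., 'Verify')]"


def wait_for_manual_verify(driver, poll_seconds=5.0, max_wait=300.0,
                           sleep=time.sleep):
    """Block while a "Verify" button is on the page.

    Polls the driver every poll_seconds until the button is gone (for
    example, because someone clicked it by hand). Returns the number of
    seconds spent waiting, or raises TimeoutError after max_wait seconds.
    """
    waited = 0.0
    # "xpath" is the same string as selenium's By.XPATH constant.
    while driver.find_elements("xpath", VERIFY_XPATH):
        if waited >= max_wait:
            raise TimeoutError("Verify button still present after waiting")
        sleep(poll_seconds)
        waited += poll_seconds
    return waited
```

Calling this right after each `driver.get(url)` would keep the loop from scraping the challenge page instead of the product page.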
The product sales history script, `03_product_scraper_history_only.py`, is also very time-consuming because some products have a lot of transaction history.
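
Because a long history run may get interrupted, one possible mitigation is to append each product's history as soon as it is scraped and skip products that are already in the output on the next run. The following is a minimal sketch under that assumption, not the script's actual behavior: `already_scraped`, `scrape_history_resumable`, and the `fetch_history` callback are hypothetical names, and the JSON-lines output format is an assumption.

```python
import json
from pathlib import Path


def already_scraped(out_file):
    """Return the set of URLs that already have a record in out_file."""
    path = Path(out_file)
    if not path.exists():
        return set()
    done = set()
    for line in path.read_text(encoding="utf-8").splitlines():
        if line.strip():
            done.add(json.loads(line)["url"])
    return done


def scrape_history_resumable(urls, out_file, fetch_history):
    """Scrape sales history only for URLs not yet present in out_file.

    fetch_history(url) should return a JSON-serializable list of sales.
    Records are appended and flushed one by one, so an interrupted run
    keeps everything written so far. Returns the number of new records.
    """
    done = already_scraped(out_file)
    new = 0
    with open(out_file, "a", encoding="utf-8") as out:
        for url in urls:
            if url in done:
                continue
            sales = fetch_history(url)
            out.write(json.dumps({"url": url, "sales": sales}) + "\n")
            out.flush()
            done.add(url)
            new += 1
    return new
```

Rerunning the same call after an interruption picks up at the first product that has no record yet.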
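Python modules used (`time` and `json` are part of the standard library; the rest are third-party):
- time
- json
- pandas
- bs4
- selenium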