This project is a tool that automatically scrapes product data from Lazada links, matches it against a database, and writes the result to a formatted CSV.
We built this during the 42KL SME Innovation Hackathon 2022.
Here's our Presentation.
Here's the Challenge Statement.
- Python
- Packages
  - Scrapy | web scraping
  - spaCy | tokenisation
  - rapidfuzz | fuzzy string matching
  - pandas | data manipulation
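To give a feel for the matching step, here is a minimal sketch of fuzzy-matching a scraped product title against a list of database names. It is not the project's actual code: it uses the standard library's `difflib` as a stand-in for rapidfuzz so it runs without installing anything, and the function names, the threshold, and the product names are all hypothetical.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Return a 0-100 similarity score between two strings, ignoring case."""
    return 100.0 * SequenceMatcher(None, a.lower(), b.lower()).ratio()


def best_match(query: str, candidates: list, threshold: float = 50.0):
    """Return (candidate, score) for the closest candidate.

    Returns None when there are no candidates or when the best score
    falls below the (assumed) threshold.
    """
    if not candidates:
        return None
    scored = [(name, similarity(query, name)) for name in candidates]
    best = max(scored, key=lambda pair: pair[1])
    return best if best[1] >= threshold else None


# Hypothetical database entries and scraped title, for illustration only.
database = ["Milo Activ-Go 1kg", "Nescafe Classic 200g", "Maggi Kari 5 x 79g"]
result = best_match("MILO Activ-Go Chocolate Malt Powder 1kg", database)
print(result)
```

A threshold keeps unrelated titles from being forced onto the nearest database entry; the right value depends on how noisy the scraped titles are.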
Clone this repo
git clone https://github.com/HackathonScrubs/HargaKedaiMamak
Run setup script
./run.sh setup
Update Lazada links
input_files/lazada_links.csv
Scrape from links
./run.sh scrape
Match and output
./run.sh match
Output CSV will be at
output.csv
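To sanity-check the result after a run, a small helper like the one below can list the columns and count the rows. This is only a sketch: the helper name is made up, and the column names in the sample are hypothetical, since the real ones depend on the matching step.

```python
import csv
import io


def summarise_csv(handle):
    """Return (column_names, row_count) for an open CSV file handle."""
    reader = csv.DictReader(handle)
    rows = list(reader)
    # fieldnames is None when the file is empty, so fall back to an empty list.
    return list(reader.fieldnames or []), len(rows)


# In-memory sample with hypothetical columns, so the sketch runs on its own.
sample = io.StringIO("name,price\nMilo 1kg,25.90\n")
columns, count = summarise_csv(sample)
print(columns, count)

# Against the real file, one would do something like:
# with open("output.csv", newline="", encoding="utf-8") as f:
#     print(summarise_csv(f))
```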