
hooyunzhe/HargaKedaiMamak

 
 


Priceshop | 42KL SME Innovation Hackathon

This project is a tool that automatically scrapes product data from Lazada links, matches each product against a database, and writes the results to a formatted CSV.

We built this during the 42KL SME Innovation Hackathon 2022.
Here's our Presentation.
Here's the Challenge Statement.

How it's built

  • Python
  • Packages
    • Scrapy | web scraping
    • spaCy | tokenisation
    • RapidFuzz | fuzzy string matching
    • pandas | data manipulation

How it works

Scraper

Matcher
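
As a rough illustration of the matching idea (not the project's actual code), the sketch below scores a scraped product title against a list of database names and keeps the best one above a threshold. It uses the standard library's difflib as a stand-in for rapidfuzz so that it runs without extra packages. The function names, the sample product names, and the threshold of 60 are all assumptions made for this example.

```python
from difflib import SequenceMatcher


def normalise(text):
    """Lowercase and collapse whitespace so trivial differences do not hurt the score."""
    return " ".join(text.lower().split())


def similarity(a, b):
    """Return a 0-100 similarity score between two strings (difflib stand-in)."""
    return 100.0 * SequenceMatcher(None, normalise(a), normalise(b)).ratio()


def best_match(title, candidates, threshold=60.0):
    """Return (candidate, score) for the highest-scoring candidate,
    or None when there are no candidates or the best score is below the threshold."""
    if not candidates:
        return None
    scored = [(name, similarity(title, name)) for name in candidates]
    best = max(scored, key=lambda pair: pair[1])
    return best if best[1] >= threshold else None


# Hypothetical database entries and scraped title, for illustration only.
database = [
    "Logitech G304 Wireless Mouse",
    "Razer DeathAdder V2",
    "Kingston A400 240GB SSD",
]
match = best_match("LOGITECH G304 Lightspeed Wireless Gaming Mouse", database)
no_match = best_match("xyz", database)
```

With rapidfuzz installed, `similarity` could likely be swapped for one of its scorers; the threshold would then need retuning, since different scorers produce different score distributions.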

Setup and run

Clone this repo

git clone https://github.com/HackathonScrubs/HargaKedaiMamak

Run setup script

./run.sh setup

Update Lazada links

input_files/lazada_links.csv
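
The layout of the links file is not described here. As a minimal sketch, assuming one product URL per row in the first column, the file could be read and cleaned up like this before scraping. The `read_links` helper, the optional header row, and the sample URLs are hypothetical.

```python
import csv
import io
from urllib.parse import urlparse


def read_links(handle):
    """Yield unique Lazada URLs from the first column of a CSV stream.

    Blank rows, header rows, and URLs whose host does not contain a
    'lazada' label are skipped. The one-URL-per-row layout is an assumption.
    """
    seen = set()
    for row in csv.reader(handle):
        if not row:
            continue
        url = row[0].strip()
        host = urlparse(url).netloc.lower()
        if "lazada" not in host.split("."):
            continue
        if url not in seen:
            seen.add(url)
            yield url


# In-memory sample standing in for input_files/lazada_links.csv (made-up URLs).
sample = io.StringIO(
    "url\n"
    "https://www.lazada.com.my/products/example-i1.html\n"
    "\n"
    "https://www.lazada.com.my/products/example-i1.html\n"
    "https://example.com/other\n"
)
links = list(read_links(sample))
```

To read the real file, pass `open("input_files/lazada_links.csv", newline="")` instead of the in-memory sample.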

Scrape from links

./run.sh scrape

Match and output

./run.sh match

Output CSV will be at

output.csv

Team

References

About

Web scraper and matcher for tech products

Languages

  • Python 98.7%
  • Shell 1.3%