wengzehang/DD2476-Project

 
 

DD2476-Project: Shopping System with Craigslist Product

Note: The crawler folder is deprecated. All code has been moved into the website folder. Be careful to configure the paths if you want to rerun the code.

Main features:

  • Text Search Mode
  • Faceted search
  • Search statistics
  • Spell check
  • Price intervals, ranking rules, and filter rules
  • Image Search Mode
  • Combination Search Mode: Text with Image
  • Smart Recommendation System
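
As an illustration of how a text query might be combined with a price interval, the sketch below builds an Elasticsearch bool query with a range filter. The helper `build_query` and the field names `title` and `price` are assumptions made for this example; the project's actual index mapping and query code may differ.

```python
def build_query(text, min_price=None, max_price=None):
    """Build an Elasticsearch query body: full-text match plus an optional price range."""
    # "title" is a hypothetical field name, not taken from the project.
    bool_query = {"must": [{"match": {"title": text}}]}

    price_range = {}
    if min_price is not None:
        price_range["gte"] = min_price
    if max_price is not None:
        price_range["lte"] = max_price

    if price_range:
        # A filter clause narrows the result set without changing relevance scores.
        # "price" is also a hypothetical field name.
        bool_query["filter"] = [{"range": {"price": price_range}}]

    return {"query": {"bool": bool_query}}
```

The returned dictionary could then be passed as the request body of a search call. Keeping the price bound in a filter clause rather than in `must` means it restricts the hits but leaves the text ranking untouched.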

Preparation

0 Prerequisites

  • Ubuntu 16.04
  • Python 2.7
  • PyTorch, torchvision
  • Elasticsearch

Then install the other Python dependencies using pip:

pip install -r pip_list.txt

1 Crawl the raw data without images

Use crawler.py to crawl data from the Craigslist website: 35 classes in total (~80,000 products). The files will be stored in the website/data/ folder. The raw data tar file is also provided.

2 Crawl images, using MPI to speed up the download

Use crawImages_mpi.py to crawl the images and store them on the local computer. The download log is stored in downloadImages.log. The images will be stored in website/images/. A new, filtered data JSON file is also generated and stored.
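
To show how a download job can be split across MPI processes, here is a minimal sketch of a round-robin partition. The function `shard_for_rank` is hypothetical and is not taken from crawImages_mpi.py, whose actual work-splitting strategy is not described here.

```python
def shard_for_rank(items, rank, size):
    """Return the items that worker number `rank` (out of `size` workers) should handle."""
    if size <= 0 or not 0 <= rank < size:
        raise ValueError("rank must satisfy 0 <= rank < size")
    # Round-robin split: worker r takes items r, r + size, r + 2 * size, ...
    # Every item is assigned to exactly one worker.
    return items[rank::size]
```

Assuming the mpi4py package is used, `rank` and `size` would typically come from `MPI.COMM_WORLD.Get_rank()` and `MPI.COMM_WORLD.Get_size()`, so that each process downloads only its own share of the image URLs.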

3 Extract CNN features using ResNet (PyTorch)

Use extractFeatureFinal.py. The features are stored here. Put the file totalRes18feat.txt under website/.
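
One common way to use such CNN features for image search is nearest-neighbour ranking by cosine similarity. The sketch below illustrates that idea only: the function names, the in-memory dictionary of features, and the choice of cosine similarity are assumptions, and the project's actual image search and the format of totalRes18feat.txt may differ.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_similar(query_vec, features, k=5):
    """Rank product ids by similarity to query_vec.

    `features` maps a product id to its feature vector (hypothetical layout).
    """
    scored = [(cosine_similarity(query_vec, vec), pid) for pid, vec in features.items()]
    scored.sort(reverse=True)  # highest similarity first
    return [pid for _, pid in scored[:k]]
```

In use, the feature vector of an uploaded query image would be compared against the stored vectors of all products, and the ids of the closest matches returned as search results.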

Insert data and run Elasticsearch

python insert.py
elasticsearch # remember to add it to PATH in ~/.bashrc

Run the server

Link: localhost:8080/

python server.py

Authorship

Ruiyang Ma, Zesen Wang, Zehang Weng, Zitao Zhang

About

DD2476 Project Craigslist price recommendations
