Tripadvisor Crawler

This project contains Python source code for crawling hotel reviews from tripadvisor.ie.

How to Run:

First-time setup: Python with virtualenv

    virtualenv --python=python2.7 .
    source bin/activate
    pip install -r requirements.txt

Run the hotel URL scraper

    scrapy crawl hspider

Run the review scraper

    scrapy crawl taspider -a urls=result/urls.txt

Warning

As of commit 59d7be388bb1592b2f9b2e5ddc787ec6d3eacf5c, urls.txt and the *.jl result files are written in append mode. Be careful when running the crawler more than once: new results are appended to the existing files instead of overwriting them. For now, manually checkpoint and save the scraped data between runs to avoid duplicated results.
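If duplicates have already been appended, they can be cleaned up after the fact. The following is a minimal sketch and not part of this project: `dedupe_lines` is a hypothetical helper that assumes each result sits on its own line (one URL per line in urls.txt, one JSON object per line in a *.jl file) and that a repeated result is byte-for-byte identical to its first occurrence. Copy the file somewhere safe before trying it.

```python
import io


def dedupe_lines(path):
    """Rewrite `path` in place, keeping the first occurrence of each line.

    Hypothetical helper, not part of the crawler. Blank lines are removed
    and surrounding whitespace is stripped. Returns the number of duplicate
    lines that were dropped.
    """
    seen = set()
    kept = []
    dropped = 0

    # Read everything first so the file can be safely reopened for writing.
    with io.open(path, encoding="utf-8") as src:
        for line in src:
            key = line.strip()
            if not key:
                continue  # skip blank lines
            if key in seen:
                dropped += 1
            else:
                seen.add(key)
                kept.append(key)

    # Write the unique lines back in their original order.
    with io.open(path, "w", encoding="utf-8") as out:
        for key in kept:
            out.write(key + u"\n")

    return dropped
```

For example, `dedupe_lines("result/urls.txt")` would rewrite the URL list with each URL kept once. Two records that describe the same review but differ in any character (field order, whitespace inside the JSON) are not treated as duplicates by this sketch.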

Legacy Content

From the terminal, go to the scrapers/tripadvisor/tripadvisorCrawler directory.

Run the following command:

    scrapy crawl taspider -a urls=/path/to/file

Remarks

Clearlake Hotel

Test hotel 1: 22 reviews listed, of which 19 can be retrieved. The remaining 3 are Google Translate reviews, which the crawler cannot retrieve.

Palmerstown Lodge

Test hotel 2: 31 reviews listed, of which 19 can be retrieved. A few are Google Translate reviews, and a few others cannot be retrieved for reasons that are not yet clear. For example, the downloaded first page contains only 6 reviews, while the same page in a browser shows 10; the other four reviews might come from a partner site.
