Yelp insights challenge

An application that finds the businesses reviewed by the most influential users in the Yelp social network.

The Spark job accepts the Yelp dataset in TSV format as input and performs the following steps:

runs the PageRank algorithm to find the top 20 influencers
runs a query to find the businesses those influencers reviewed
writes the results to disk
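The first two steps can be illustrated with a minimal pure-Python sketch. It is not the application's Spark/GraphFrames code; the function names, the `(follower, followed)` edge direction, the damping factor of 0.85, and the fixed iteration count are all assumptions made for illustration.

```python
def pagerank(edges, damping=0.85, iterations=50):
    """Iterative PageRank over directed (src, dst) edges.

    Hypothetical helper: the real job runs PageRank on Spark.
    """
    nodes = sorted({node for edge in edges for node in edge})
    out_links = {node: [] for node in nodes}
    for src, dst in edges:
        out_links[src].append(dst)

    rank = {node: 1.0 / len(nodes) for node in nodes}
    for _ in range(iterations):
        # Rank held by nodes without outgoing edges is spread evenly.
        dangling = sum(rank[node] for node in nodes if not out_links[node])
        base = (1.0 - damping + damping * dangling) / len(nodes)
        new_rank = {node: base for node in nodes}
        for src in nodes:
            targets = out_links[src]
            if targets:
                share = damping * rank[src] / len(targets)
                for dst in targets:
                    new_rank[dst] += share
        rank = new_rank
    return rank


def top_influencers(edges, n=20):
    """Return the n user ids with the highest PageRank score."""
    rank = pagerank(edges)
    return sorted(rank, key=rank.get, reverse=True)[:n]


def businesses_reviewed(reviews, user_ids):
    """Return the sorted business ids reviewed by any of the given users."""
    wanted = set(user_ids)
    return sorted({business for user, business in reviews if user in wanted})


# Toy data (made up): edges are (follower, followed), reviews are (user, business).
edges = [("u1", "u3"), ("u2", "u3"), ("u4", "u3"), ("u3", "u1")]
reviews = [("u3", "b1"), ("u1", "b2"), ("u2", "b3"), ("u3", "b2")]

influencers = top_influencers(edges, n=2)
print(influencers, businesses_reviewed(reviews, influencers))
```

On a dataset of Yelp's size the same logic would be expressed as distributed graph and join operations, which is why the application delegates it to Spark.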

The original Yelp dataset is in JSON format. To convert it to TSV (tab-delimited), run the script:

$ python scripts/convertJsonToTsv.py yelp_academic_dataset.json # Creates yelp_academic_dataset.tsv
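For readers who want to see what such a conversion involves, here is a minimal sketch. It is not the contents of `scripts/convertJsonToTsv.py`; it assumes the input holds one JSON object per line, and the function name, the `fields` parameter, the header row, and the comma-joined list encoding are hypothetical choices.

```python
import json


def json_lines_to_tsv(json_path, tsv_path, fields):
    """Write one tab-delimited row per JSON record, keeping only `fields`."""
    with open(json_path, encoding="utf-8") as src, \
            open(tsv_path, "w", encoding="utf-8") as dst:
        # Header row (an assumption; drop it if the consumer expects none).
        dst.write("\t".join(fields) + "\n")
        for line in src:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            record = json.loads(line)
            row = []
            for field in fields:
                value = record.get(field, "")
                if isinstance(value, list):
                    value = ",".join(str(item) for item in value)
                # Tabs and line breaks inside a value would break the TSV layout.
                text = str(value)
                for ch in ("\t", "\r", "\n"):
                    text = text.replace(ch, " ")
                row.append(text)
            dst.write("\t".join(row) + "\n")
```

A call such as `json_lines_to_tsv("users.json", "users.tsv", ["user_id", "name", "friends"])` would then produce one row per user; the file and field names here are illustrative only.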

How to build and run the application

How to build the application with Maven

mvn clean verify

How to build a Docker image

docker build -t dgreenshtein/yelp-insights ${PROJECT_HOME}

How to run the application Docker container and start the Spark cluster

docker run -it -p 4040:4040 -p 8080:8080 -p 8081:8081 -h insights --name=insights dgreenshtein/yelp-insights /bin/bash

How to run the Spark job

# start Spark Master and Worker
root@insights$ /etc/bootstrap.sh

root@insights$ cd /opt/yelp-insights/

# run the application with the test data set
root@insights$ scripts/start-job.sh test-data/business.tsv test-data/reviews.tsv test-data/users.tsv /opt/yelp-insights/results/

Spark Master Web UI: http://localhost:8080

Tools and versions

Spark SQL, GraphX 2.1.1
Graphframes 0.5.0
pandas DataFrame
Docker 17.05.0-ce
