maioranisimone/Hadoop-SmartPhone-Prediction

Hadoop SmartPhone Prediction

Authors

Introduction

The repo contains a Hadoop cluster configuration and a client-server app. The goal is to predict a smartphone's price range using a machine learning model built with Apache Spark, and to visualize charts of smartphone statistics using data produced by Apache Hive.

This image represents one possible output result.

Hadoop Configuration

The application is tested in both local and cluster modes.
We used ZeroTier to connect our two machines.

These configurations are available as README files in:

How to run it

To run the project you need these packages:

How to do it

First, download the JPMML-SparkML jar and add it to the Spark jars folder. After that, you can install PySpark2PMML. Before downloading and installing any of the PMML libraries, check which Spark version you have, then consult the PMML documentation to find the release that is compatible with it. Remember to install numpy too, because it is used by PySpark2PMML.

The project uses Google Chrome to open the client because it allows the CORS policy to be managed.
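
The jar installation step can be sketched in Python as follows. This is a minimal sketch, not part of the repo: the helper names `spark_jars_dir` and `install_jar` are hypothetical, and it assumes a standard Spark layout where the jars live in a `jars` folder under `SPARK_HOME`. It does not check version compatibility, which still has to be verified against the PMML documentation.

```python
import os
import shutil
from pathlib import Path


def spark_jars_dir(spark_home=None):
    """Return the jars folder of a Spark installation.

    Falls back to the SPARK_HOME environment variable, which is an
    assumption about how Spark is set up on your machine.
    """
    home = spark_home or os.environ.get("SPARK_HOME")
    if not home:
        raise RuntimeError("SPARK_HOME is not set; pass spark_home explicitly")
    jars = Path(home) / "jars"
    if not jars.is_dir():
        raise FileNotFoundError(f"No jars folder found at {jars}")
    return jars


def install_jar(jar_path, spark_home=None):
    """Copy an already downloaded jar into the Spark jars folder.

    Returns the path of the copied file.
    """
    source = Path(jar_path)
    if source.suffix != ".jar" or not source.is_file():
        raise ValueError(f"{source} is not an existing .jar file")
    target = spark_jars_dir(spark_home) / source.name
    shutil.copy2(source, target)
    return target
```

Calling `install_jar` with the path of the downloaded JPMML-SparkML jar copies it next to the other Spark jars; the Python packages can then be installed with pip as usual.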

Change paths

In both the cluster and local folders, you can find a folder named script, which contains the main scripts for launching the project.
These scripts are:

  • runHadoop.sh
  • runHive.sh
  • runSparkAPP.sh
  • runProject.sh

You only need to run runProject.sh, but before running it you should change all the paths contained in the other scripts, such as the Hadoop, Spark, and Hive paths and the Python path.
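
Before launching runProject.sh, it can help to confirm that the paths you edited actually exist. The following is a minimal sketch of such a check, not something shipped with the repo: the function names are hypothetical, and the environment variable names `HADOOP_HOME`, `SPARK_HOME`, and `HIVE_HOME` are only conventional defaults; the scripts themselves may store their paths differently.

```python
import os
from pathlib import Path


def paths_from_env(names=("HADOOP_HOME", "SPARK_HOME", "HIVE_HOME")):
    """Collect install paths from environment variables.

    The variable names are an assumption; adjust them to match whatever
    the scripts on your machine expect.
    """
    return {name: os.environ.get(name, "") for name in names}


def missing_paths(paths):
    """Return the names whose configured path is empty or not on disk."""
    return [
        name
        for name, value in paths.items()
        if not value or not Path(value).exists()
    ]
```

For example, `missing_paths(paths_from_env())` returns an empty list when every variable points to an existing location, and otherwise lists the names that still need fixing.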
