create build.sbt (a sketch is below these steps)
create this folder: src/main/scala/
put the program file there, then cd back to the project root (cd ../../..)
run: sbt clean //removes the previous build's target/ output
run: sbt package //compiles the program and builds the jar
to run the app: spark-submit --class Habib_Boloorchi_Program_3 ./target/scala-2.10/habib_boloorchi_program_3_2.10-1.0.jar
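A minimal build.sbt sketch that would produce the jar name used above. Spark 1.6.3 is an assumption (the last Spark line supporting Scala 2.10), as is the spark-csv dependency for reading CSV data; the Spark modules are marked "provided" because spark-submit supplies them at run time:

name := "Habib_Boloorchi_Program_3"

version := "1.0"

scalaVersion := "2.10.6"

libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core"      % "1.6.3" % "provided",
  "org.apache.spark" %% "spark-sql"       % "1.6.3" % "provided",
  "org.apache.spark" %% "spark-streaming" % "1.6.3" % "provided",
  "com.databricks"   %% "spark-csv"       % "1.5.0"  // assumed CSV reader for Spark 1.x
)

Note that sbt package does not bundle dependencies into the jar, so pass spark-csv at submit time with --packages com.databricks:spark-csv_2.10:1.5.0 (or build with sbt-assembly instead).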
Project Definition:
1. For all vehicles, write a Spark job to get the average annual petroleum consumption in barrels for fuelType1 for each make of vehicle.
2. Write a Spark job using the cube function to get the average annual petroleum consumption in barrels for fuelType1 for each make of vehicle.
3. Write a Spark job using SQL to find the total number of makes of vehicles for all years that run on different types of fuelType1.
4. Stream Twitter data into Spark for 3 hours. After 3 hours the streaming should automatically stop. Identify three topic keywords. As the data streams in, count how many times each of the 3 topic keywords appears in the tweets. Do this whenever a new file is input. Output the count for each keyword. Caution: do not use common words.
5. Write a Spark trigger based on the processing time. Execute the streaming query of No. 4 above at regular 10-minute intervals. After 3 hours the streaming should automatically stop.
Sketches for tasks 1-5 follow.
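A sketch of task 1 under assumed names: the dataset is read from a local vehicles.csv (the fueleconomy.gov file) whose barrels08 column holds the annual petroleum consumption in barrels for fuelType1; the file name and column names are assumptions, and the Spark 1.6 API is used to match the Scala 2.10 build above:

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext
import org.apache.spark.sql.functions.avg

object Habib_Boloorchi_Program_3 {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("Habib_Boloorchi_Program_3"))
    val sqlContext = new SQLContext(sc)

    // Read the dataset; path and column names are assumptions
    val vehicles = sqlContext.read
      .format("com.databricks.spark.csv") // spark-csv package from build.sbt
      .option("header", "true")
      .option("inferSchema", "true")
      .load("vehicles.csv")

    // Task 1: average annual fuelType1 petroleum consumption (barrels) per make
    vehicles.groupBy("make")
      .agg(avg("barrels08").alias("avg_barrels_fuelType1"))
      .show()
  }
}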
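Task 2 only swaps groupBy for cube. Reusing the vehicles DataFrame from the sketch above, cube("make") returns the same per-make averages plus a grand-total row where make is null:

    // Task 2: cube adds subtotal rows (here: one grand-total row with make = null)
    vehicles.cube("make")
      .agg(avg("barrels08").alias("avg_barrels_fuelType1"))
      .show()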
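One reading of task 3, reusing the same DataFrame: count the distinct makes per fuelType1 value through the SQL interface (registerTempTable is the Spark 1.6 call; Spark 2.x renamed it createOrReplaceTempView):

    // Task 3: SQL over the DataFrame registered as a temp table
    vehicles.registerTempTable("vehicles")
    sqlContext.sql(
      "SELECT fuelType1, COUNT(DISTINCT make) AS num_makes " +
      "FROM vehicles GROUP BY fuelType1"
    ).show()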
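For tasks 4 and 5, a DStream sketch under these assumptions: Flume deposits tweet files into an HDFS directory (placeholder path below), the three topic keywords are placeholders, and, since Structured Streaming's Trigger.ProcessingTime requires Spark 2.x, task 5's 10-minute processing-time trigger is expressed as the DStream batch interval:

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.streaming.{Minutes, StreamingContext}

object TweetKeywordCount { // hypothetical object name
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("TweetKeywordCount"))
    val ssc = new StreamingContext(sc, Minutes(10)) // task 5: 10-minute intervals

    val keywords = Seq("keyword1", "keyword2", "keyword3") // placeholder topic keywords

    // Task 4: pick up each new file Flume writes into this directory (placeholder path)
    val tweets = ssc.textFileStream("hdfs:///user/flume/tweets")

    tweets.flatMap(_.toLowerCase.split("\\W+"))
      .filter(word => keywords.contains(word))
      .map(word => (word, 1L))
      .reduceByKey(_ + _)
      .print() // output the per-keyword count for each batch

    ssc.start()
    // Stop automatically after 3 hours (timeout is in milliseconds)
    ssc.awaitTerminationOrTimeout(3L * 60 * 60 * 1000)
    ssc.stop(stopSparkContext = true, stopGracefully = true)
  }
}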
How to catch Twitter data in Flume:
Create a Twitter account, sign in, and create a Twitter app to get API credentials. Then write a Flume configuration file (an example conf file is in the repository) and run the agent with this command (it writes the tweets into HDFS):
nohup $FLUME_HOME/bin/flume-ng agent -n TwitterAgent -f /autohome/file_name.conf &
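A minimal sketch of such a conf file, modeled on the widely circulated Cloudera TwitterSource example; the agent name matches -n TwitterAgent above, and the credentials, keywords, and HDFS path are placeholders you must replace:

TwitterAgent.sources  = Twitter
TwitterAgent.channels = MemChannel
TwitterAgent.sinks    = HDFS

TwitterAgent.sources.Twitter.type = com.cloudera.flume.source.TwitterSource
TwitterAgent.sources.Twitter.channels = MemChannel
TwitterAgent.sources.Twitter.consumerKey = <consumer key>
TwitterAgent.sources.Twitter.consumerSecret = <consumer secret>
TwitterAgent.sources.Twitter.accessToken = <access token>
TwitterAgent.sources.Twitter.accessTokenSecret = <access token secret>
TwitterAgent.sources.Twitter.keywords = keyword1, keyword2, keyword3

TwitterAgent.sinks.HDFS.type = hdfs
TwitterAgent.sinks.HDFS.channel = MemChannel
TwitterAgent.sinks.HDFS.hdfs.path = hdfs:///user/flume/tweets
TwitterAgent.sinks.HDFS.hdfs.fileType = DataStream

TwitterAgent.channels.MemChannel.type = memory
TwitterAgent.channels.MemChannel.capacity = 10000
TwitterAgent.channels.MemChannel.transactionCapacity = 100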