shekar-raja/meet-up-dot-com-realtime-pipeline

Meetup.com Realtime Data Pipeline

Prerequisites:

  • Scala (2.12.6)
  • Python (3.9)
  • Apache Kafka (2.8.0)
  • Apache Spark (3.0.2)
  • MongoDB (4.4.3)
  • MySQL (8.0.25)

Run:

  • Start the Kafka ZooKeeper and Kafka server (broker)
  • Start MySQL and MongoDB
  • Navigate to the Python/src/producers folder and run kafka_producer.py to push messages to the Apache Kafka topic
  • Navigate to the Scala/src/main/scala folder and run spark_stream_processing_app to start consuming from the Apache Kafka topic
  • Once the Spark app is deployed to the cluster, it stores the raw JSON objects (as DataFrames) in MongoDB and writes the aggregated data to a MySQL table in batch mode
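
The data flow in the steps above can be sketched in plain Python with no Kafka, Spark, or database dependencies. This is only an illustration under stated assumptions, not the repository's code: the topic name `meetup-rsvp`, the event fields `group_country` and `response`, and the `send` callable are all hypothetical, and the real consumer is the Scala Spark app. `publish` mimics what a producer script does (serialize each event to JSON and send it to a topic), and `process_batch` mimics one micro-batch on the consumer side (keep the raw documents for MongoDB, compute an aggregate for MySQL).

```python
import json
from collections import Counter
from typing import Callable, Dict, Iterable, List, Tuple

# Hypothetical topic name; the README does not state the real one.
TOPIC = "meetup-rsvp"


def publish(events: Iterable[dict], send: Callable[[str, bytes], None]) -> int:
    """Serialize each event to UTF-8 JSON and pass it to send(topic, payload).

    `send` stands in for a Kafka client's send call, so the sketch runs
    without a broker. Returns the number of events published.
    """
    count = 0
    for event in events:
        send(TOPIC, json.dumps(event).encode("utf-8"))
        count += 1
    return count


def process_batch(
    payloads: Iterable[bytes],
) -> Tuple[List[dict], Dict[Tuple[str, str], int]]:
    """Split one micro-batch into raw documents and an aggregate.

    The raw documents are what would be stored in MongoDB; the counts per
    (group_country, response) pair are what would be written to MySQL.
    Both field names are assumptions made for this sketch.
    """
    raw_docs = [json.loads(p) for p in payloads]
    counts = Counter((d["group_country"], d["response"]) for d in raw_docs)
    return raw_docs, dict(counts)


if __name__ == "__main__":
    topic_buffer: List[bytes] = []  # in-memory stand-in for the Kafka topic
    sample_events = [
        {"group_country": "us", "response": "yes"},
        {"group_country": "us", "response": "no"},
        {"group_country": "in", "response": "yes"},
    ]
    publish(sample_events, lambda topic, payload: topic_buffer.append(payload))
    raw, aggregated = process_batch(topic_buffer)
    print(len(raw), aggregated)
```

Passing `send` in as a parameter keeps the serialization logic testable without a running broker; in a real producer it would be replaced by the Kafka client's own send method.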
