Meetup.com Realtime Data Pipeline

Prerequisites:

Scala (2.12.6)
Python (3.9)
Apache Kafka (2.8.0)
Apache Spark (3.0.2)
MongoDB (4.4.3)
MySQL (8.0.25)

Start ZooKeeper and the Kafka server
Start MySQL and MongoDB
Navigate to the Python/src/producers folder and run kafka_producer.py to push messages to the Apache Kafka topic
Navigate to the Scala/src/main/scala folder and run spark_stream_processing_app to start consuming from the Apache Kafka topic
Once the Spark app is deployed to the cluster, it starts writing DataFrames of raw JSON objects to MongoDB and aggregated data to a MySQL table in batch mode
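
The data flow in the steps above can be sketched in plain Python. This is a minimal illustration, not the code in kafka_producer.py or spark_stream_processing_app: the function names, the `send` callback, and the event fields (`rsvp_id`, `group`, `group_country`) are all assumptions made for the example, and the per-country count only stands in for whatever aggregation the Spark app actually performs.

```python
import json
from collections import Counter
from typing import Callable, Dict, Iterable


def encode_event(event: dict) -> bytes:
    """Serialize one event to compact UTF-8 JSON, a common form for a Kafka message value."""
    return json.dumps(event, separators=(",", ":")).encode("utf-8")


def publish(events: Iterable[dict], send: Callable[[bytes], object]) -> int:
    """Encode each event and hand it to `send`; return how many were sent.

    `send` is a hypothetical callback that stands in for a real Kafka client call.
    """
    count = 0
    for event in events:
        send(encode_event(event))
        count += 1
    return count


def aggregate_by_country(raw_messages: Iterable[bytes]) -> Dict[str, int]:
    """Count messages per country for one batch (assumed field: group.group_country)."""
    counts: Counter = Counter()
    for raw in raw_messages:
        event = json.loads(raw)
        group = event.get("group") or {}
        counts[group.get("group_country", "unknown")] += 1
    return dict(counts)


if __name__ == "__main__":
    # A list stands in for the Kafka topic so the sketch runs without a broker.
    topic = []
    sample = [
        {"rsvp_id": 1, "group": {"group_country": "us"}},
        {"rsvp_id": 2, "group": {"group_country": "de"}},
        {"rsvp_id": 3, "group": {"group_country": "us"}},
    ]
    publish(sample, topic.append)
    print(aggregate_by_country(topic))
```

With a real client such as kafka-python, `send` could wrap `KafkaProducer.send` for a topic of your choosing. In the pipeline itself, the raw messages would go to MongoDB and the aggregated counts to MySQL.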
