This personal project builds a real-time data platform using Apache Kafka and Django. Highlights include:
- Scalable data processing: Kafka handles millions of events efficiently via horizontal scaling.
- Modular architecture: a plug-and-play design allows easy integration of new functionality.
- Robust monitoring: error handling and logging ensure smooth operation and provide valuable insights.

This project demonstrates expertise in event-driven architecture, containerization, and real-time data processing.
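As a sketch of the event-driven flow, a producer serializes each event to bytes before publishing it to a Kafka topic, and the consumer decodes those bytes back into a payload. This minimal example assumes JSON-encoded events; the helper names are illustrative, not taken from the project:

```python
import json
from datetime import datetime, timezone

def encode_event(payload: dict) -> bytes:
    """Serialize an event dict to the JSON bytes a Kafka producer would send."""
    event = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "payload": payload,
    }
    return json.dumps(event).encode("utf-8")

def decode_event(raw: bytes) -> dict:
    """Deserialize the bytes a Kafka consumer would receive."""
    return json.loads(raw.decode("utf-8"))

# A producer would publish encode_event(...) to a topic; the consumer
# decodes each message and processes the payload.
message = encode_event({"rule": "discount", "amount": 100})
print(decode_event(message)["payload"])
```

Keeping serialization in one pair of helpers like this makes it easy for producers and consumers to stay in sync on the message format.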
- You will need to have docker, docker-compose and Python installed
- Start the Zookeeper container and expose port `2181`:

```
docker run -p 2181:2181 zookeeper
```
- Start the Kafka container, expose port `9092`, and set up the ENV variables:

```
docker run -p 9092:9092 \
  -e KAFKA_ZOOKEEPER_CONNECT=<PRIVATE_IP>:2181 \
  -e KAFKA_ADVERTISED_LISTENERS=PLAINTEXT://<PRIVATE_IP>:9092 \
  -e KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR=1 \
  confluentinc/cp-kafka
```
- Clone or download this repo, then edit the Kafka configuration section of the settings.py file to add your IP address. The file is at:

```
path/to/repo/DataPlatform/DataPlatform/settings.py
```
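The exact variable names in settings.py are not shown here, so the shape below is a hypothetical sketch of what a Kafka configuration section might look like, with `<PRIVATE_IP>` matching the address used when starting the Kafka container:

```python
# DataPlatform/settings.py (excerpt) -- the variable and topic names below
# are illustrative assumptions; match them to the actual Kafka configuration
# section in this file.
KAFKA_CONFIG = {
    # Replace with the private IP used for KAFKA_ADVERTISED_LISTENERS
    "bootstrap_servers": "<PRIVATE_IP>:9092",
    "business_rule_topic": "business_rule",
    "gsheet_topic": "googlesheetsintegration",
}
```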
- Now run Docker Compose: the Docker image of this project will be built and the Postgres (the database used) image will be fetched.

```
cd path/to/repo/DataPlatform
docker-compose up
```
- Run this command to start a bash shell inside the DataPlatform Docker container:

```
docker exec -it <dataplatform-container-id> bash
```
- The Django server is now up; you can test the endpoints using Postman or any other API client.
- For the BusinessRule endpoint, start the queue listener (Kafka consumer) using the following command (inside the container):

```
python manage.py launch_br_queue_listener
```
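The management command above wraps a Kafka consumer loop. As a minimal sketch of the idea (the handler logic, topic name, and threshold are hypothetical, and `kafka-python` is only one possible client library, not necessarily the one this project uses):

```python
import json

def handle_business_rule(raw: bytes) -> dict:
    """Decode one queue message and apply a (hypothetical) business rule."""
    data = json.loads(raw.decode("utf-8"))
    # Illustrative rule: flag payloads whose amount exceeds a threshold
    data["flagged"] = data.get("amount", 0) > 1000
    return data

def run_listener():  # requires a running broker; not executed here
    # The real management command does roughly this with the project's settings:
    from kafka import KafkaConsumer  # pip install kafka-python
    consumer = KafkaConsumer("business_rule",
                             bootstrap_servers="<PRIVATE_IP>:9092")
    for message in consumer:
        handle_business_rule(message.value)

print(handle_business_rule(b'{"amount": 1500}'))
```

Separating the per-message handler from the consumer loop keeps the rule logic easy to test without a broker.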
- Now all messages/requests POSTed to the endpoint will be received and processed by the consumer.
- You can check the logs in the `/app/logs` directory inside the Docker container.
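File-based logging of this kind can be set up with Python's standard `logging` module. The sketch below is an assumption about how the project might do it, not its actual configuration; the real container writes to `/app/logs`, while this self-contained example logs to a temporary directory so it can run anywhere:

```python
import logging
import os
import tempfile

# The project writes to /app/logs inside the container; for this
# standalone sketch we use a temporary directory instead.
log_dir = tempfile.mkdtemp()
log_path = os.path.join(log_dir, "consumer.log")

logger = logging.getLogger("dataplatform")
handler = logging.FileHandler(log_path)
handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
logger.addHandler(handler)
logger.setLevel(logging.INFO)

logger.info("consumer started")
handler.flush()

with open(log_path) as f:
    print(f.read().strip())
```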
- To use the GoogleSheetsIntegration endpoint you will need to download the credentials.json file and create the token.json file following this guide.
- Save credentials.json and token.json in the Django project directory and modify `config_files/config.json` to add the details of the Google Sheet where the data will be stored.
- Start the GoogleSheetsIntegration queue listener (Kafka consumer) using the following command (inside the container):

```
python manage.py launch_gsheet_queue_listener
```
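Conceptually, this listener turns each queued JSON message into a row appended to the configured sheet. A hedged sketch of just the row-building step (the helper and field names are illustrative, and the actual Google Sheets API call is omitted):

```python
import json

def build_sheet_row(raw: bytes, columns: list) -> list:
    """Flatten one queue message into the ordered row a Sheets append expects."""
    data = json.loads(raw.decode("utf-8"))
    # Missing fields become empty cells so the column layout stays stable
    return [str(data.get(col, "")) for col in columns]

# The column order would come from config_files/config.json in the real project.
row = build_sheet_row(b'{"name": "Ada", "score": 42}', ["name", "score"])
print(row)
```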
- That's all: data POSTed to the endpoint will be stored in the Google Sheet; check the logs if needed.
- Thank You