This is my approach to the Sapient Talent Hunt for Data Engineers challenge, which was hosted on Analytics Vidhya. I secured the second rank in this challenge.
In this challenge, I had to generate alerts based on sensor data. The detailed problem statement is given here. In short, sensors generate data every minute; I had to consume this data in a streaming fashion and generate two kinds of alerts from it. Using a Kafka component that reads data from a CSV file and sends it to a streaming engine was compulsory.
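The compulsory CSV-to-Kafka step can be sketched as follows. This is a minimal illustration, not the actual submission: the column name `sensor_id`, the topic name `sensor-readings`, and the broker address are all assumptions for the example.

```python
import csv
import json

def csv_rows_to_messages(csv_path):
    """Read sensor rows from a CSV file and yield (key, value) pairs
    as JSON-encoded bytes, ready to be sent to a Kafka topic."""
    with open(csv_path, newline="") as f:
        for row in csv.DictReader(f):
            # Hypothetical column name "sensor_id": keying messages by
            # sensor keeps one sensor's readings in the same partition.
            key = row.get("sensor_id", "").encode("utf-8")
            value = json.dumps(row).encode("utf-8")
            yield key, value

# In the real pipeline these pairs would be pushed to Kafka, e.g. with
# the kafka-python client (broker/topic names are placeholders):
#   from kafka import KafkaProducer
#   producer = KafkaProducer(bootstrap_servers="localhost:9092")
#   for key, value in csv_rows_to_messages("sensor_data.csv"):
#       producer.send("sensor-readings", key=key, value=value)
#   producer.flush()
```

The producer calls are left as comments so the parsing logic stays runnable without a broker.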
Software Used:
- NiFi
- Kafka
- Spark (Streaming and Batch)
- Parquet
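To illustrate the kind of alerting applied to the stream, here is a small stateful sketch of one hypothetical rule: alert when a sensor exceeds a threshold for several consecutive minutes. The actual alert rules are defined in the problem statement; the threshold, window size, and alert label below are illustrative assumptions, and in the real pipeline this logic ran over the Kafka stream in Spark.

```python
from collections import deque

def threshold_alerts(readings, limit=100.0, window=3):
    """Hypothetical alert rule: emit an alert when a sensor's reading
    exceeds `limit` for `window` consecutive minutes.

    `readings` is an iterable of (sensor_id, value) pairs in time order.
    """
    recent = {}   # sensor_id -> deque of the last `window` readings
    alerts = []
    for sensor_id, value in readings:
        buf = recent.setdefault(sensor_id, deque(maxlen=window))
        buf.append(value)
        if len(buf) == window and all(v > limit for v in buf):
            alerts.append((sensor_id, "THRESHOLD_BREACH"))
            buf.clear()  # reset so one sustained breach yields one alert
    return alerts
```

Keeping only a bounded window per sensor mirrors how stateful streaming operators hold minimal per-key state.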
Please go through the following files:
- Problem Statement : contains the problem statement as well as the data description.
- Data Pipeline Document : contains detailed information about the pipeline, such as the data flow diagram, preprocessing, null value imputation, and future scope.