Skip to content

Real-time IoT sensor data processing pipeline using Apache Flink, PostgreSQL, and Docker for industrial machine monitoring

License

Notifications You must be signed in to change notification settings

MbarekTech/Flink-sensor-processing

Repository files navigation

Flink Sensor Processing

License: MIT Python Apache Flink Docker

A real-time data processing pipeline built with Apache Flink that analyzes factory sensor data. This project handles both streaming and batch processing of industrial machine data, with PostgreSQL for storage and Docker for easy deployment.

What This Does

This project processes sensor data from factory machines in real-time. I built it to learn Apache Flink and explore how to handle both streaming data (live sensor readings) and batch data (historical analysis).

The system simulates factory sensor data and runs two types of analysis:

  • Stream processing: Analyzes data as it comes in, detecting anomalies in real-time
  • Batch processing: Processes historical data for trends and insights

Everything runs in Docker containers, so you can spin up the entire infrastructure with one command.

Components

  • Flink cluster: 1 JobManager + 3 TaskManagers for distributed processing
  • PostgreSQL: Stores the processed results
  • Data generator: Creates realistic sensor data for testing
  • PyFlink jobs: The actual data processing code

Features

  • Processes sensor data in real-time as it streams in
  • Handles batch processing for historical data analysis
  • Detects machine faults based on temperature, vibration, and sound thresholds
  • Scales across multiple Flink task managers
  • Generates realistic factory sensor data for testing
  • Stores results in PostgreSQL for further analysis
  • Everything containerized with Docker

How It Works

Stream Processing

The data generator sends sensor readings through a socket connection to Flink. The streaming job continuously aggregates data by machine type and flags potential issues when temperature > 75°C, vibration > 20mm/s, or sound > 80dB.

Batch Processing

For historical analysis, the system generates a large CSV file with sensor data, then processes it in batch mode to calculate averages and detect patterns across different machine types.

Both pipelines save their results to PostgreSQL tables for analysis.

Tech Stack

  • Apache Flink 1.18.1 - Stream and batch processing
  • PyFlink - Python API for Flink
  • PostgreSQL 15 - Database for results
  • Docker & Docker Compose - Containerization
  • Python 3.9+ - Data generation and processing
  • Pandas & NumPy - Data manipulation

Requirements

  • Docker Desktop or Docker Engine
  • Docker Compose
  • At least 8GB RAM (recommended)
  • About 10GB free disk space

Getting Started

Clone the project

git clone https://github.com/MbarekTech/Flink-sensor-processing.git
cd Flink-sensor-processing

Start everything

docker-compose up -d

This starts:

  • Flink Job Manager (Web UI at http://localhost:8081)
  • 3 Flink Task Managers
  • PostgreSQL database (port 5435)
  • PyFlink client container
  • Data generator services

Run the streaming job

docker exec -it flink-client-pyflink bash
python /flink-job/streaming_sensor_analyzer.py

Run the batch job

# In the same container
python /flink-job/batch_sensor_analyzer.py

Check the results

  • Flink Web UI: http://localhost:8081
  • Database: localhost:5435, database: flink_db, user: flink_user, password: flink_password

Project Structure

Flink-sensor-processing/
├── docker-compose.yml          # Sets up all the containers
├── Dockerfile.client           # PyFlink client environment
├── Dockerfile.generator        # Data generator environment
├── requirements.txt            # Python dependencies
├── simulate_factory_data.py    # Generates sensor data
├── demo.ps1                   # PowerShell demo script
├── flink-job/                 # Processing jobs
│   ├── streaming_sensor_analyzer.py  # Real-time processing
│   └── batch_sensor_analyzer.py      # Batch processing
├── data/                      
│   └── factory_sensor_simulator_2040.csv  # Template data
├── unused/                    # Old files
└── README.md                  

Configuration

You can change these environment variables:

Variable Default What it does
FLINK_HOST jobmanager Where Flink is running
FLINK_PORT 9000 Port for streaming data
MODE stream Generator mode (stream/batch)

Sensor Data

The system tracks these metrics from factory machines:

  • Machine ID and type
  • Operating hours, temperature, vibration levels
  • Sound levels, power consumption
  • Maintenance history and error counts

Use Cases

This could be useful for:

  • Detecting machine problems before they cause downtime
  • Monitoring factory equipment in real-time
  • Analyzing machine performance over time
  • Finding patterns in production data
  • Optimizing power usage across machines

What Gets Analyzed

The system calculates:

  • Average temperature, vibration, and sound by machine type
  • Power consumption patterns
  • How many machines of each type are running
  • Count of machines that exceed safety thresholds

Troubleshooting

Out of memory errors: Give Docker more RAM (8GB minimum)

Port conflicts: Make sure ports 8081, 5435, and 9000 aren't being used

Jobs won't start: Wait for PostgreSQL to finish starting up, then try again

Can't connect to Flink: Check if the cluster is running at http://localhost:8081

Useful commands

# Check if containers are running
docker-compose ps

# See what's happening
docker-compose logs flink-jobmanager
docker-compose logs postgres

# Connect to the database
docker exec -it flink-postgres psql -U flink_user -d flink_db

Contributing

Feel free to open issues or submit pull requests if you find bugs or have ideas for improvements.

License

MIT License - see LICENSE for details.

Thanks

Built with Apache Flink, PostgreSQL, and Docker. Thanks to those communities for making great tools.

About

Real-time IoT sensor data processing pipeline using Apache Flink, PostgreSQL, and Docker for industrial machine monitoring

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published