Flink Sensor Processing

A real-time data processing pipeline built with Apache Flink that analyzes factory sensor data. This project handles both streaming and batch processing of industrial machine data, with PostgreSQL for storage and Docker for easy deployment.

What This Does

This project processes sensor data from factory machines in real-time. I built it to learn Apache Flink and explore how to handle both streaming data (live sensor readings) and batch data (historical analysis).

The system simulates factory sensor data and runs two types of analysis:

Stream processing: Analyzes data as it comes in, detecting anomalies in real-time
Batch processing: Processes historical data for trends and insights

Everything runs in Docker containers, so you can spin up the entire infrastructure with one command.

Components

Flink cluster: 1 JobManager + 3 TaskManagers for distributed processing
PostgreSQL: Stores the processed results
Data generator: Creates realistic sensor data for testing
PyFlink jobs: The actual data processing code

Features

Processes sensor data in real-time as it streams in
Handles batch processing for historical data analysis
Detects machine faults based on temperature, vibration, and sound thresholds
Scales across multiple Flink TaskManagers
Generates realistic factory sensor data for testing
Stores results in PostgreSQL for further analysis
Everything is containerized with Docker

How It Works

Stream Processing

The data generator sends sensor readings to Flink over a socket connection. The streaming job continuously aggregates the readings by machine type and flags a potential fault whenever temperature exceeds 75 °C, vibration exceeds 20 mm/s, or sound exceeds 80 dB.
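The threshold rule above can be sketched in plain Python. This is a minimal illustration, not the code in streaming_sensor_analyzer.py: the `Reading` class, its field names, the constant names, and both functions are assumptions made for this example. Only the three limits come from this README.

```python
from collections import Counter
from dataclasses import dataclass

# Limits quoted in this README; the constant names are hypothetical.
TEMP_LIMIT_C = 75.0
VIBRATION_LIMIT_MM_S = 20.0
SOUND_LIMIT_DB = 80.0


@dataclass
class Reading:
    """One sensor reading (field names are assumed, not taken from the project)."""
    machine_type: str
    temperature: float  # degrees Celsius
    vibration: float    # mm/s
    sound: float        # dB


def is_faulty(reading: Reading) -> bool:
    """Return True if any metric is strictly above its limit."""
    return (
        reading.temperature > TEMP_LIMIT_C
        or reading.vibration > VIBRATION_LIMIT_MM_S
        or reading.sound > SOUND_LIMIT_DB
    )


def count_faults_by_type(readings) -> Counter:
    """Count flagged readings per machine type."""
    return Counter(r.machine_type for r in readings if is_faulty(r))
```

In a PyFlink job, a predicate like `is_faulty` would typically be applied per record before the per-type aggregation. How the actual job wires this up is not shown here.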

Batch Processing

For historical analysis, the system generates a large CSV file with sensor data, then processes it in batch mode to calculate averages and detect patterns across different machine types.

Both pipelines save their results to PostgreSQL tables for analysis.
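As a rough illustration of the batch step, the per-type averaging can be sketched with the standard library alone. The column names `machine_type` and `temperature`, the function name, and the sample rows are assumptions for this example. The real CSV schema and the PyFlink batch job may differ.

```python
import csv
import io
from collections import defaultdict


def average_by_machine_type(csv_file, metric):
    """Average one numeric column per machine type.

    Assumes a header row containing 'machine_type' and the given metric.
    """
    totals = defaultdict(lambda: [0.0, 0])  # machine_type -> [sum, count]
    for row in csv.DictReader(csv_file):
        entry = totals[row["machine_type"]]
        entry[0] += float(row[metric])
        entry[1] += 1
    return {mtype: total / count for mtype, (total, count) in totals.items()}


# Made-up sample rows standing in for the generated CSV file.
sample = io.StringIO(
    "machine_type,temperature\n"
    "drill,70.0\n"
    "drill,80.0\n"
    "press,60.0\n"
)
averages = average_by_machine_type(sample, "temperature")
print(averages)
```

The same function accepts an open file handle, so it could be pointed at a generated CSV once the real column names are known.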

Tech Stack

Apache Flink 1.18.1 - Stream and batch processing
PyFlink - Python API for Flink
PostgreSQL 15 - Database for results
Docker & Docker Compose - Containerization
Python 3.9+ - Data generation and processing
Pandas & NumPy - Data manipulation

Requirements

Docker Desktop or Docker Engine
Docker Compose
At least 8 GB of RAM available to Docker
About 10GB free disk space

Getting Started

Clone the project

git clone https://github.com/MbarekTech/Flink-sensor-processing.git
cd Flink-sensor-processing

Start everything

docker-compose up -d

This starts:

Flink JobManager (Web UI at http://localhost:8081)
3 Flink TaskManagers
PostgreSQL database (port 5435)
PyFlink client container
Data generator services

Run the streaming job

docker exec -it flink-client-pyflink bash
python /flink-job/streaming_sensor_analyzer.py

Run the batch job

# In the same container
python /flink-job/batch_sensor_analyzer.py

Check the results

Flink Web UI: http://localhost:8081
Database: localhost:5435, database: flink_db, user: flink_user, password: flink_password

Project Structure

Flink-sensor-processing/
├── docker-compose.yml                     # Sets up all the containers
├── Dockerfile.client                      # PyFlink client environment
├── Dockerfile.generator                   # Data generator environment
├── requirements.txt                       # Python dependencies
├── simulate_factory_data.py               # Generates sensor data
├── demo.ps1                               # PowerShell demo script
├── flink-job/                             # Processing jobs
│   ├── streaming_sensor_analyzer.py       # Real-time processing
│   └── batch_sensor_analyzer.py           # Batch processing
├── data/
│   └── factory_sensor_simulator_2040.csv  # Template data
├── unused/                                # Old files
└── README.md

Configuration

You can change these environment variables:

| Variable | Default | What it does |
| --- | --- | --- |
| `FLINK_HOST` | `jobmanager` | Where Flink is running |
| `FLINK_PORT` | `9000` | Port for streaming data |
| `MODE` | `stream` | Generator mode (stream/batch) |
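A script might read these variables along the following lines. This is a sketch under assumptions: the function name `load_generator_config` and the returned dictionary keys are hypothetical, and simulate_factory_data.py may parse its settings differently. Only the variable names and defaults come from the table above.

```python
import os


def load_generator_config(env=None):
    """Read generator settings, falling back to the documented defaults.

    `env` defaults to os.environ; passing a dict makes the function easy to test.
    """
    env = os.environ if env is None else env
    mode = env.get("MODE", "stream")
    if mode not in ("stream", "batch"):
        raise ValueError(f"MODE must be 'stream' or 'batch', got {mode!r}")
    return {
        "host": env.get("FLINK_HOST", "jobmanager"),
        "port": int(env.get("FLINK_PORT", "9000")),
        "mode": mode,
    }
```

Validating `MODE` up front gives an immediate error on a typo instead of a generator that silently falls back to some default behaviour.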

Sensor Data

The system tracks these metrics from factory machines:

Machine ID and type
Operating hours, temperature, vibration levels
Sound levels, power consumption
Maintenance history and error counts

Use Cases

This could be useful for:

Detecting machine problems before they cause downtime
Monitoring factory equipment in real-time
Analyzing machine performance over time
Finding patterns in production data
Optimizing power usage across machines

What Gets Analyzed

The system calculates:

Average temperature, vibration, and sound by machine type
Power consumption patterns
How many machines of each type are running
Count of machines that exceed safety thresholds

Troubleshooting

Out-of-memory errors: Give Docker more RAM (at least 8 GB)

Port conflicts: Make sure ports 8081, 5435, and 9000 aren't being used

Jobs won't start: Wait for PostgreSQL to finish starting up, then try again

Can't connect to Flink: Check if the cluster is running at http://localhost:8081
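For the "Jobs won't start" case, one way to avoid guessing is to poll the database port before launching a job. The helper below is a hypothetical sketch and not part of this project. It only checks that a TCP connection succeeds, which does not guarantee that PostgreSQL is ready to accept queries.

```python
import socket
import time


def wait_for_port(host, port, timeout_s=30.0, interval_s=1.0):
    """Return True once a TCP connection to host:port succeeds.

    Retries until roughly timeout_s seconds have passed, then returns False.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            with socket.create_connection((host, port), timeout=interval_s):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval_s)
```

From the host, `wait_for_port("localhost", 5435)` would target the PostgreSQL port listed in this README. Inside the Docker network, the host name and port are likely to differ.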

Useful commands

# Check if containers are running
docker-compose ps

# See what's happening
docker-compose logs flink-jobmanager
docker-compose logs postgres

# Connect to the database
docker exec -it flink-postgres psql -U flink_user -d flink_db

Contributing

Feel free to open issues or submit pull requests if you find bugs or have ideas for improvements.

License

MIT License - see LICENSE for details.

Thanks

Built with Apache Flink, PostgreSQL, and Docker. Thanks to those communities for making great tools.
