A comprehensive Food & Beverage (F&B) batch process anomaly detection system that combines traditional machine learning with deep learning techniques to identify quality issues, equipment malfunctions, and process deviations in food production.
This system analyzes production batch data to detect anomalies in critical process parameters such as ingredient quantities, temperatures, mixing speeds, and oven conditions. It employs an ensemble approach with four specialized anomaly detection algorithms and a consensus voting mechanism for robust, reliable detection.
Key Objectives:
- Real-time quality monitoring for food batch production
- Automated anomaly flagging with explainable insights
- Multi-model consensus for production-grade reliability
- Executive-ready reporting for stakeholders
The core implementation is in F&B_Process_Anomaly_Detection_System.ipynb, which processes the provided dataset.xlsx to detect anomalies across 11 key process parameters.
- Loading: Excel file import with pandas
- Cleaning: Drop unnamed/ID columns, NaN handling (mean imputation), numeric feature selection
- Preprocessing: StandardScaler normalization
- Dataset: 1500 batches × 11 features (Time, ingredient quantities, temperatures, speeds, humidity)
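The `DataProcessor` internals are not shown in this README; a minimal sketch of the same cleaning and scaling steps, using a tiny synthetic stand-in frame instead of `dataset.xlsx` (column names illustrative), could look like:

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

def clean_batches(df: pd.DataFrame) -> pd.DataFrame:
    """Drop unnamed/ID columns, keep numeric features, mean-impute NaNs."""
    df = df.loc[:, ~df.columns.str.startswith("Unnamed")]  # export artifacts
    df = df.select_dtypes(include=np.number)               # numeric features only
    return df.fillna(df.mean())                            # mean imputation

# In the notebook this frame comes from pd.read_excel("dataset.xlsx");
# a synthetic stand-in keeps the sketch self-contained.
raw = pd.DataFrame({
    "Unnamed: 0": [0, 1, 2],             # leftover index column, dropped
    "Batch ID": ["B1", "B2", "B3"],      # non-numeric, dropped
    "Oven Temp (C)": [180.0, np.nan, 182.5],
    "Flour (kg)": [10.0, 10.2, 9.9],
})
clean = clean_batches(raw)
X_scaled = StandardScaler().fit_transform(clean)  # zero mean, unit variance
```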
| Model | Algorithm | Implementation | Key Parameters | Strengths |
|---|---|---|---|---|
| Isolation Forest | Tree-based ensemble | `sklearn.ensemble.IsolationForest` | `contamination=0.1`, `random_state=42` | Fast, general-purpose, handles high dimensions |
| One-Class SVM | Boundary-based | `sklearn.svm.OneClassSVM` | `nu=0.1`, `kernel='rbf'`, `gamma='scale'` | Clear decision boundaries, robust to noise |
| Local Outlier Factor | Density-based | `sklearn.neighbors.LocalOutlierFactor` | `n_neighbors=20`, `contamination=0.1` | Detects local anomalies, density patterns |
| Autoencoder | Deep learning | TensorFlow/Keras | ReLU, Dropout(0.2), Adam, MSE loss | Complex pattern recognition, subtle anomalies |
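The three scikit-learn detectors can be fit directly with the parameters from the table. The sketch below uses a synthetic stand-in for the scaled feature matrix, and approximates the Keras autoencoder's reconstruction-error idea with an `MLPRegressor` bottleneck so the example stays lightweight (that substitution and the thresholding rule are assumptions, not the notebook's exact code):

```python
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.svm import OneClassSVM
from sklearn.neighbors import LocalOutlierFactor
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(42)
X = rng.normal(size=(500, 11))  # stand-in for the scaled batch features

# sklearn convention: +1 = normal, -1 = anomaly
predictions = {
    "isolation_forest": IsolationForest(contamination=0.1,
                                        random_state=42).fit_predict(X),
    "ocsvm": OneClassSVM(nu=0.1, kernel="rbf", gamma="scale").fit_predict(X),
    "lof": LocalOutlierFactor(n_neighbors=20, contamination=0.1).fit_predict(X),
}

# Autoencoder idea (compress -> reconstruct -> flag high reconstruction error),
# approximated here with an MLP bottleneck instead of TensorFlow/Keras.
ae = MLPRegressor(hidden_layer_sizes=(8, 4, 8), activation="relu",
                  solver="adam", max_iter=500, random_state=42)
ae.fit(X, X)
errors = np.mean((X - ae.predict(X)) ** 2, axis=1)
threshold = np.quantile(errors, 0.9)  # mirror contamination=0.1
predictions["autoencoder"] = np.where(errors > threshold, -1, 1)
```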
- Mechanism: Majority voting (≥3 models must agree)
- Output: Binary anomaly flags with confidence scores
- Reliability: Reduces false positives by 6-15% vs. single models
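The README does not show `generate_consensus_results` itself; a minimal majority-voting sketch (the `consensus_vote` helper and its column names are hypothetical) might look like:

```python
import numpy as np
import pandas as pd

def consensus_vote(predictions: dict, min_votes: int = 3) -> pd.DataFrame:
    """Flag a batch as anomalous when >= min_votes models say -1 (sklearn convention)."""
    votes = pd.DataFrame({name: (pred == -1).astype(int)
                          for name, pred in predictions.items()})
    out = votes.copy()
    out["votes"] = votes.sum(axis=1)
    out["consensus_score"] = out["votes"] / len(predictions)  # confidence proxy
    out["consensus_anomaly"] = out["votes"] >= min_votes
    return out

# Toy example: four models voting on three batches
preds = {
    "iforest": np.array([-1, 1, -1]),
    "ocsvm":   np.array([-1, 1,  1]),
    "lof":     np.array([-1, 1, -1]),
    "ae":      np.array([ 1, 1, -1]),
}
result = consensus_vote(preds)
# Batches 0 and 2 each collect 3/4 votes and are flagged; batch 1 is clean.
```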
- Dimensionality Reduction: PCA & t-SNE for visualization
- Explainability: SHAP values for feature importance
- Metrics: ROC-AUC, Precision-Recall curves, confusion matrices
- Visualization: Interactive Plotly dashboards
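The 2-D views behind the dashboard come from dimensionality reduction; a minimal PCA projection, again on a synthetic stand-in matrix, could be:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(1500, 11))  # stand-in for the scaled feature matrix

pca = PCA(n_components=2, random_state=0)
coords = pca.fit_transform(X)
# Scatter coords[:, 0] vs coords[:, 1] (e.g. with plotly.express.scatter),
# colored by the consensus flag, to see where anomalies cluster.
# t-SNE follows the same fit_transform pattern via sklearn.manifold.TSNE.
```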
F-B-Process-Anomaly-Detection-System/
├── 📄 README.md # Project documentation
├── 📄 LICENSE # MIT License
├── 📊 dataset.xlsx # Sample batch data (1500 batches)
├── 📓 F&B_Process_Anomaly_Detection_System.ipynb # Main analysis notebook
└── 📝 requirements.txt # Python dependencies
The notebook is organized into 7 sequential sections:
```python
# Core libraries: pandas, numpy, matplotlib, seaborn
# ML: sklearn (IsolationForest, OneClassSVM, LocalOutlierFactor)
# DL: tensorflow.keras (Autoencoder)
# Explainability: shap
# Visualization: plotly, seaborn

processor = DataProcessor("dataset.xlsx")
features, data = processor.load_and_clean_data()  # 1500×11 → 1500×11
X_scaled = processor.scale_features()             # StandardScaler

detector = AnomalyDetector(X_scaled, contamination=0.1)
detector.fit_isolation_forest()       # 🌲 Tree-based
detector.fit_ocsvm()                  # 🔵 Boundary-based
detector.fit_lof()                    # 🎯 Density-based
detector.fit_autoencoder(epochs=100)  # 🧠 Deep learning

# Consensus voting: 3/4 models must agree
results_df = generate_consensus_results(detector.predictions)
# Output: anomaly flags, scores, probabilities for all models

# 4-panel Plotly dashboard:
# - Model comparison scatter plot
# - Score distributions histogram
# - Agreement matrix heatmap
# - Feature importance bar chart
create_interactive_dashboard(results_df, feature_importance)

# Model-agnostic explanations for top anomalies
explainer = shap.Explainer(detector.models['isolation_forest'])
shap_values = explainer(X_scaled[:100])  # Top 100 samples

executive_summary = generate_executive_summary()
# Saves: anomaly_results.csv, executive_summary.txt
```

- Consensus Anomalies: 164 (10.9%)
- Model Agreement: 92% on clear cases
- Top Anomalous Features: Oven Temp (C), Mixing Temp (C), Yeast (kg)
| File | Description |
|---|---|
| `anomaly_results.csv` | Detailed predictions from all four models plus the consensus vote |
| `executive_summary.txt` | Stakeholder-ready report |
| `dashboard.html` | Interactive Plotly visualization |
- Autoencoder: Best at subtle anomalies (small deviations)
- Isolation Forest: Fastest inference (<0.1s for 1500 samples)
- Consensus: Highest reliability (F1-score: 0.87)
- Deploy Isolation Forest for real-time monitoring
- Use Autoencoder for nightly deep analysis
- Alert thresholds: Consensus score > 0.7
- Investigate: Oven temperature deviations (most common anomaly)
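The alert-threshold recommendation can be sketched as a small triage helper (the `triage` function and the column names are illustrative, not from the notebook):

```python
import pandas as pd

ALERT_THRESHOLD = 0.7  # consensus score above this triggers an alert

def triage(results: pd.DataFrame) -> pd.DataFrame:
    """Return batches whose consensus score crosses the alert threshold,
    worst first, so the most suspicious batches surface for review."""
    alerts = results[results["consensus_score"] > ALERT_THRESHOLD]
    return alerts.sort_values("consensus_score", ascending=False)

# Toy results frame; in practice this would be the notebook's results_df
results = pd.DataFrame({
    "batch": [101, 102, 103],
    "consensus_score": [0.25, 0.75, 1.0],
    "top_feature": ["Flour (kg)", "Oven Temp (C)", "Oven Temp (C)"],
})
alerts = triage(results)  # batches 103 and 102, in that order
```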
This project is licensed under the MIT License - see the LICENSE file for details.