diff --git a/TP1/TP1.md b/TP1/TP1.md deleted file mode 100644 index 4d2a719..0000000 --- a/TP1/TP1.md +++ /dev/null @@ -1,113 +0,0 @@ -# 💡 AI for IoT Practical 1: Fire Alarm Detection - -📌 **Important:** Please run this practical work on **[Kaggle Notebooks](https://www.kaggle.com/code)**. -Kaggle provides free GPU/CPU resources, pre-installed libraries (scikit-learn, XGBoost, etc.), and easy dataset integration. - ---- - -## 1. Objective -The goal of this practical is to build a **binary classification model** to predict the state of an IoT-connected smoke detector based on environmental sensor readings. -You'll start with the fundamental **Logistic Regression** model and have the opportunity to implement the high-performance **XGBoost** model as a bonus. - ---- - -## 2. Dataset Overview -This dataset simulates real-world conditions encountered by an IoT fire detection device. - -- **Source**: [Kaggle - Smoke Detection Dataset](https://www.kaggle.com/datasets/deepcontractor/smoke-detection-dataset) -- **Key Features (Input/X):** - Time-series readings from sensors like: - - Temperature - - Humidity - - Gas concentrations (CO, LPG, Methane) - - Other environmental factors - -- **Target Feature (Output/y):** - - `1` → Fire Alarm is **ON** (Fire/Smoke detected) - - `0` → Normal operational conditions (No Fire/Smoke) - ---- - -## 3. Core Task: Logistic Regression Classifier - -Logistic Regression is an excellent starting point for binary classification, providing a good performance baseline for an AIoT application. - -### A. Setup and Preprocessing -1. **Import Libraries** - - `pandas`, `numpy`, `matplotlib` / `seaborn` - - From `sklearn`: `train_test_split`, `LogisticRegression`, `metrics` - -2. **Load and Inspect Data** - - Load the dataset from Kaggle - - Use `.info()` and `.isnull().sum()` to check for missing values - - Handle missing data (imputation or removal) - -3. **Define Features and Target** - - Separate features (`X`) and target variable (`y`) - -4. 
**Feature Scaling (MANDATORY)** - - Apply `StandardScaler` from `sklearn.preprocessing` - - Logistic Regression requires scaled features - -5. **Split Data** - - Use `train_test_split` with 80% training and 20% testing - ---- - -### B. Training and Evaluation -1. **Train Model** - - Initialize and train a `LogisticRegression` model - -2. **Make Predictions** - - Predict outcomes on the test set - -3. **Evaluate Performance** - - Calculate metrics: - - Accuracy - - Precision - - Recall - - F1-Score - -4. **Visualize Results** - - Generate and display a **Confusion Matrix** - ---- - -## 🏆 Bonus Challenge: Building a Robust XGBoost Model - -XGBoost (Extreme Gradient Boosting) is one of the most powerful ensemble techniques used in industry. - -1. **Implement XGBoost** - - Use `XGBClassifier` from the `xgboost` library - - Note: Tree-based models like XGBoost are not highly sensitive to feature scaling - -2. **Train and Evaluate** - - Train on the same train/test split - - Calculate the same metrics as Logistic Regression - -3. **Compare and Analyze** - - Which model has a higher **Recall** score? - - Why is minimizing **False Negatives** crucial in fire detection? - - Discuss trade-offs: - - Logistic Regression → simplicity & speed - - XGBoost → higher performance but more complexity - ---- - -## 4. 
Deliverables - -Submit a **single Kaggle Notebook (`.ipynb`)** containing: - -- **Code and Documentation** - - Well-commented code with all preprocessing and modeling steps - -- **Core Task Results** - - Accuracy, Precision, Recall, F1-Score, Confusion Matrix for Logistic Regression - -- **Conclusion** - - Interpret the Recall score of Logistic Regression - - Discuss implications for IoT fire alarm reliability - -- **Bonus Results (if completed)** - - XGBoost metrics - - Comparative analysis between Logistic Regression and XGBoost diff --git a/TP1_AIoT.ipynb b/TP1_AIoT.ipynb new file mode 100644 index 0000000..fbfdba1 --- /dev/null +++ b/TP1_AIoT.ipynb @@ -0,0 +1,818 @@ +{ + "nbformat": 4, + "nbformat_minor": 0, + "metadata": { + "colab": { + "provenance": [] + }, + "kernelspec": { + "name": "python3", + "display_name": "Python 3" + }, + "language_info": { + "name": "python" + } + }, + "cells": [ + { + "cell_type": "code", + "execution_count": 4, + "metadata": { + "id": "_w3y4VumlGgB" + }, + "outputs": [], + "source": [ + "try:\n", + " import xgboost # noqa: F401\n", + " XGB_AVAILABLE = True\n", + "except Exception:\n", + " XGB_AVAILABLE = False\n", + "\n", + "if not XGB_AVAILABLE:\n", + " try:\n", + " # In Kaggle/Colab, this should succeed. If it fails (e.g., offline), you can skip the bonus.\n", + " !pip -q install xgboost\n", + " import xgboost # noqa: F401\n", + " XGB_AVAILABLE = True\n", + " except Exception as e:\n", + " print(\"⚠️ Could not install xgboost automatically. 
You can proceed without the bonus.\")\n", + " print(e)\n" + ] + }, + { + "cell_type": "code", + "source": [ + "import os, re, glob, warnings\n", + "import numpy as np\n", + "import pandas as pd\n", + "import matplotlib.pyplot as plt\n", + "\n", + "from sklearn.model_selection import train_test_split\n", + "from sklearn.preprocessing import StandardScaler\n", + "from sklearn.impute import SimpleImputer\n", + "from sklearn.linear_model import LogisticRegression\n", + "from sklearn.pipeline import Pipeline\n", + "from sklearn.metrics import (\n", + " accuracy_score, precision_score, recall_score, f1_score,\n", + " confusion_matrix, ConfusionMatrixDisplay, classification_report\n", + ")\n", + "\n", + "warnings.filterwarnings(\"ignore\")\n", + "RANDOM_STATE = 42\n", + "np.random.seed(RANDOM_STATE)\n", + "\n", + "try:\n", + " from xgboost import XGBClassifier\n", + " XGB_AVAILABLE = True\n", + "except Exception:\n", + " XGB_AVAILABLE = False\n" + ], + "metadata": { + "id": "oB81zUE8lLBx" + }, + "execution_count": 5, + "outputs": [] + }, + { + "cell_type": "code", + "source": [ + "def standardize_columns(df):\n", + " # Lowercase; replace non-alphanum with underscore; strip repeats\n", + " new_cols = []\n", + " for c in df.columns:\n", + " c2 = re.sub(r'[^0-9a-zA-Z]+', '_', c.strip().lower())\n", + " c2 = re.sub(r'_+', '_', c2).strip('_')\n", + " new_cols.append(c2)\n", + " df.columns = new_cols\n", + " return df\n", + "\n", + "def auto_find_csv(search_dir='/kaggle/input'):\n", + "\n", + " if not os.path.exists(search_dir):\n", + " return None\n", + " patterns = [\"*smoke*.csv\", \"*fire*.csv\", \"*alarm*.csv\", \"*iot*.csv\", \"*.csv\"]\n", + " for pat in patterns:\n", + " files = glob.glob(os.path.join(search_dir, \"**\", pat), recursive=True)\n", + " if files:\n", + "\n", + " files_sorted = sorted(files, key=lambda p: (0 if 'smoke' in p.lower() or 'fire' in p.lower() else 1, len(p)))\n", + " return files_sorted[0]\n", + " return None\n", + "\n", + "DATA_PATH = 
auto_find_csv('/kaggle/input')\n", + "print(\"Auto-detected data path:\", DATA_PATH)\n" + ], + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "sj1A3LkllO81", + "outputId": "776066e9-e73d-4b67-a1b4-6c3c7c1acd26" + }, + "execution_count": 6, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "Auto-detected data path: None\n" + ] + } + ] + }, + { + "cell_type": "code", + "source": [ + "if DATA_PATH is None:\n", + " try:\n", + " from google.colab import files # type: ignore\n", + " print(\"🔼 Please choose the CSV file to upload...\")\n", + " up = files.upload()\n", + " if up:\n", + " DATA_PATH = list(up.keys())[0]\n", + " print(\"Using uploaded file:\", DATA_PATH)\n", + " except Exception as e:\n", + " print(\"Upload not available (probably not running in Colab).\", e)\n" + ], + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 107 + }, + "id": "Qmf1Q1vQoeMu", + "outputId": "15304af6-bd83-4843-c86d-841e3f033da0" + }, + "execution_count": 7, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "🔼 Please choose the CSV file to upload...\n" + ] + }, + { + "output_type": "display_data", + "data": { + "text/plain": [ + "" + ], + "text/html": [ + "\n", + " \n", + " \n", + " Upload widget is only available when the cell has been executed in the\n", + " current browser session. Please rerun this cell to enable.\n", + " \n", + " " + ] + }, + "metadata": {} + }, + { + "output_type": "stream", + "name": "stdout", + "text": [ + "Saving archive (4).zip to archive (4).zip\n", + "Using uploaded file: archive (4).zip\n" + ] + } + ] + }, + { + "cell_type": "code", + "source": [ + "\n", + "assert DATA_PATH is not None, \"❌ Dataset not found. 
Please add the Kaggle dataset via 'Add data' or set DATA_PATH manually.\"\n", + "\n", + "df_raw = pd.read_csv(DATA_PATH)\n", + "df = df_raw.copy()\n", + "df = standardize_columns(df)\n", + "\n", + "print(\"Shape:\", df.shape)\n", + "print(\"\\nColumns:\", list(df.columns))\n", + "print(\"\\nInfo:\")\n", + "print(df.info())\n", + "\n", + "print(\"\\nMissing values per column:\")\n", + "print(df.isnull().sum().sort_values(ascending=False).head(20))\n" + ], + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "0vMW08R6orA3", + "outputId": "f7d4af6c-86a3-4b79-e82b-d60351cd7548" + }, + "execution_count": 8, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "Shape: (62630, 16)\n", + "\n", + "Columns: ['unnamed_0', 'utc', 'temperature_c', 'humidity', 'tvoc_ppb', 'eco2_ppm', 'raw_h2', 'raw_ethanol', 'pressure_hpa', 'pm1_0', 'pm2_5', 'nc0_5', 'nc1_0', 'nc2_5', 'cnt', 'fire_alarm']\n", + "\n", + "Info:\n", + "\n", + "RangeIndex: 62630 entries, 0 to 62629\n", + "Data columns (total 16 columns):\n", + " # Column Non-Null Count Dtype \n", + "--- ------ -------------- ----- \n", + " 0 unnamed_0 62630 non-null int64 \n", + " 1 utc 62630 non-null int64 \n", + " 2 temperature_c 62630 non-null float64\n", + " 3 humidity 62630 non-null float64\n", + " 4 tvoc_ppb 62630 non-null int64 \n", + " 5 eco2_ppm 62630 non-null int64 \n", + " 6 raw_h2 62630 non-null int64 \n", + " 7 raw_ethanol 62630 non-null int64 \n", + " 8 pressure_hpa 62630 non-null float64\n", + " 9 pm1_0 62630 non-null float64\n", + " 10 pm2_5 62630 non-null float64\n", + " 11 nc0_5 62630 non-null float64\n", + " 12 nc1_0 62630 non-null float64\n", + " 13 nc2_5 62630 non-null float64\n", + " 14 cnt 62630 non-null int64 \n", + " 15 fire_alarm 62630 non-null int64 \n", + "dtypes: float64(8), int64(8)\n", + "memory usage: 7.6 MB\n", + "None\n", + "\n", + "Missing values per column:\n", + "unnamed_0 0\n", + "utc 0\n", + "temperature_c 0\n", + "humidity 0\n", + 
"tvoc_ppb 0\n", + "eco2_ppm 0\n", + "raw_h2 0\n", + "raw_ethanol 0\n", + "pressure_hpa 0\n", + "pm1_0 0\n", + "pm2_5 0\n", + "nc0_5 0\n", + "nc1_0 0\n", + "nc2_5 0\n", + "cnt 0\n", + "fire_alarm 0\n", + "dtype: int64\n" + ] + } + ] + }, + { + "cell_type": "code", + "source": [ + "candidate_targets = [\n", + " 'fire_alarm', 'alarm', 'fire', 'target', 'class', 'label', 'smoke_detected'\n", + "]\n", + "\n", + "target_col = None\n", + "for c in candidate_targets:\n", + " if c in df.columns:\n", + " target_col = c\n", + " break\n", + "\n", + "if target_col is None:\n", + " raise ValueError(\n", + " \"❌ Could not auto-detect target column. \"\n", + " \"Please set 'target_col' manually from df.columns.\"\n", + " )\n", + "\n", + "print(f\"Detected target column: {target_col}\")\n", + "\n", + "if df[target_col].dtype == 'O':\n", + "\n", + " mapping = {'yes':1, 'y':1, 'true':1, 'on':1, 'fire':1,\n", + " 'no':0, 'n':0, 'false':0, 'off':0, 'normal':0}\n", + " df[target_col] = df[target_col].astype(str).str.strip().str.lower().map(mapping).astype('Int64')\n", + " if df[target_col].isna().any():\n", + "\n", + " df[target_col] = pd.factorize(df[target_col].astype(str))[0]\n", + "else:\n", + "\n", + " unique_vals = sorted(df[target_col].dropna().unique())\n", + " if not set(unique_vals).issubset({0,1}):\n", + " df[target_col] = (df[target_col] > 0).astype(int)\n", + "\n", + "drop_like = ['id', 'index', 'timestamp', 'time', 'date']\n", + "to_drop = [c for c in df.columns if any(k in c for k in drop_like) and c != target_col]\n", + "\n", + "features = [c for c in df.columns if c != target_col and c not in to_drop]\n", + "X = df[features].copy()\n", + "y = df[target_col].astype(int).copy()\n", + "\n", + "print(f\"Using {len(features)} features. 
Dropped: {to_drop}\")\n" + ], + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "EJPbV15eozRI", + "outputId": "227a4981-2587-41da-b28d-c0322463d996" + }, + "execution_count": 9, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "Detected target column: fire_alarm\n", + "Using 14 features. Dropped: ['humidity']\n" + ] + } + ] + }, + { + "cell_type": "code", + "source": [ + "\n", + "X_train, X_test, y_train, y_test = train_test_split(\n", + " X, y, test_size=0.2, random_state=RANDOM_STATE, stratify=y\n", + ")\n", + "\n", + "imputer = SimpleImputer(strategy='median')\n", + "X_train_imp = imputer.fit_transform(X_train)\n", + "X_test_imp = imputer.transform(X_test)\n", + "\n", + "print(\"Train shape:\", X_train_imp.shape, \"Test shape:\", X_test_imp.shape)\n" + ], + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "KIMvpRX1pAJX", + "outputId": "234d9497-2672-4f24-9cc1-03ed2226f22b" + }, + "execution_count": 10, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "Train shape: (50104, 14) Test shape: (12526, 14)\n" + ] + } + ] + }, + { + "cell_type": "code", + "source": [ + "lr_pipe = Pipeline([\n", + " ('scaler', StandardScaler()),\n", + " ('clf', LogisticRegression(max_iter=1000, solver='lbfgs', random_state=RANDOM_STATE))\n", + "])\n", + "\n", + "lr_pipe.fit(X_train_imp, y_train)\n", + "y_pred_lr = lr_pipe.predict(X_test_imp)\n", + "y_prob_lr = lr_pipe.predict_proba(X_test_imp)[:,1]\n", + "\n", + "metrics_lr = {\n", + " 'Accuracy': accuracy_score(y_test, y_pred_lr),\n", + " 'Precision': precision_score(y_test, y_pred_lr, zero_division=0),\n", + " 'Recall': recall_score(y_test, y_pred_lr, zero_division=0),\n", + " 'F1-Score': f1_score(y_test, y_pred_lr, zero_division=0)\n", + "}\n", + "print(\"=== Logistic Regression Metrics ===\")\n", + "for k,v in metrics_lr.items():\n", + " print(f\"{k}: {v:.4f}\")\n", + "\n", + "cm = confusion_matrix(y_test, 
y_pred_lr)\n", + "disp = ConfusionMatrixDisplay(confusion_matrix=cm)\n", + "fig = plt.figure()\n", + "disp.plot(values_format='d')\n", + "plt.title(\"Confusion Matrix - Logistic Regression\")\n", + "plt.show()\n" + ], + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 576 + }, + "id": "3pmc_dDJpIcD", + "outputId": "515694a2-7736-4db2-bb86-bf55c00f41b5" + }, + "execution_count": 11, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "=== Logistic Regression Metrics ===\n", + "Accuracy: 0.9796\n", + "Precision: 0.9895\n", + "Recall: 0.9818\n", + "F1-Score: 0.9856\n" + ] + }, + { + "output_type": "display_data", + "data": { + "text/plain": [ + "
" + ] + }, + "metadata": {} + }, + { + "output_type": "display_data", + "data": { + "text/plain": [ + "
" + ], + "image/png": "\n" + }, + "metadata": {} + } + ] + }, + { + "cell_type": "code", + "source": [ + "metrics_xgb = None\n", + "if XGB_AVAILABLE:\n", + " xgb = XGBClassifier(\n", + " n_estimators=200,\n", + " max_depth=5,\n", + " learning_rate=0.1,\n", + " subsample=0.9,\n", + " colsample_bytree=0.9,\n", + " reg_lambda=1.0,\n", + " tree_method=\"hist\",\n", + " random_state=RANDOM_STATE,\n", + " n_jobs=1,\n", + " eval_metric=\"logloss\"\n", + " )\n", + "\n", + " xgb.fit(X_train_imp, y_train)\n", + " y_pred_xgb = xgb.predict(X_test_imp)\n", + " y_prob_xgb = xgb.predict_proba(X_test_imp)[:,1]\n", + "\n", + " metrics_xgb = {\n", + " 'Accuracy': accuracy_score(y_test, y_pred_xgb),\n", + " 'Precision': precision_score(y_test, y_pred_xgb, zero_division=0),\n", + " 'Recall': recall_score(y_test, y_pred_xgb, zero_division=0),\n", + " 'F1-Score': f1_score(y_test, y_pred_xgb, zero_division=0)\n", + " }\n", + " print(\"=== XGBoost Metrics ===\")\n", + " for k,v in metrics_xgb.items():\n", + " print(f\"{k}: {v:.4f}\")\n", + "\n", + " cm2 = confusion_matrix(y_test, y_pred_xgb)\n", + " disp2 = ConfusionMatrixDisplay(confusion_matrix=cm2)\n", + " fig = plt.figure()\n", + " disp2.plot(values_format='d')\n", + " plt.title(\"Confusion Matrix - XGBoost\")\n", + " plt.show()\n", + "else:\n", + " print(\"⚠️ XGBoost not available. Skip bonus or install it in the first cell.\")\n" + ], + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 576 + }, + "id": "A52NSKR0pPxB", + "outputId": "ba1e721b-262f-48bd-ab7f-82e0023783a2" + }, + "execution_count": 13, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "=== XGBoost Metrics ===\n", + "Accuracy: 0.9999\n", + "Precision: 1.0000\n", + "Recall: 0.9999\n", + "F1-Score: 0.9999\n" + ] + }, + { + "output_type": "display_data", + "data": { + "text/plain": [ + "
" + ] + }, + "metadata": {} + }, + { + "output_type": "display_data", + "data": { + "text/plain": [ + "
" + ], + "image/png": "\n" + }, + "metadata": {} + } + ] + }, + { + "cell_type": "code", + "source": [ + "rows = []\n", + "rows.append({'Model':'LogReg', **metrics_lr})\n", + "if metrics_xgb is not None:\n", + " rows.append({'Model':'XGBoost', **metrics_xgb})\n", + "\n", + "results_df = pd.DataFrame(rows).set_index('Model')\n", + "display(results_df.style.format(\"{:.4f}\"))\n", + "\n", + "best_model_by_recall = results_df['Recall'].idxmax()\n", + "print(f\"Highest Recall: {best_model_by_recall} ({results_df.loc[best_model_by_recall, 'Recall']:.4f})\")\n", + "\n", + "from IPython.display import Markdown, display as md_display\n", + "analysis = f'''\n", + "### Discussion\n", + "- **Recall** focuses on catching as many *actual fires* as possible.\n", + "- In fire detection, **False Negatives** (missed fires) are dangerous and must be minimized.\n", + "- **Logistic Regression**: simple, fast, and interpretable — a solid baseline.\n", + "- **XGBoost**: often achieves higher performance (including recall) but is more complex and heavier.\n", + "- Based on the table above, the model with higher recall here is **{best_model_by_recall}**.\n", + "'''\n", + "md_display(Markdown(analysis))\n" + ], + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 296 + }, + "id": "GtcNC9Tupcjp", + "outputId": "8251872e-a78a-4f38-caa8-974f4d1ca4a8" + }, + "execution_count": 14, + "outputs": [ + { + "output_type": "display_data", + "data": { + "text/plain": [ + "" + ], + "text/html": [ + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
Model    Accuracy  Precision  Recall  F1-Score
LogReg   0.9796    0.9895     0.9818  0.9856
XGBoost  0.9999    1.0000     0.9999  0.9999
\n" + ] + }, + "metadata": {} + }, + { + "output_type": "stream", + "name": "stdout", + "text": [ + "Highest Recall: XGBoost (0.9999)\n" + ] + }, + { + "output_type": "display_data", + "data": { + "text/plain": [ + "" + ], + "text/markdown": "\n### Discussion\n- **Recall** focuses on catching as many *actual fires* as possible. \n- In fire detection, **False Negatives** (missed fires) are dangerous and must be minimized.\n- **Logistic Regression**: simple, fast, and interpretable — a solid baseline. \n- **XGBoost**: often achieves higher performance (including recall) but is more complex and heavier.\n- Based on the table above, the model with higher recall here is **XGBoost**.\n" + }, + "metadata": {} + } + ] + }, + { + "cell_type": "code", + "source": [ + "os.makedirs(\"outputs\", exist_ok=True)\n", + "results_path = os.path.join(\"outputs\", \"metrics_comparison.csv\")\n", + "results_df.to_csv(results_path)\n", + "print(\"Saved metrics to:\", results_path)\n" + ], + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "TKrq8PMSplgd", + "outputId": "64706526-7c37-43f7-f476-67a86fd835bd" + }, + "execution_count": 15, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "Saved metrics to: outputs/metrics_comparison.csv\n" + ] + } + ] + } + ] +} \ No newline at end of file diff --git a/TP4/TP4_Submission.zip b/TP4/TP4_Submission.zip new file mode 100644 index 0000000..f05c25d Binary files /dev/null and b/TP4/TP4_Submission.zip differ diff --git a/TP4/ai_logic/bench_latency.py b/TP4/ai_logic/bench_latency.py new file mode 100644 index 0000000..6b1c1b3 --- /dev/null +++ b/TP4/ai_logic/bench_latency.py @@ -0,0 +1,29 @@ +import time, json, statistics, sys +import paho.mqtt.client as m +dev="esp-01" +lats=[] +def on(c,u,msg): + try: + j=json.loads(msg.payload.decode()) + if j.get("device_id")!=dev: return + t_ms=int(j.get("t_ms",0)) + if t_ms<=0: return + now=int(time.time()*1000) + lats.append(now - t_ms) + 
print(f"latency_ms={lats[-1]}", flush=True)
+        if len(lats)>=10:
+            c.disconnect()
+    except Exception as e:
+        print("err",e, file=sys.stderr)
+c=m.Client()
+c.on_message=on
+c.connect("broker.mqtt.cool",1883,60)
+c.subscribe("esp32/data",1)
+c.loop_forever()
+if lats:
+    avg=statistics.mean(lats)
+    p95=sorted(lats)[int(0.95*(len(lats)-1))]
+    open("report.md","a",encoding="utf-8").write(f"\n\n## Latency Results\n- Samples: {len(lats)}\n- Avg latency: {avg:.1f} ms\n- P95 latency: {p95:.1f} ms\n")
+    print("Appended latency stats to report.md")
+else:
+    print("No samples captured")
diff --git a/TP4/ai_logic/latency_sub.py b/TP4/ai_logic/latency_sub.py
new file mode 100644
index 0000000..ca457f8
--- /dev/null
+++ b/TP4/ai_logic/latency_sub.py
@@ -0,0 +1,12 @@
+import json,time
+import paho.mqtt.client as m
+def on(c,u,msg):
+    j=json.loads(msg.payload.decode())
+    now=int(time.time()*1000)
+    t_ms=int(j.get("t_ms",0))  # NOTE: t_ms is the ESP32's millis() (time since boot), not a Unix timestamp, so now - t_ms is not a true latency
+    print(f"seq={j.get('seq')} latency_ms={now - t_ms}", flush=True)
+c=m.Client()
+c.on_message=on
+c.connect("broker.mqtt.cool",1883,60)
+c.subscribe("esp32/data",1)
+c.loop_forever()
diff --git a/TP4/ai_logic/mqtt_ai_subscriber.py b/TP4/ai_logic/mqtt_ai_subscriber.py
new file mode 100644
index 0000000..5301f96
--- /dev/null
+++ b/TP4/ai_logic/mqtt_ai_subscriber.py
@@ -0,0 +1,136 @@
+# ===========================================
+# File: TP4/ai_logic/mqtt_ai_subscriber.py
+# Python MQTT subscriber that:
+# - Receives JSON from "esp32/data"
+# - Makes a simple prediction and publishes JSON to "esp32/control" with qos=1
+# - Is ready to be swapped for the ML pipeline later
+# ===========================================
+import json
+import time
+import argparse
+import logging
+import os
+import pickle
+
+import paho.mqtt.client as mqtt
+
+logging.basicConfig(level=logging.INFO, format="%(asctime)s %(levelname)s: %(message)s")
+LOGGER = logging.getLogger("mqtt_ai_subscriber")
+
+BROKER = os.getenv("MQTT_BROKER", "broker.mqtt.cool")
+PORT = int(os.getenv("MQTT_PORT", 1883))
+TOPIC_IN = "esp32/data"
+TOPIC_OUT = "esp32/control"
+MODELS_DIR = "models"
+
+def load_model_if_exists(name):
+    """Load the saved model if it exists; otherwise return None."""
+    path = os.path.join(MODELS_DIR, f"{name}_pipeline.pkl")
+    if os.path.isfile(path):
+        LOGGER.info("Loading model from: %s", path)
+        with open(path, "rb") as f:
+            m = pickle.load(f)
+        return m
+    LOGGER.info("Model file not found: %s", path)
+    return None
+
+def json_to_features(msg_json):
+    """Convert the incoming JSON into a feature vector, following TP2.
+    Adjust this function to match your real features.
+    """
+    try:
+        temp = float(msg_json.get("temperature"))
+        hum = float(msg_json.get("humidity"))
+    except Exception as e:
+        LOGGER.error("Missing or invalid feature fields: %s", e)
+        return None
+    return [[temp, hum]]
+
+def build_control_msg(device_id, model_name, pred, prob):
+    return {
+        "device_id": device_id,
+        "model": model_name,
+        "prediction": int(pred),
+        "probability": float(prob),
+        "timestamp": int(time.time())
+    }
+
+class MQTTInferenceClient:
+    def __init__(self, model_name="lr"):
+        self.model_name = model_name
+        self.model = load_model_if_exists(model_name)
+        self.client = mqtt.Client()
+        self.client.on_connect = self.on_connect
+        self.client.on_message = self.on_message
+        # set reconnection delays
+        self.client.reconnect_delay_set(min_delay=1, max_delay=120)
+
+    def on_connect(self, client, userdata, flags, rc):
+        LOGGER.info("Connected to broker %s:%s rc=%s", BROKER, PORT, rc)
+        client.subscribe(TOPIC_IN, qos=1)
+
+    def on_message(self, client, userdata, msg):
+        try:
+            payload = msg.payload.decode()
+            j = json.loads(payload)
+        except Exception as e:
+            LOGGER.error("Invalid JSON received: %s", e)
+            return
+        LOGGER.info("Received on %s: %s", msg.topic, j)
+
+        device_id = j.get("device_id")
+        if not device_id:
+            LOGGER.warning("Message missing device_id; ignoring")
+            return
+
+        features = json_to_features(j)
+        if features is None:
+            LOGGER.warning("Feature extraction failed; ignoring")
+            return
+
+        # If a real model 
exists, use it; otherwise use simple rule
+        if self.model is not None:
+            try:
+                import numpy as np
+                feats = np.array(features)
+                if hasattr(self.model, "predict_proba"):
+                    prob = float(self.model.predict_proba(feats)[0][1])
+                    pred = int(self.model.predict(feats)[0])
+                else:
+                    pred = int(self.model.predict(feats)[0])
+                    prob = 1.0
+            except Exception as e:
+                LOGGER.exception("Model inference failed, falling back to rule: %s", e)
+                # fallback rule
+                pred = 1 if features[0][0] > 30 else 0
+                prob = 0.9 if pred == 1 else 0.1
+        else:
+            # simple rule: temperature-based
+            pred = 1 if features[0][0] > 30 else 0
+            prob = 0.95 if pred == 1 else 0.05
+
+        ctrl = build_control_msg(device_id, self.model_name, pred, prob)
+        payload_out = json.dumps(ctrl)
+        result = client.publish(TOPIC_OUT, payload_out, qos=1)
+        if result.rc != mqtt.MQTT_ERR_SUCCESS:
+            LOGGER.error("Publish failed with rc=%s", result.rc)
+        else:
+            LOGGER.info("Published control: %s", payload_out)
+
+    def start(self):
+        while True:
+            try:
+                LOGGER.info("Connecting to broker %s:%s", BROKER, PORT)
+                self.client.connect(BROKER, PORT, keepalive=60)
+                self.client.loop_forever()
+            except Exception as e:
+                LOGGER.exception("MQTT connection error, retrying in 5s: %s", e)
+                time.sleep(5)
+
+if __name__ == "__main__":
+    parser = argparse.ArgumentParser(description="MQTT AI subscriber")
+    parser.add_argument("--model", choices=["lr", "xgb"], default="lr", help="Which model to use")
+    args = parser.parse_args()
+
+    cli = MQTTInferenceClient(model_name=args.model)
+    cli.start()
diff --git a/TP4/models/lr_pipeline.pkl b/TP4/models/lr_pipeline.pkl
new file mode 100644
index 0000000..6489460
Binary files /dev/null and b/TP4/models/lr_pipeline.pkl differ
diff --git a/TP4/platformio.ini b/TP4/platformio.ini
new file mode 100644
index 0000000..16f42ca
--- /dev/null
+++ b/TP4/platformio.ini
@@ -0,0 +1,12 @@
+[env:esp32dev]
+platform = espressif32
+board = esp32dev
+framework = arduino
+monitor_speed = 115200
+
+lib_deps =
+  
knolleary/PubSubClient@^2.8
+  adafruit/DHT sensor library@^1.4.4
+  adafruit/Adafruit Unified Sensor@^1.1.14
+  marcoschwartz/LiquidCrystal_I2C@^1.1.4
+  bblanchon/ArduinoJson@^6.21.3
diff --git a/TP4/report.md b/TP4/report.md
new file mode 100644
index 0000000..a1ad169
--- /dev/null
+++ b/TP4/report.md
@@ -0,0 +1,33 @@
+# TP4 — Report
+
+## 1) Setup
+- Device ID: esp-01
+- Broker: broker.mqtt.cool:1883
+- Topics: esp32/data, esp32/control, esp32/status/esp-01
+- QoS: publish=0 (ESP32) / subscribe=1 (control)
+- Build time: 2025-11-12 00:40:20
+
+## 2) Models
+- lr_pipeline.pkl size: 1137 bytes
+- Features used from ESP32: temperature, humidity (+ features[] placeholder)
+
+## 3) MQTT Flow
+- ESP32 → esp32/data (JSON: device_id, temperature, humidity, seq, t_ms, features[], timestamp)
+- Python subscriber → esp32/control (JSON: device_id, model, prediction, probability, timestamp)
+- LED toggled based on prediction (1=ON, 0=OFF)
+
+## 4) Latency (to fill)
+- Method: compare consecutive seq + timestamps in logs
+- Sample E2E (ESP publish → control received): ____ ms
+- Notes: public broker variability
+
+## 5) Robustness
+- Reconnect/backoff, LWT online/offline, JSON validation
+
+## 6) Optional: HTTP/REST
+- POST /infer {device_id, temperature, humidity, ...} → {prediction, probability}
+- ESP32: WiFiClient/HTTPClient
+
+## 7) Conclusions
+- Feasibility on ESP32, trade-offs MQTT vs REST
+\n\n## Latency Results\n- Samples: 10\n- Avg latency: 1762905387579.1 ms\n- P95 latency: 1762905422916.0 ms\n
\ No newline at end of file
diff --git a/TP4/src/main.cpp b/TP4/src/main.cpp
new file mode 100644
index 0000000..d646933
--- /dev/null
+++ b/TP4/src/main.cpp
@@ -0,0 +1,235 @@
+/* ===========================================
+   File: TP4/src/main.cpp
+   ESP32 firmware (MQTT publisher + subscriber)
+   - Publishes JSON to "esp32/data"
+   - Subscribes to "esp32/control" and handles control JSON
+   - Announces its status on "esp32/status/<device_id>" (online/offline)
+   =========================================== */
+
+#include <WiFi.h>
+#include <PubSubClient.h>
+#include <DHT.h>
+#include <LiquidCrystal_I2C.h>
+#include <ArduinoJson.h>
+
+// ---------- Pins & Peripherals ----------
+#define DHTPIN 15
+#define DHTTYPE DHT22
+#define LED_PIN 2
+#define LCD_ADDR 0x27 // Change to 0x3F if the screen stays black
+
+// ---------- WiFi ----------
+const char* ssid = "Wokwi-GUEST";
+const char* password = "";
+
+// ---------- MQTT ----------
+const char* mqtt_server = "broker.mqtt.cool";
+const int mqtt_port = 1883;
+
+#define MQTT_TOPIC_OUT "esp32/data"
+#define MQTT_TOPIC_IN "esp32/control"
+
+WiFiClient espClient;
+PubSubClient client(espClient);
+DHT dht(DHTPIN, DHTTYPE);
+LiquidCrystal_I2C lcd(LCD_ADDR, 16, 2);
+
+String currentCommand = "---";
+const char* device_id = "esp-01";
+
+// Sequential message counter
+static uint32_t SEQ = 0;
+
+// Demo payload with 12 features (the first two are updated from the DHT)
+const int N_FEATURES = 12;
+float X[N_FEATURES] = {20.0, 57.36, 0, 400, 12306, 18520, 939.735, 0.0, 0.0, 0.0, 0.0, 0.0};
+
+// ---------- Helpers ----------
+String statusTopic() {
+  return String("esp32/status/") + device_id;
+}
+
+void publishStatus(const char* state) {
+  // retained=true so subscribers learn the status as soon as they subscribe
+  String msg = String("{\"device_id\":\"") + device_id + "\",\"status\":\"" + state + "\"}";
+  client.publish(statusTopic().c_str(), msg.c_str(), true);
+}
+
+void setup_wifi() {
+  WiFi.mode(WIFI_STA);
+  Serial.print("Connecting to WiFi");
+  WiFi.begin(ssid, password);
+  while (WiFi.status() != WL_CONNECTED) {
+    delay(500);
+    Serial.print(".");
+  }
+  Serial.println("\nWiFi connected! 
IP: " + WiFi.localIP().toString());
+}
+
+void handleControlJson(const String& msg) {
+  StaticJsonDocument<256> doc;
+  DeserializationError err = deserializeJson(doc, msg);
+  if (err) {
+    // Fall back to plain-text ON/OFF commands
+    if (msg.equalsIgnoreCase("ON")) {
+      digitalWrite(LED_PIN, HIGH);
+      currentCommand = "ON";
+    } else if (msg.equalsIgnoreCase("OFF")) {
+      digitalWrite(LED_PIN, LOW);
+      currentCommand = "OFF";
+    }
+    lcd.setCursor(0, 1);
+    lcd.print("CMD:");
+    lcd.print(currentCommand);
+    lcd.print(" ");
+    return;
+  }
+
+  const char* incoming_id = doc["device_id"] | "";
+  if (String(incoming_id) != String(device_id)) {
+    Serial.println("Control not for this device (device_id mismatch). Ignoring.");
+    return;
+  }
+
+  const char* model = doc["model"] | "";
+  int prediction = doc["prediction"] | -1;
+  float probability = doc["probability"] | 0.0;
+
+  Serial.printf("model=%s prediction=%d prob=%.2f\n", model, prediction, probability);
+
+  if (prediction == 1) {
+    digitalWrite(LED_PIN, HIGH);
+    currentCommand = "ON";
+  } else if (prediction == 0) {
+    digitalWrite(LED_PIN, LOW);
+    currentCommand = "OFF";
+  } else {
+    currentCommand = "NONE";
+  }
+
+  lcd.setCursor(0, 1);
+  lcd.print("CMD:");
+  lcd.print(currentCommand);
+  lcd.print(" ");
+}
+
+void callback(char* topic, byte* message, unsigned int length) {
+  String msg;
+  msg.reserve(length);
+  for (unsigned int i = 0; i < length; i++) msg += (char)message[i];
+  msg.trim();
+
+  Serial.print("Received on ");
+  Serial.print(topic);
+  Serial.print(": ");
+  Serial.println(msg);
+
+  handleControlJson(msg);
+}
+
+void reconnect() {
+  while (!client.connected()) {
+    Serial.print("Attempting MQTT connection... ");
+    String clientId = "ESP32Client-";
+    clientId += String((uint32_t)esp_random(), HEX);
+
+    // Set up the LWT (offline)
+    String willMsg = String("{\"device_id\":\"") + device_id + "\",\"status\":\"offline\"}";
+    String willTop = statusTopic();
+
+    // connect(clientId, willTopic, willQos, willRetain, willMessage)
+    if (client.connect(clientId.c_str(), willTop.c_str(), 1, true, willMsg.c_str())) {
+      Serial.println("connected");
+      client.subscribe(MQTT_TOPIC_IN, 1); // QoS=1 for receiving control messages
+      publishStatus("online"); // Announce that we are online
+    } else {
+      Serial.print("failed, rc=");
+      Serial.print(client.state());
+      Serial.println(" retrying in 5s");
+      delay(5000);
+    }
+  }
+}
+
+void publishSensor(float t, float h, unsigned long now_ms) {
+  // Update the features
+  X[0] = t;
+  X[1] = h;
+
+  // Update the LCD
+  lcd.setCursor(0, 0);
+  lcd.print("T:");
+  lcd.print(t, 1);
+  lcd.print("C H:");
+  lcd.print(h, 0);
+  lcd.print("% ");
+
+  lcd.setCursor(0, 1);
+  lcd.print("CMD:");
+  lcd.print(currentCommand);
+  lcd.print(" ");
+
+  // Build the JSON
+  DynamicJsonDocument outDoc(512);
+  outDoc["device_id"] = device_id;
+  outDoc["temperature"] = t;
+  outDoc["humidity"] = h;
+  outDoc["seq"] = (uint32_t)SEQ++; // message counter
+  outDoc["t_ms"] = (uint32_t)now_ms; // send time in milliseconds on the ESP32 (millis() since boot)
+  JsonArray arr = outDoc.createNestedArray("features");
+  for (int i = 0; i < N_FEATURES; i++) arr.add(X[i]);
+  outDoc["timestamp"] = (unsigned long)(now_ms / 1000); // seconds since boot
+
+  // Publish (a String avoids buffer-size issues; the MQTT buffer is raised to 512)
+  String payload;
+  serializeJson(outDoc, payload);
+  bool ok = client.publish(MQTT_TOPIC_OUT, payload.c_str());
+  if (ok) {
+    Serial.print("Published: ");
+    Serial.println(payload);
+  } else {
+    Serial.println("Publish failed");
+  }
+}
+
+// ---------- Arduino entry points ----------
+void setup() {
+  Serial.begin(115200);
+  pinMode(LED_PIN, OUTPUT);
+
+  lcd.init();
+  lcd.backlight();
+  lcd.clear();
+  lcd.print("Starting...");
+  dht.begin();
+  
setup_wifi(); + + client.setServer(mqtt_server, mqtt_port); + client.setCallback(callback); + client.setBufferSize(512); // important: so the JSON is not truncated + client.setKeepAlive(30); + client.setSocketTimeout(5); +} + +unsigned long lastMsg = 0; +const unsigned long interval = 3000; // every 3 seconds + +void loop() { + if (!client.connected()) reconnect(); + client.loop(); + + unsigned long now = millis(); + if (now - lastMsg > interval) { + lastMsg = now; + + float h = dht.readHumidity(); + float t = dht.readTemperature(); + if (isnan(h) || isnan(t)) { + Serial.println("Failed reading from DHT sensor!"); + return; + } + + publishSensor(t, h, now); + } +} diff --git a/TP4/wokwi.toml b/TP4/wokwi.toml new file mode 100644 index 0000000..2a728e5 --- /dev/null +++ b/TP4/wokwi.toml @@ -0,0 +1,9 @@ +[wokwi] +version = 1 +firmware = ".pio/build/esp32dev/firmware.bin" +elf = ".pio/build/esp32dev/firmware.elf" + +[connections.phantomio] +# Enable PhantomIO for serial and telemetry +enabled = true +port = "serial" diff --git a/TP_2_AIoT.ipynb b/TP_2_AIoT.ipynb new file mode 100644 index 0000000..45e4f44 --- /dev/null +++ b/TP_2_AIoT.ipynb @@ -0,0 +1,427 @@ +{ + "nbformat": 4, + "nbformat_minor": 0, + "metadata": { + "colab": { + "provenance": [] + }, + "kernelspec": { + "name": "python3", + "display_name": "Python 3" + }, + "language_info": { + "name": "python" + } + }, + "cells": [ + { + "cell_type": "code", + "execution_count": 1, + "metadata": { + "id": "v45v2zeZdZDd" + }, + "outputs": [], + "source": [ + "import os, pickle, warnings, sys\n", + "from time import perf_counter\n", + "import numpy as np\n", + "import pandas as pd\n", + "from sklearn.pipeline import Pipeline\n", + "from sklearn.preprocessing import StandardScaler\n", + "from sklearn.linear_model import LogisticRegression\n", + "from sklearn.metrics import accuracy_score, f1_score\n", + "\n", + "warnings.filterwarnings(\"ignore\")" + ] + }, + { + "cell_type": "code", + "source": [ + "SEED = 42\n", + 
"USE_DEMO_DATA = True # 
← if you have your own X_train..., set this to False and use the \"YOUR DATA\" section below\n", + "SAVE_DIR = \"./outputs\"\n", + "os.makedirs(SAVE_DIR, exist_ok=True)\n" + ], + "metadata": { + "id": "VlrayDxyd16T" + }, + "execution_count": 2, + "outputs": [] + }, + { + "cell_type": "code", + "source": [ + "def load_demo_data(test_size=0.2, random_state=SEED):\n", + " from sklearn.datasets import load_breast_cancer\n", + " from sklearn.model_selection import train_test_split\n", + " X, y = load_breast_cancer(return_X_y=True, as_frame=True)\n", + " X_train, X_test, y_train, y_test = train_test_split(\n", + " X, y, test_size=test_size, random_state=random_state, stratify=y\n", + " )\n", + " return X_train, X_test, y_train, y_test\n" + ], + "metadata": { + "id": "1M-22Pr2d4ch" + }, + "execution_count": 3, + "outputs": [] + }, + { + "cell_type": "code", + "source": [ + "def load_your_data():\n", + " # Example (adapt to your data):\n", + " # return X_train, X_test, y_train, y_test\n", + " return None\n", + "\n", + "if USE_DEMO_DATA:\n", + " X_train, X_test, y_train, y_test = load_demo_data()\n", + "else:\n", + " maybe = load_your_data()\n", + " if maybe is None:\n", + " raise ValueError(\"Please provide X_train, X_test, y_train, y_test in load_your_data().\")\n", + " X_train, X_test, y_train, y_test = maybe # in the else branch, after the None check: unpack the user-provided splits\n" + ], + "metadata": { + "id": "sZvT6v8Rd6le" + }, + "execution_count": 4, + "outputs": [] + }, + { + "cell_type": "code", + "source": [ + "def build_lr_pipeline():\n", + " return Pipeline([\n", + " (\"scaler\", StandardScaler()),\n", + " (\"clf\", LogisticRegression(max_iter=1000, solver=\"lbfgs\", random_state=SEED))\n", + " ])\n", + "\n", + "def build_xgb_pipeline():\n", + " try:\n", + " from xgboost import XGBClassifier\n", + " except Exception as e:\n", + " print(\"⚠️ XGBoost is not installed. 
Install the package and re-run: pip install xgboost\")\n", + " return None\n", + " return Pipeline([\n", + " (\"scaler\", StandardScaler()), # kept for consistency with the MLOps setup\n", + " (\"clf\", XGBClassifier(\n", + " n_estimators=100,\n", + " max_depth=4,\n", + " learning_rate=0.1,\n", + " subsample=0.8,\n", + " colsample_bytree=0.8,\n", + " reg_lambda=1.0,\n", + " tree_method=\"hist\", # usually faster/lighter\n", + " n_jobs=1,\n", + " random_state=SEED,\n", + " eval_metric=\"logloss\",\n", + " ))\n", + " ])\n" + ], + "metadata": { + "id": "67420LsRd9jQ" + }, + "execution_count": 5, + "outputs": [] + }, + { + "cell_type": "code", + "source": [ + "def train_and_eval(pipe, name, X_train, y_train, X_test, y_test):\n", + " t0 = perf_counter()\n", + " pipe.fit(X_train, y_train)\n", + " train_time_s = perf_counter() - t0\n", + "\n", + " y_pred = pipe.predict(X_test)\n", + " acc = accuracy_score(y_test, y_pred)\n", + " f1 = f1_score(y_test, y_pred, average=\"weighted\")\n", + "\n", + " return {\n", + " \"Model\": name,\n", + " \"Accuracy\": acc,\n", + " \"F1\": f1,\n", + " \"Train_Time_s\": train_time_s\n", + " }" + ], + "metadata": { + "id": "5iuNDdHbeAwF" + }, + "execution_count": 6, + "outputs": [] + }, + { + "cell_type": "code", + "source": [ + "def measure_model_size(pipe, out_path):\n", + " b = pickle.dumps(pipe, protocol=pickle.HIGHEST_PROTOCOL)\n", + " with open(out_path, \"wb\") as f:\n", + " f.write(b)\n", + " size_kb = len(b) / 1024.0\n", + " return size_kb\n" + ], + "metadata": { + "id": "pNVsNYbReFAf" + }, + "execution_count": 7, + "outputs": [] + }, + { + "cell_type": "code", + "source": [ + "def measure_inference_time(pipe, X_test):\n", + " t0 = perf_counter()\n", + " _ = pipe.predict(X_test)\n", + " total_s = perf_counter() - t0\n", + " single_ms = (total_s / len(X_test)) * 1000.0\n", + " return total_s, single_ms\n" + ], + "metadata": { + "id": "htN0rujzeG9y" + }, + "execution_count": 8, + "outputs": [] + }, + { + "cell_type": "code", + "source": [ + 
"results_metrics = 
[]\n", + "results_sizes = []\n", + "results_timing = []\n" + ], + "metadata": { + "id": "DmBo7_sQeIyG" + }, + "execution_count": 9, + "outputs": [] + }, + { + "cell_type": "code", + "source": [ + "lr_pipe = build_lr_pipeline()\n", + "lr_metrics = train_and_eval(lr_pipe, \"LR-Pipeline\", X_train, y_train, X_test, y_test)\n", + "lr_size_kb = measure_model_size(lr_pipe, os.path.join(SAVE_DIR, \"lr_pipeline.pkl\"))\n", + "lr_inf_total_s, lr_inf_single_ms = measure_inference_time(lr_pipe, X_test)\n", + "\n", + "results_metrics.append(lr_metrics)\n", + "results_sizes.append({\"Model\": \"LR-Pipeline\", \"Size_KB\": lr_size_kb})\n", + "results_timing.append({\n", + " \"Model\": \"LR-Pipeline\",\n", + " \"Total_Test_Inference_Time_s\": lr_inf_total_s,\n", + " \"Single_Inference_Time_ms\": lr_inf_single_ms\n", + "})" + ], + "metadata": { + "id": "nUmbLXH2eKvV" + }, + "execution_count": 10, + "outputs": [] + }, + { + "cell_type": "code", + "source": [ + "xgb_pipe = build_xgb_pipeline()\n", + "if xgb_pipe is not None:\n", + " xgb_metrics = train_and_eval(xgb_pipe, \"XGB-Pipeline\", X_train, y_train, X_test, y_test)\n", + " xgb_size_kb = measure_model_size(xgb_pipe, os.path.join(SAVE_DIR, \"xgb_pipeline.pkl\"))\n", + " xgb_inf_total_s, xgb_inf_single_ms = measure_inference_time(xgb_pipe, X_test)\n", + "\n", + " results_metrics.append(xgb_metrics)\n", + " results_sizes.append({\"Model\": \"XGB-Pipeline\", \"Size_KB\": xgb_size_kb})\n", + " results_timing.append({\n", + " \"Model\": \"XGB-Pipeline\",\n", + " \"Total_Test_Inference_Time_s\": xgb_inf_total_s,\n", + " \"Single_Inference_Time_ms\": xgb_inf_single_ms\n", + " })\n" + ], + "metadata": { + "id": "komGY7XheOTR" + }, + "execution_count": 11, + "outputs": [] + }, + { + "cell_type": "code", + "source": [ + "metrics_df = pd.DataFrame(results_metrics).set_index(\"Model\")\n", + "sizes_df = pd.DataFrame(results_sizes).set_index(\"Model\")\n", + "timing_df = pd.DataFrame(results_timing).set_index(\"Model\")\n", + "\n", + 
"metrics_path = os.path.join(SAVE_DIR, \"metrics.csv\")\n", + "sizes_path = os.path.join(SAVE_DIR, \"model_sizes.csv\")\n", + "timing_path = os.path.join(SAVE_DIR, \"inference_times.csv\")\n", + "\n", + "metrics_df.to_csv(metrics_path)\n", + "sizes_df.to_csv(sizes_path)\n", + "timing_df.to_csv(timing_path)\n", + "\n", + "print(\"\\n===== Task 1: Scores =====\")\n", + "print(metrics_df.round(4))\n", + "print(\"\\n===== Task 2.1: Model Sizes (KB) =====\")\n", + "print(sizes_df.round(2))\n", + "print(\"\\n===== Task 2.2: Inference Times =====\")\n", + "print(timing_df.round(6))" + ], + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "jCafju_jeQ1X", + "outputId": "235db039-03f5-401c-c6d9-64c0b408d4bf" + }, + "execution_count": 12, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "\n", + "===== Task 1: Scores =====\n", + " Accuracy F1 Train_Time_s\n", + "Model \n", + "LR-Pipeline 0.9825 0.9825 0.0730\n", + "XGB-Pipeline 0.9561 0.9558 0.1675\n", + "\n", + "===== Task 2.1: Model Sizes (KB) =====\n", + " Size_KB\n", + "Model \n", + "LR-Pipeline 2.61\n", + "XGB-Pipeline 104.52\n", + "\n", + "===== Task 2.2: Inference Times =====\n", + " Total_Test_Inference_Time_s Single_Inference_Time_ms\n", + "Model \n", + "LR-Pipeline 0.012247 0.107430\n", + "XGB-Pipeline 0.009566 0.083915\n" + ] + } + ] + }, + { + "cell_type": "markdown", + "source": [ + "# Task 2.3: ESP32 Analysis (auto paragraph)" + ], + "metadata": { + "id": "gmYQlVL_eaem" + } + }, + { + "cell_type": "code", + "metadata": { + "id": "62a33869" + }, + "source": [ + "def esp32_analysis_text(sizes_df, timing_df):\n", + " # Approximate specs: RAM ~ 520 KB, Flash of several megabytes (board-dependent)\n", + " SRAM_KB = 520.0\n", + " ONE_SEC_BUDGET_MS = 1000.0 # example: one inference per second\n", + "\n", + " def row_or_none(df, name):\n", + " return df.loc[name] if name in df.index else None\n", + "\n", + " lr_s = row_or_none(sizes_df, \"LR-Pipeline\")\n", + " xg_s = 
row_or_none(sizes_df, \"XGB-Pipeline\")\n", + " lr_t = row_or_none(timing_df, \"LR-Pipeline\")\n", + " xg_t = row_or_none(timing_df, \"XGB-Pipeline\")\n", + "\n", + " lines = []\n", + " lines.append(\"ESP32 deployment-readiness analysis:\\n\")\n", + "\n", + " # Memory\n", + " lines.append(\"1) Memory fit:\")\n", + " if lr_s is not None:\n", + " lines.append(f\" • LR: model size ≈ {lr_s['Size_KB']:.1f} KB (usually stored in Flash and loaded partly or fully into SRAM at inference time).\")\n", + " if xg_s is not None:\n", + " lines.append(f\" • XGB: model size ≈ {xg_s['Size_KB']:.1f} KB.\")\n", + " lines.append(f\" - Reference point: the ESP32 has about {SRAM_KB:.0f} KB of SRAM in total (and the space available to the application is less than that).\")\n", + " if (lr_s is not None) and (xg_s is not None):\n", + " more_constrained = \"XGB-Pipeline\" if xg_s['Size_KB'] > lr_s['Size_KB'] else \"LR-Pipeline\"\n", + " lines.append(f\" ⇒ The more constrained model in terms of size is: {more_constrained}.\")\n", + " lines.append(\"\")\n", + "\n", + " # Timing\n", + " lines.append(\"2) Time efficiency (real-time inference every 1 second, as an example):\")\n", + " if lr_t is not None:\n", + " lines.append(f\" • LR: single-inference time ≈ {lr_t['Single_Inference_Time_ms']:.3f} ms.\")\n", + " if xg_t is not None:\n", + " lines.append(f\" • XGB: single-inference time ≈ {xg_t['Single_Inference_Time_ms']:.3f} ms.\")\n", + " lines.append(f\" - With a budget of ~{ONE_SEC_BUDGET_MS:.0f} ms/s, both are usually acceptable when the time is in the single-digit or low tens of milliseconds.\")\n", + " lines.append(\"\")\n", + "\n", + " # Conclusion + optimizations\n", + " choice = None\n", + " if (lr_s is not None) and (xg_s is not None) and (lr_t is not None) and (xg_t is not None):\n", + " # Simple decision: pick the smaller and faster one (equal-weight sum of KB and ms; the units are not normalized)\n", + " score_lr = lr_s['Size_KB'] * 0.5 + lr_t['Single_Inference_Time_ms'] * 0.5\n", + " 
score_xgb = xg_s['Size_KB'] * 0.5 + xg_t['Single_Inference_Time_ms'] * 0.5\n", + " choice = \"LR-Pipeline\" if score_lr <= score_xgb else \"XGB-Pipeline\"\n", + " elif lr_s is not None and lr_t is 
not None:\n", + " choice = \"LR-Pipeline\"\n", + "\n", + " lines.append(\"3) Conclusion:\")\n", + " if choice is not None:\n", + " lines.append(f\" • The most suitable choice, given efficiency and the ESP32's constraints: **{choice}**.\")\n", + " else:\n", + " lines.append(\" • No decision could be made because measurements for one of the models are missing.\")\n", + "\n", + " lines.append(\" • If you need to deploy the less efficient model, use optimization techniques such as: feature reduction, 8-bit or fixed-point quantization, reducing the number of trees/depth (for XGB), conversion to C for the MCU via m2cgen or Treelite, and single-threaded inference with strict memory management.\")\n", + " return \"\\n\".join(lines)" + ], + "execution_count": 31, + "outputs": [] + }, + { + "cell_type": "code", + "source": [ + "print(\"\\n===== Task 2.3: ESP32 Analysis =====\")\n", + "print(esp32_analysis_text(sizes_df, timing_df))\n", + "\n", + "print(f\"\\n📁 Saved to: {os.path.abspath(SAVE_DIR)}\")\n", + "print(\" - metrics.csv (accuracy, F1, and training time)\")\n", + "print(\" - model_sizes.csv (model sizes in KB)\")\n", + "print(\" - inference_times.csv (total and single inference time)\")\n", + "print(\" - lr_pipeline.pkl / xgb_pipeline.pkl\")" + ], + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "veQJ5tnxkCSS", + "outputId": "06467f04-7047-4a5d-f491-b3d85c2c8f85" + }, + "execution_count": 32, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "\n", + "===== Task 2.3: ESP32 Analysis =====\n", + "ESP32 deployment-readiness analysis:\n", + "\n", + "1) Memory fit:\n", + " • LR: model size ≈ 2.6 KB (usually stored in Flash and loaded partly or fully into SRAM at inference time).\n", + " • XGB: model size ≈ 104.5 KB.\n", + " - Reference point: the ESP32 has about 520 KB of SRAM in total (and the space available to the application is less than that).\n", + " ⇒ The more constrained model in terms of size is: 
XGB-Pipeline.\n", + "\n", + "2) Time efficiency (real-time inference every 1 second, as an example):\n", + " • LR: single-inference time ≈ 0.107 ms.\n", + " • XGB: single-inference time ≈ 0.084 ms.\n", + " - With a budget of ~1000 ms/s, 
both are usually acceptable when the time is in the single-digit or low tens of milliseconds.\n", + "\n", + "3) Conclusion:\n", + " • The most suitable choice, given efficiency and the ESP32's constraints: **LR-Pipeline**.\n", + " • If you need to deploy the less efficient model, use optimization techniques such as: feature reduction, 8-bit or fixed-point quantization, reducing the number of trees/depth (for XGB), conversion to C for the MCU via m2cgen or Treelite, and single-threaded inference with strict memory management.\n", + "\n", + "📁 Saved to: /content/outputs\n", + " - metrics.csv (accuracy, F1, and training time)\n", + " - model_sizes.csv (model sizes in KB)\n", + " - inference_times.csv (total and single inference time)\n", + " - lr_pipeline.pkl / xgb_pipeline.pkl\n" + ] + } + ] + } + ] +} \ No newline at end of file