SKrishna-7/end-end-mlops-project-report

End-to-End MLOps Project with ETL Pipeline: Building a Network Security Project Using a Phishing Dataset

This project demonstrates an end-to-end MLOps pipeline built around a phishing dataset. The pipeline encompasses data ingestion, validation, transformation, model training, evaluation, and deployment. The goal is to build a robust machine learning model that detects phishing websites within a complete MLOps setup, so that it can be maintained, versioned, and deployed to production with ease.

The pipeline integrates tools and frameworks such as MLflow, DagsHub, MongoDB, FastAPI, Docker, and GitHub Actions to automate everything from data ingestion to model deployment, including model training and continuous integration/continuous deployment (CI/CD). The project focuses on the full MLOps lifecycle, leveraging these tools to streamline machine learning workflows.

Project Overview

Folder Structure

  • constants/: Holds configuration files used throughout the pipeline.
  • entity/: Contains definitions for core entities like models and data schemas.
  • logging/: Custom logging setup for tracking pipeline and model training activities.
  • exception/: Custom exceptions for handling errors in the pipeline.
  • pipeline/: Orchestrates the flow of the ML pipeline, from ETL to model training and inference.
  • utils/: General utility functions used across the project.

Project Architecture

Project Flow

  1. Data Ingestion → 2. Validation → 3. Transformation → 4. Model Training → 5. Deployment with FastAPI → 6. Monitoring with MLflow
  • Data Ingestion: Load raw phishing data from CSV files and store it in MongoDB for easy access and scalability.
  • Data Validation: Ensure the data's integrity by checking for missing values, duplicates, or outliers.
  • Data Transformation: Clean the data and apply feature engineering techniques (e.g., scaling, encoding, or feature extraction).
  • Model Training: Train machine learning models using different algorithms and tune hyperparameters.
  • Model Evaluation: Evaluate the trained models based on performance metrics like accuracy, precision, recall, and F1 score.
  • Deployment: Expose the trained model through a FastAPI application to make predictions on new phishing data.
  • MLOps Automation: Track experiments and models using MLflow and manage the entire pipeline with DagsHub. Dockerize the application for portability.
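The stage-by-stage flow above can be sketched as a small orchestrator class. This is a minimal illustration of the pattern, not the project's actual pipeline code; the `TrainingPipeline` class, its method names, and the dummy artifacts are all invented for the example.

```python
# Minimal sketch of the pipeline flow: each stage consumes the previous
# stage's artifact and passes its own result forward.
class TrainingPipeline:
    """Runs each stage in order, chaining artifacts between them."""

    def run(self):
        raw = self.ingest()
        validated = self.validate(raw)
        features = self.transform(validated)
        model = self.train(features)
        return self.evaluate(model)

    def ingest(self):
        # In the real pipeline: pull records from MongoDB.
        return {"rows": 100}

    def validate(self, data):
        # In the real pipeline: schema, null, and duplicate checks.
        assert data["rows"] > 0
        return data

    def transform(self, data):
        # In the real pipeline: scaling / encoding / feature extraction.
        return {**data, "scaled": True}

    def train(self, features):
        # In the real pipeline: fit candidate models with tuned hyperparameters.
        return {"model": "AdaBoost", "features": features}

    def evaluate(self, model):
        # Placeholder score, echoing the accuracy reported later in this README.
        return {"accuracy": 0.9251, "model": model["model"]}

result = TrainingPipeline().run()
print(result)  # {'accuracy': 0.9251, 'model': 'AdaBoost'}
```

Keeping each stage as a separate method mirrors the `pipeline/` package layout, where each component can be tested and swapped independently.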

Data Ingestion Architecture

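A hedged sketch of the ingestion step: parsing raw CSV rows into dicts ready for a MongoDB `insert_many`. The column names, database name, and connection URI are placeholders, not the project's real configuration.

```python
# Convert raw CSV text into MongoDB-ready records (list of dicts).
import csv
import io

def csv_to_records(csv_text):
    """Parse CSV text into a list of dicts, one per row."""
    return list(csv.DictReader(io.StringIO(csv_text)))

# Invented sample rows for illustration.
sample = "url_length,has_ip,label\n54,1,phishing\n23,0,legitimate\n"
records = csv_to_records(sample)

# With a live MongoDB instance, the records would be stored like so:
# from pymongo import MongoClient
# client = MongoClient("mongodb://localhost:27017")
# client["network_security"]["phishing"].insert_many(records)
print(len(records))  # 2
```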

Data Validation Architecture

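The missing-value and duplicate checks described earlier can be illustrated with pandas. This is only a sketch of the idea; the column names and the shape of the validation report are invented.

```python
# Produce a tiny validation report: count of missing cells and duplicate rows.
import pandas as pd

def validate(df: pd.DataFrame) -> dict:
    """Summarize data-quality issues before the data enters transformation."""
    return {
        "missing": int(df.isna().sum().sum()),       # total NaN cells
        "duplicates": int(df.duplicated().sum()),    # fully duplicated rows
    }

df = pd.DataFrame({"url_length": [54, 54, None], "has_ip": [1, 1, 0]})
report = validate(df)
print(report)  # {'missing': 1, 'duplicates': 1}
```

A real validation stage would compare these counts against a schema file and raise a custom exception (from the `exception/` package) when a threshold is exceeded.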

Data Transformation Architecture

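As one example of the scaling mentioned in the transformation step, here is a standardization sketch with scikit-learn. The two-column matrix is synthetic; the real feature set comes from the phishing dataset.

```python
# Standardize features to zero mean and unit variance before training.
import numpy as np
from sklearn.preprocessing import StandardScaler

X = np.array([[1.0, 200.0],
              [2.0, 300.0],
              [3.0, 400.0]])
scaled = StandardScaler().fit_transform(X)
print(scaled.mean(axis=0))  # ~[0, 0] after standardization
```

Fitting the scaler on training data only, then reusing it at inference time, is what keeps the served model's inputs consistent with what it was trained on.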

Model Trainer Architecture


Hyperparameter Tuning for Model Selection

During model development, I performed hyperparameter tuning to optimize model performance. Various hyperparameter configurations were tested across the candidate models to find the best-performing one.
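A sketch of what that tuning step could look like with `GridSearchCV` over AdaBoost. The parameter grid and the synthetic dataset are assumptions for illustration; the README does not list the actual search space.

```python
# Grid-search two AdaBoost hyperparameters with 3-fold cross-validation.
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in for the phishing feature matrix.
X, y = make_classification(n_samples=300, random_state=42)

grid = GridSearchCV(
    AdaBoostClassifier(random_state=42),
    param_grid={"n_estimators": [50, 100], "learning_rate": [0.5, 1.0]},
    cv=3,
)
grid.fit(X, y)
print(grid.best_params_, round(grid.best_score_, 3))
```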

Models and Hyperparameters


Model Performance Comparison:

Here are the results for different models tested:

  • AdaBoost: 92.51%
  • Random Forest: 89.08%
  • Decision Tree: 86.35%
  • Gradient Boosting: 82.43%
  • Logistic Regression: 79.43%
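The comparison above amounts to training each candidate on the same split and ranking by accuracy. Here is a minimal sketch of that loop on synthetic data, so the printed scores will not match the reported percentages.

```python
# Train each candidate model on the same train/test split and rank by accuracy.
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

models = {
    "AdaBoost": AdaBoostClassifier(random_state=0),
    "Random Forest": RandomForestClassifier(random_state=0),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Logistic Regression": LogisticRegression(max_iter=1000),
}
scores = {name: accuracy_score(y_te, m.fit(X_tr, y_tr).predict(X_te))
          for name, m in models.items()}

for name, acc in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {acc:.2%}")
```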

Best Model and Hyperparameter


Running the Project

FastAPI Output

After successfully setting up and running the project, the FastAPI application should be up and running. Below is an example of the output you would see when accessing the API.


MLflow UI


DagsHub

DagsHub is used for tracking experiments and managing model versioning.


Metrics over multiple runs


Scores


Prediction Report on Test Data

After training and fine-tuning the model, I evaluated its performance on a separate test dataset. Below is the Prediction Report in the form of an HTML table.
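Producing such an HTML report is straightforward with pandas' `to_html`. The rows below are invented examples, not the project's actual test results.

```python
# Render prediction results as an HTML table and save the report.
import pandas as pd

report = pd.DataFrame({
    "url": ["http://example-login.xyz", "https://github.com"],
    "predicted": ["phishing", "legitimate"],
    "actual": ["phishing", "legitimate"],
})

html = report.to_html(index=False)
with open("prediction_report.html", "w") as f:
    f.write(html)
print(html.startswith("<table"))  # True
```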


Conclusion

This project has been an exciting journey through the entire Machine Learning lifecycle. From data ingestion and cleaning to model training, evaluation, and deployment, I’ve built a robust end-to-end MLOps pipeline. The key goal of this project was to demonstrate how to structure and automate a complete machine learning workflow, making it ready for production.

By experimenting with multiple models and fine-tuning hyperparameters, I was able to identify the AdaBoost model as the most efficient for this particular task, achieving an accuracy of 92.51%. I then containerized the entire workflow using Docker, integrated continuous integration/continuous deployment (CI/CD) through GitHub Actions, and tracked model experiments using MLflow.

This pipeline not only automates the machine learning lifecycle but also makes it easier to deploy, monitor, and update the models as new data becomes available.

Interested in the full source code? Feel free to reach out!

Authors

@SureshKrishnan

Python Docker MLflow MongoDB FastAPI

LinkedIn