Ajay9704/spam_classification

📧 Spam Email Classification using Machine Learning

Hi, I'm Ajay 👋
This repository contains my Spam Email Classification project, where I built and compared multiple ML models to detect whether an email is Spam or Not Spam (Ham).

This project was inspired by the CampusX Spam Classification project, but I extended it with my own feature engineering, model comparison, and performance benchmarking.


🔍 Project Overview

Email spam is a huge problem, with billions of unwanted emails sent daily. To tackle this, I:

  • Collected and preprocessed a labeled email dataset.
  • Cleaned the text (removing punctuation, stopwords, special characters).
  • Used TF-IDF vectorization to represent text numerically.
  • Trained multiple models including Naive Bayes, Random Forest, Extra Trees, and XGBoost.
  • Evaluated them on precision, recall, F1-score, and accuracy.
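
The text-cleaning step above could be sketched roughly as follows. This is a minimal illustration, not the code from this repository: the `clean_text` helper and the tiny `STOPWORDS` set are assumptions made for the example, and a real project would typically use a fuller stopword list.

```python
import re

# Hypothetical, deliberately tiny stopword list for illustration only;
# a real pipeline would normally use a fuller list from an NLP library.
STOPWORDS = {"a", "an", "the", "is", "to", "and", "of", "in", "you", "your"}


def clean_text(text: str) -> str:
    """Lowercase the text, strip punctuation/special characters, drop stopwords."""
    text = text.lower()
    # Replace anything that is not a lowercase letter, digit, or whitespace with a space.
    text = re.sub(r"[^a-z0-9\s]", " ", text)
    tokens = [tok for tok in text.split() if tok not in STOPWORDS]
    return " ".join(tokens)


cleaned = clean_text("WIN a FREE prize!!! Click here to claim your $1000 reward.")
print(cleaned)  # win free prize click here claim 1000 reward
```

The cleaned string is what would then be passed on to the TF-IDF vectorizer.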

🛠️ Tech Stack

  • Python
  • Pandas, NumPy – data preprocessing
  • Scikit-learn – ML models, TF-IDF vectorization, metrics
  • XGBoost – advanced boosting model
  • Matplotlib, Seaborn – visualization
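
As a rough illustration of how the scikit-learn pieces fit together, here is a minimal sketch that chains `TfidfVectorizer` with a `MultinomialNB` classifier. The four training messages are made up for the example and are not the dataset used in this project; the actual code and hyperparameters in `main.py` may differ.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Made-up toy messages for illustration; NOT the dataset used in this project.
train_texts = [
    "win a free prize now",
    "free cash offer click now",
    "meeting moved to tomorrow noon",
    "please review the attached report",
]
train_labels = ["spam", "spam", "ham", "ham"]

# TF-IDF turns each message into a vector of weighted term frequencies;
# Multinomial Naive Bayes then learns per-class term distributions from them.
model = make_pipeline(TfidfVectorizer(), MultinomialNB())
model.fit(train_texts, train_labels)

predictions = model.predict(
    ["claim your free prize", "see you at the meeting tomorrow"]
)
print([str(label) for label in predictions])
```

Swapping `MultinomialNB()` for another estimator (for example a random forest or a boosted-tree classifier) keeps the rest of the pipeline unchanged, which is one way a model comparison like the one in this README can be organized.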

📊 Model Performance

| Model | Accuracy | Precision | Recall | F1-Score |
|---|---|---|---|---|
| Naive Bayes | 94.10% | 100% | 92% | 95.8% |
| Random Forest | 96.8% | 97% | 96% | 96.5% |
| Extra Trees | 97.2% | 97% | 97% | 97% |
| XGBoost (Final) | 98.3% | 98% | 98% | 98% |

✨ XGBoost was chosen as the final model because it gave the highest accuracy (98.3%) and the best balance of precision and recall (98% each), meaning fewer false positives and false negatives overall.
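
For reference, the F1-score is the harmonic mean of precision and recall. The sketch below recomputes the Naive Bayes F1 from the reported precision and recall; the confusion-matrix counts (`tp`, `fp`, `fn`) are hypothetical values chosen only to reproduce those percentages, not counts taken from this project.

```python
def precision(tp: int, fp: int) -> float:
    """Fraction of predicted spam that really is spam."""
    return tp / (tp + fp) if (tp + fp) else 0.0


def recall(tp: int, fn: int) -> float:
    """Fraction of actual spam that was caught."""
    return tp / (tp + fn) if (tp + fn) else 0.0


def f1_score(p: float, r: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * p * r / (p + r) if (p + r) else 0.0


# Hypothetical counts that yield 100% precision and 92% recall.
tp, fp, fn = 92, 0, 8
nb_precision = precision(tp, fp)
nb_recall = recall(tp, fn)
nb_f1 = f1_score(nb_precision, nb_recall)
print(f"{nb_f1:.1%}")  # 95.8%
```

The same formula applied to the Random Forest row (97% precision, 96% recall) gives roughly 96.5%, matching the table.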


🌟 Key Insights

  • Naive Bayes is simple and surprisingly effective for text classification.
  • Ensemble models (Random Forest, Extra Trees, XGBoost) outperformed Naive Bayes on accuracy, recall, and F1-score in this comparison, although Naive Bayes had the highest precision.
  • Precision matters a lot: flagging a real email as spam is more harmful than missing a spam email.
  • Feature engineering (like removing rare words and normalizing text) noticeably improved results in my experiments.

🚀 Future Improvements

  • Build a Flask/Streamlit app for real-time email classification.
  • Deploy the model as an API.
  • Add support for multi-language spam detection.
  • Train on larger, real-world datasets (Enron, SpamAssassin).

📬 Connect with Me

LinkedIn: linkedin.com/in/budida-ajay

GitHub: github.com/ajay9704


📌 How to Run

# Clone the repository
git clone https://github.com/ajay9704/spam_classification.git

# Navigate into the project
cd spam_classification

# Install dependencies
pip install -r requirements.txt

# Run the model training
python main.py
