My first ML project: a basic spam email classifier that uses machine learning to predict whether an email is spam. The model is trained and evaluated on a dataset of labeled emails.
The classifier distinguishes between spam and non-spam (ham) emails. It uses TF-IDF (Term Frequency-Inverse Document Frequency) vectorization to convert email text into numerical features, which are then passed to a machine learning algorithm.
- Preprocessing of email text data, including tokenization and TF-IDF vectorization.
- Training of a machine learning model (e.g., Multinomial Naive Bayes) on a labeled dataset.
- Evaluation of model performance using metrics such as accuracy, precision, recall, and F1-score.
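The three steps above might look roughly like the following. This is a minimal sketch, not the project's actual script: the example texts and labels are invented, and the 75/25 split, the `random_state` value, and the choice of 'spam' as the positive class are all assumptions made for illustration.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import accuracy_score, precision_recall_fscore_support
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Invented examples, for illustration only (not from spam.csv).
texts = [
    "Win a free prize now",
    "Claim your free cash reward today",
    "Urgent: you have won a lottery",
    "Free entry in a weekly prize draw",
    "Meeting moved to Monday morning",
    "Can you review my report today",
    "Lunch at noon tomorrow?",
    "Here are the notes from the call",
]
labels = ["spam", "spam", "spam", "spam", "ham", "ham", "ham", "ham"]

# Hold out part of the data for evaluation (split ratio is an assumption).
X_train, X_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.25, stratify=labels, random_state=42
)

# Preprocessing and model in one pipeline: TF-IDF, then Multinomial Naive Bayes.
model = make_pipeline(TfidfVectorizer(), MultinomialNB())
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

# Evaluation, treating 'spam' as the positive class.
accuracy = accuracy_score(y_test, y_pred)
precision, recall, f1, _ = precision_recall_fscore_support(
    y_test, y_pred, pos_label="spam", average="binary", zero_division=0
)
print(f"accuracy={accuracy:.2f} precision={precision:.2f} "
      f"recall={recall:.2f} f1={f1:.2f}")
```

Wrapping the vectorizer and the classifier in one pipeline means the TF-IDF vocabulary is learned from the training split only, so the held-out emails do not leak into preprocessing. With a dataset this small the scores are not meaningful; the point is only the shape of the workflow.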
The dataset used for this project is a collection of emails labeled as 'spam' or 'ham'. It was obtained from Kaggle and has been dedicated to the public domain, which allows free use, modification, and distribution, including commercial use, without requiring permission. The dataset is included in this repository as spam.csv.
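Loading a labeled CSV like this with Pandas could be sketched as follows. The helper `load_emails` and the column names `label` and `text` are hypothetical; check the header row of spam.csv and adjust them. The example reads from an in-memory sample so that it runs without the real file.

```python
import io

import pandas as pd


def load_emails(source, text_col="text", label_col="label"):
    """Load labeled emails from a CSV path or file-like object.

    The column names are assumptions; change them to match the
    actual header of the dataset.
    """
    df = pd.read_csv(source)
    # Keep only the two columns of interest and drop incomplete rows.
    df = df[[label_col, text_col]].dropna().copy()
    # Normalize labels so that 'Spam', ' spam ' and 'spam' are treated alike.
    df[label_col] = df[label_col].str.strip().str.lower()
    return df


# In-memory stand-in for spam.csv, with made-up rows.
sample_csv = io.StringIO(
    "label,text\n"
    "ham,See you at lunch\n"
    "Spam,Claim your free prize\n"
)
emails = load_emails(sample_csv)
print(emails["label"].value_counts())
```

For the real dataset, the call would be `load_emails("spam.csv")`, possibly with different column names or an explicit `encoding` argument added to `pd.read_csv` if the file is not UTF-8.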
- Python.
- Scikit-Learn: For ML and model evaluation.
- Pandas: For data loading and preprocessing.
- Git and GitHub: Version control and repository hosting.
- Jupyter Notebook (Optional): For experimentation and documentation.
- Clone this repository to your local machine:
    git clone https://github.com/Fayouzz/Spam-Email-Classsifer.git
- Navigate to the project directory (git names it after the repository):
    cd Spam-Email-Classsifer
- Install the required Python packages:
    pip install scikit-learn pandas
- Run the spam email classifier:
    python spamemailclassifier.py
Be sure to replace spamemailclassifier.py with the actual filename of the Python script in the repository.
This project is licensed under the MIT License. See the LICENSE file for details.
Suggestions, feedback, and questions about this project are welcome. Feel free to reach out to me at Fayouz.