This project showcases an implementation of logistic regression for classifying the famous Iris dataset. The Iris dataset is a multivariate dataset introduced by the British biologist and statistician Ronald A. Fisher in 1936. It consists of 150 samples of iris flowers, with 50 samples for each of three species: setosa, versicolor, and virginica. Each sample has four features: sepal length, sepal width, petal length, and petal width.
The main objective of this project is to demonstrate how logistic regression can be used for multi-class classification problems and how to evaluate the model's performance on the Iris dataset.
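One common way to extend logistic regression to three classes is the softmax (multinomial) formulation: compute one linear score per class, then normalise the scores into probabilities. The README does not say whether the notebook uses this formulation or a one-vs-rest scheme, so the following is only a minimal sketch of the idea. The weights, biases, and sample values are hypothetical, hand-picked numbers, not fitted parameters.

```python
import math


def softmax(scores):
    """Turn raw class scores into probabilities that sum to 1."""
    peak = max(scores)  # subtracting the max keeps exp() numerically stable
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict_proba(features, weights, biases):
    """Compute one linear score per class, then apply softmax."""
    scores = [
        sum(w * x for w, x in zip(class_weights, features)) + b
        for class_weights, b in zip(weights, biases)
    ]
    return softmax(scores)


# Hypothetical parameters for three classes and four features
# (sepal length, sepal width, petal length, petal width).
weights = [
    [0.0, 0.0, -1.0, -1.0],  # class 0: favours small petals
    [0.0, 0.0, 0.0, 0.0],    # class 1: neutral baseline
    [0.0, 0.0, 1.0, 1.0],    # class 2: favours large petals
]
biases = [5.0, 0.0, -6.5]

sample = [5.1, 3.5, 1.4, 0.2]  # illustrative measurements with small petals
probs = predict_proba(sample, weights, biases)
predicted_class = max(range(len(probs)), key=probs.__getitem__)
```

Training would adjust `weights` and `biases` to minimise the cross-entropy between these probabilities and the true labels; the predicted species is simply the class with the highest probability.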
To run this project, you will need the following:
- Python 3.x
- Jupyter Notebook (optional but recommended)
- Clone this repository to your local machine:
git clone https://github.com/himanshumahajan138/LGMVIP-DataScience-1.git
- Change your directory to the project folder:
cd LGMVIP-DataScience-1/Himanshu_iris
- (Optional) Create a virtual environment and activate it (recommended):
python -m venv venv
source venv/bin/activate # On Windows, use: venv\Scripts\activate
- Install the required Python packages:
pip install -r requirements.txt
- Open the Jupyter Notebook:
jupyter notebook
- Navigate to the Himanshu_iris.ipynb file and open it.
- Run the cells in the notebook to see the step-by-step implementation of the logistic regression algorithm on the Iris dataset.
The project repository includes the following files:
- Himanshu_iris.ipynb: Jupyter Notebook containing the implementation of the logistic regression algorithm.
- iris.data: The dataset file in .data format.
- requirements.txt: List of Python packages required to run the project.
The notebook works through the following steps:
- Data Loading: Loading the Iris dataset from the iris.data file and performing initial data exploration.
- Data Preprocessing: Handling missing values (if any), encoding categorical variables (if any), and splitting the dataset into training and testing sets.
- Model Implementation: Building and training the logistic regression model using the training data.
- Model Evaluation: Evaluating the model's performance on the test dataset using accuracy and a confusion matrix.
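The data-handling and evaluation steps above can be sketched with the Python standard library alone. This is not the notebook's code, which presumably relies on the packages in requirements.txt. The sketch assumes iris.data holds comma-separated rows of four numbers followed by a species name, with no header row. The helper names (`parse_iris`, `encode_labels`, `split_train_test`, `accuracy`, `confusion_matrix`) are hypothetical, and the predictions at the end are made up purely to exercise the metrics. Model fitting is left out; any classifier that returns integer class predictions could slot in.

```python
import csv
import io
import random
from collections import Counter


def parse_iris(text):
    """Parse rows of four floats followed by a species name (assumed format)."""
    features, labels = [], []
    for row in csv.reader(io.StringIO(text)):
        if len(row) != 5:  # skip blank or malformed lines
            continue
        features.append([float(value) for value in row[:4]])
        labels.append(row[4])
    return features, labels


def encode_labels(labels):
    """Map each species name to an integer index (sorted for determinism)."""
    classes = sorted(set(labels))
    index = {name: i for i, name in enumerate(classes)}
    return [index[name] for name in labels], classes


def split_train_test(features, labels, test_ratio=0.2, seed=0):
    """Shuffle sample indices and hold out a fraction for testing."""
    order = list(range(len(features)))
    random.Random(seed).shuffle(order)
    n_test = max(1, int(len(order) * test_ratio))
    test_idx, train_idx = order[:n_test], order[n_test:]
    x_train = [features[i] for i in train_idx]
    x_test = [features[i] for i in test_idx]
    y_train = [labels[i] for i in train_idx]
    y_test = [labels[i] for i in test_idx]
    return x_train, x_test, y_train, y_test


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def confusion_matrix(y_true, y_pred, n_classes):
    """Rows are true classes, columns are predicted classes."""
    counts = Counter(zip(y_true, y_pred))
    return [[counts[(t, p)] for p in range(n_classes)] for t in range(n_classes)]


# Illustrative rows in the assumed file format.
sample_text = """5.1,3.5,1.4,0.2,Iris-setosa
7.0,3.2,4.7,1.4,Iris-versicolor
6.3,3.3,6.0,2.5,Iris-virginica
"""
X, names = parse_iris(sample_text)
y, classes = encode_labels(names)

# Made-up labels and predictions, only to show the metrics.
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 0, 1, 2, 2, 2]
acc = accuracy(y_true, y_pred)
cm = confusion_matrix(y_true, y_pred, n_classes=3)
```

In the confusion matrix, diagonal entries count correct predictions and off-diagonal entries show which species are mistaken for one another, which accuracy alone does not reveal.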
This project demonstrates logistic regression applied to classifying the Iris dataset. By following the steps in the Jupyter Notebook, you will gain a better understanding of logistic regression and how it applies to multi-class classification problems such as identifying iris species.
Feel free to experiment with different machine learning algorithms and techniques to further enhance the classification accuracy and explore more insights from the dataset.
For any questions or suggestions, please feel free to reach out to the project's contributors. Happy coding!