- This is a Flask API I developed to predict whether a passenger on the infamous Titanic would have survived, given their details.
- This version builds on the previous one, which was quite basic in operation.
- New models and model selection options
- New features for more accurate prediction
- New form format
❗ Some models are currently not working, specifically Random Forest, KNN and Decision Trees. A fix is being worked on.
- Clone this repository, open CMD, and make sure you are in the project home directory. Create the machine learning models by running models.py:
python models.py
- This serializes our models into files with a .pkl extension. Alternatively, use the previously pretrained models saved in the ./models folder.
- Run the Flask app:
python api.py
- The dataset used for training was taken from Kaggle.
- Link: [Titanic Dataset](https://www.kaggle.com/c/titanic/data)
Variable | Definition | Key |
---|---|---|
pclass | Ticket class | 1 = 1st, 2 = 2nd, 3 = 3rd |
sex | Sex | |
age | Age in years | |
sibsp | # of siblings / spouses aboard the Titanic | |
parch | # of parents / children aboard the Titanic | |
ticket | Ticket number | |
fare | Passenger fare | |
cabin | Cabin number | |
embarked | Port of Embarkation | C = Cherbourg, Q = Queenstown, S = Southampton |
- The target variable is survival:
  - 0 : Did not survive
  - 1 : Survived
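Before training, the categorical columns above need to be turned into numbers. A minimal preprocessing sketch with pandas, assuming the Kaggle column names; the exact features the app keeps may differ:

```python
import pandas as pd

def preprocess(df: pd.DataFrame) -> pd.DataFrame:
    """Encode the categorical features and fill missing values.

    Column names follow the Kaggle dataset; the encodings here are
    illustrative, not necessarily the app's final feature set.
    """
    df = df.copy()
    # Binary-encode sex and integer-encode the port of embarkation.
    df["Sex"] = df["Sex"].map({"male": 0, "female": 1})
    df["Embarked"] = df["Embarked"].fillna("S").map({"S": 0, "C": 1, "Q": 2})
    # Age has missing values in the dataset; fill with the median.
    df["Age"] = df["Age"].fillna(df["Age"].median())
    return df

sample = pd.DataFrame({
    "Sex": ["male", "female"],
    "Embarked": ["C", None],
    "Age": [22.0, None],
})
print(preprocess(sample))
```

Filling missing Embarked values with "S" (the most common port) and Age with the median are common defaults for this dataset.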
- This is clearly a classification task, and it can be solved by a myriad of classification algorithms such as Logistic Regression, Decision Trees and even Random Forests.
- I chose six algorithms to train on the dataset, because why not.
- The selected models were: Logistic Regression, K-Nearest Neighbours, Gaussian Naive Bayes, Decision Trees, Random Forest and Support Vector Machines.
- Model Performances:

Model | Score |
---|---|
Random Forest | 86.76 |
Decision Tree | 86.76 |
KNN | 84.74 |
Logistic Regression | 80.36 |
Support Vector Machines | 78.23 |
Naive Bayes | 72.28 |
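A sketch of how models.py might train the six models and serialize them with pickle. Synthetic data stands in for the preprocessed Titanic features here, since the real CSV lives in the repository; the file names are illustrative:

```python
import pickle
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the preprocessed Titanic features;
# models.py would load and preprocess the real Kaggle CSV instead.
X, y = make_classification(n_samples=300, n_features=7, random_state=0)

models = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "knn": KNeighborsClassifier(),
    "naive_bayes": GaussianNB(),
    "decision_tree": DecisionTreeClassifier(random_state=0),
    "random_forest": RandomForestClassifier(random_state=0),
    "svm": SVC(),
}

for name, model in models.items():
    model.fit(X, y)
    # Serialize each fitted model, mirroring the .pkl files in ./models
    with open(f"{name}.pkl", "wb") as fh:
        pickle.dump(model, fh)
    print(name, round(model.score(X, y) * 100, 2))
```

Each .pkl file can later be reloaded with `pickle.load` so the API never has to retrain.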
- I made an API for all the models so that users can interact with the machine learning models with ease. Users can select which model they would like to use for prediction.
- The API is built with the Flask library, which is commonly used for such tasks.
- I have also connected an HTML form to the Flask app to take in user input, and a CSS file to style it.
- The Flask API was deployed on the Heroku cloud platform so that anyone with the link to the app can access it online.
- I have connected this GitHub repository to Heroku so that it can be run on the Heroku dyno.
- I have used the Gunicorn package, a WSGI HTTP server that serves Python web applications in production. The `Procfile` and `requirements.txt` should be defined with all the details required before the deployment.
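For reference, a typical Heroku `Procfile` for a Flask app served by Gunicorn looks like this (assuming the Flask object is named `app` inside `api.py`):

```
web: gunicorn api:app
```

The `requirements.txt` would then list at least `flask`, `gunicorn`, `scikit-learn` and `pandas`, ideally with pinned versions so the dyno installs exactly what was tested locally.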
- Data Wrangling using Pandas
- Feature Engineering to fit our data to our model
- Saving the model and using it again with Pickle
- Making a Flask app
- A little frontend web development
- Making the app live by deploying it on cloud platforms