A team of (aspiring?) Data Scientists having adventures at Kaggle. Here we describe our approach to the Titanic problem.
The challenge is based on the sinking of the RMS Titanic, which killed 1502 of the 2224 passengers and crew. One reason for such loss of life was that there were not enough lifeboats for everyone aboard. Some groups of people were more likely to survive than others. In this challenge you are asked to analyse the data with machine learning and predict which passengers survived the tragedy. Link: https://www.kaggle.com/c/titanic
Right now we have implemented 8 Machine Learning models:
- RandomForest
- LinearSVC
- Stochastic Gradient Descent
- Gaussian Naive Bayes
- K-Neighbors
- Perceptron
- DecisionTree
- Logistic Regression
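All of these models are available in scikit-learn, so they can be trained and compared on the same train/test split. The sketch below uses a synthetic dataset purely for illustration; in the actual project the feature matrix would come from the processed Titanic data.

```python
# Sketch: fitting the listed scikit-learn classifiers on one split and
# comparing test accuracy. The data here is synthetic, not the Titanic set.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import LinearSVC
from sklearn.linear_model import SGDClassifier, Perceptron, LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=400, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

models = {
    "RandomForest": RandomForestClassifier(n_estimators=100, random_state=0),
    "LinearSVC": LinearSVC(max_iter=5000),
    "SGD": SGDClassifier(random_state=0),
    "GaussianNB": GaussianNB(),
    "K-Neighbors": KNeighborsClassifier(),
    "Perceptron": Perceptron(random_state=0),
    "DecisionTree": DecisionTreeClassifier(random_state=0),
    "LogisticRegression": LogisticRegression(max_iter=1000),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = model.score(X_test, y_test)  # accuracy on held-out data

# Rank the models by test accuracy, best first.
for name, acc in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{name:20s} {acc:.3f}")
```

Accuracy on a single split is only a rough ranking; cross-validation gives a more stable comparison.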
Here are some insights from our analysis of the data:
- Pclass, Sex, Cabin and Embarked are categorical features.
- Comparing genders, females are far more likely to survive.
- Fare did not contribute much to the model.
- We decided to combine Age and Pclass into a single feature, given their correlation with survival.
- Names are unique in the dataset, so they are useless without preprocessing.
- Binning the Age feature into groups is important for improving model performance.
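The preprocessing steps above can be sketched in pandas. The mini-frame, the age bands, and the `Age*Class` combination below are illustrative assumptions, not the project's exact pipeline; the column names follow the Kaggle Titanic schema.

```python
import pandas as pd

# Hypothetical mini-frame standing in for train.csv.
df = pd.DataFrame({
    "Pclass": [3, 1, 3, 1, 2],
    "Sex": ["male", "female", "female", "female", "male"],
    "Age": [22.0, 38.0, 26.0, 35.0, 54.0],
    "Embarked": ["S", "C", "S", "S", "Q"],
})

# Encode the categorical features as integers.
df["Sex"] = df["Sex"].map({"male": 0, "female": 1})
df["Embarked"] = df["Embarked"].map({"S": 0, "C": 1, "Q": 2})

# Bin Age into ordinal groups (band edges assumed for illustration).
df["AgeBand"] = pd.cut(df["Age"], bins=[0, 16, 32, 48, 64, 120], labels=False)

# Combine Age and Pclass into one feature, as noted above.
df["Age*Class"] = df["AgeBand"] * df["Pclass"]

print(df[["Sex", "Embarked", "AgeBand", "Age*Class"]])
```

Binning turns a noisy continuous feature into a small ordinal one, which several of the simpler models (Naive Bayes, Perceptron) handle much better.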