Real or Not? NLP with Disaster Tweets (Kaggle Challenge)
Twitter has become an important communication channel in times of emergency. The ubiquity of smartphones enables people to announce an emergency they’re observing in real time. Because of this, more agencies (e.g. disaster relief organizations and news agencies) are interested in programmatically monitoring Twitter.
The main goal of this project is to build a machine learning model that predicts which Tweets are about real disasters and which ones aren’t.
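To make the task concrete, here is a minimal sketch of a binary tweet classifier. It is not necessarily the approach taken in the notebook: it swaps in a small multinomial Naive Bayes written with the standard library only, and the example tweets and labels are invented for illustration (1 = real disaster, 0 = not). The names `tokenize`, `NaiveBayes`, and `toy_texts` are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.classes = sorted(set(labels))
        self.doc_counts = Counter(labels)  # number of tweets per class
        self.word_counts = {c: Counter() for c in self.classes}
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        # Vocabulary is every word seen in any training tweet.
        self.vocab = set()
        for counts in self.word_counts.values():
            self.vocab.update(counts)
        return self

    def predict(self, text):
        total_docs = sum(self.doc_counts.values())
        best_class, best_score = None, -math.inf
        for c in self.classes:
            # Start from the log prior of the class.
            score = math.log(self.doc_counts[c] / total_docs)
            total_words = sum(self.word_counts[c].values())
            for tok in tokenize(text):
                if tok not in self.vocab:
                    continue  # words never seen in training are skipped
                smoothed = (self.word_counts[c][tok] + 1) / (total_words + len(self.vocab))
                score += math.log(smoothed)
            if score > best_score:
                best_class, best_score = c, score
        return best_class


# Invented toy data, NOT taken from the Kaggle dataset.
toy_texts = [
    "Forest fire near the town, residents evacuated",
    "Earthquake damages buildings downtown",
    "Flood waters rising, people trapped",
    "I love this song, it is fire",
    "My exam was a total disaster lol",
    "This pizza is the bomb",
]
toy_labels = [1, 1, 1, 0, 0, 0]

model = NaiveBayes().fit(toy_texts, toy_labels)
print(model.predict("Residents evacuated after earthquake"))
print(model.predict("I love this pizza"))
```

The toy set shows why the problem is not trivial: words such as "fire" and "disaster" appear in both classes, so a classifier has to weigh the surrounding words rather than match single keywords.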
The Anaconda Python distribution was used to create the Jupyter notebook for this project. No additional libraries were installed in support of this project.
The version of the notebook server is: 5.7.4
The version of Python is: Python 3.7.1 (default, Dec 10 2018, 22:54:23) [MSC v.1915 64 bit (AMD64)]
The following files are uploaded in the repository:
DS Capstone.ipynb: Contains all the analysis and modeling of the disaster tweets dataset
train.csv - the training set
test.csv - the test set
sample_submission.csv - a sample submission file in the correct format
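The relationship between these files can be sketched as follows. This assumes the column layout documented on the Kaggle data page (`id`, `keyword`, `location`, `text`, plus `target` in the training set, and `id,target` in the submission file). The helper names `read_tweets` and `write_submission` are hypothetical, and the two rows standing in for test.csv are invented.

```python
import csv
import io


def read_tweets(handle):
    """Return a list of row dicts from an open CSV file with a header row."""
    return list(csv.DictReader(handle))


def write_submission(handle, ids, predictions):
    """Write predictions as `id,target` rows, the layout of sample_submission.csv."""
    writer = csv.writer(handle, lineterminator="\n")
    writer.writerow(["id", "target"])
    for tweet_id, target in zip(ids, predictions):
        writer.writerow([tweet_id, int(target)])


# In-memory stand-in for test.csv; the rows are invented for illustration.
sample_test = io.StringIO(
    "id,keyword,location,text\n"
    "10,,,Wildfire spreading near the highway\n"
    '11,,,"What a lovely morning, coffee time"\n'
)
rows = read_tweets(sample_test)

# Placeholder predictions; a real run would take these from a trained model.
predictions = [1, 0]

out = io.StringIO()
write_submission(out, [row["id"] for row in rows], predictions)
print(out.getvalue())
```

With the real files, the in-memory buffers would be replaced by `open("test.csv", newline="", encoding="utf-8")` for reading and `open("submission.csv", "w", newline="", encoding="utf-8")` for writing.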
The dataset is provided by Kaggle and can be found at the link below:
https://www.kaggle.com/c/nlp-getting-started/data
A summary of the data analysis and results can be found at the link below on Medium:
This dataset was created by the company figure-eight and originally shared on their ‘Data For Everyone’ website. Kaggle hosted a challenge to develop machine learning models that classify tweets as being about a real disaster or not.
Disclaimer: The dataset for this competition contains text that may be considered profane, vulgar, or offensive.