Disaster-Response-Classification

Table of Contents

  1. Description
  2. Project Motivation
  3. Getting Started
    1. Dependencies
    2. Installing
    3. Executing Program
  4. Acknowledgements
  5. Screenshots

Description

This project is part of the Data Scientist Nanodegree Program by Udacity, in collaboration with Figure Eight. The initial dataset contains pre-labelled tweets and messages from real-life disasters. The aim of the project is to build a Natural Language Processing tool that categorizes messages.

The project is divided into the following sections:

  1. Data Processing: an ETL pipeline to extract data from the source, clean it, and save it in a proper database structure
  2. Machine Learning Pipeline: trains a model able to classify text messages into categories
  3. Web App: shows model results in real time.
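
To make the first two items concrete, here is a minimal sketch of the kind of ETL step described above. The file, column, and table names below are assumptions for illustration; the actual implementation lives in data/process_data.py.

    # Illustrative ETL sketch only: column and table names are assumptions,
    # not the project's exact schema (see data/process_data.py for the real pipeline).
    import pandas as pd
    from sqlalchemy import create_engine

    def run_etl(messages_csv, categories_csv, db_path):
        messages = pd.read_csv(messages_csv)
        categories = pd.read_csv(categories_csv)
        df = messages.merge(categories, on="id")   # join the two files on a shared message id
        df = df.drop_duplicates()                  # basic cleaning
        engine = create_engine(f"sqlite:///{db_path}")
        df.to_sql("Messages", engine, index=False, if_exists="replace")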

Project Motivation

Following a disaster, many different problems arise. Different types of disaster response organizations take care of different parts of the response and monitor incoming messages to understand the needs of the situation. During a large disaster they have the least capacity to filter those messages manually, so predictive modeling can help classify them more efficiently. This is where a project like this can help: it classifies incoming messages into categories so that first responders and organizations can prioritize help for the people who need it most.

Getting Started

Dependencies

  • Python 3.7 (Anaconda preferred)
  • Machine Learning Libraries: NumPy, SciPy, Pandas, Scikit-Learn
  • Natural Language Processing Libraries: NLTK
    • NLTK-specific modules: punkt, wordnet, averaged_perceptron_tagger (see the download snippet after this list)
  • SQLite Database Libraries: SQLAlchemy
    • You may also need SQLite-related driver files.
  • Web App and Data Visualization: Flask, Plotly
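
If NLTK reports that any of the modules listed above are missing at runtime, they can be fetched once from a Python shell with the standard nltk.download calls:

    # One-time download of the NLTK data used for tokenization, lemmatization and POS tagging.
    import nltk
    nltk.download("punkt")
    nltk.download("wordnet")
    nltk.download("averaged_perceptron_tagger")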

Installing

  • Clone this repository:
git clone git@github.com:agpt8/Disaster-Response-Classification.git
  • Go into this directory and install the dependencies
cd Disaster-Response-Classification

conda env create -f environment.yml
  • If you don't have conda installed, install the dependencies with the following command
pip install scikit-learn==0.19 numpy pandas flask plotly nltk scipy sqlalchemy
  • You need to install this specific version of scikit-learn because later versions of the library throw an attribute error related to deprecated code.
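
To confirm which scikit-learn version the environment actually resolved to, a quick check from a Python shell:

    # Print the installed scikit-learn version; it should start with 0.19.
    import sklearn
    print(sklearn.__version__)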

Executing Program:

  1. Run the following commands in the project's root directory to set up your database and model (the order of the arguments is important).

    • To run the ETL pipeline that cleans the data and stores it in the database

      python data/process_data.py data/disaster_messages.csv data/disaster_categories.csv data/DisasterResponse.db
      
    • To run the ML pipeline that trains the classifier and saves it

      python models/train_classifier.py data/DisasterResponse.db models/classifier.pkl
      
    • If you get an error during any of the above regarding a missing NLTK module, refer to this to download the modules manually (a minimal example is also shown under Dependencies).

    • The training could take several hours. On my PC, it took ~9 hours to train the model. If you just want to test this, feel free to use the included model.

  2. Run the following command to start the web app:

      python app/run.py

    • If, while running the app, you get an error saying the model cannot be found in joblib.load(model/classifier.pkl), try using the absolute path of the model here; Python sometimes fails to resolve relative paths (see the sketch after this list). Also, if the absolute path contains a backslash "\", don't forget to escape it (\\).
  3. Go to http://127.0.0.1:3001/
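
Regarding the note about relative paths above, here is a small sketch of building an absolute path before calling joblib.load. The path and variable names are illustrative and may differ from what app/run.py actually uses.

    # Illustrative only: resolve an absolute path to the trained model so that
    # joblib.load does not depend on the current working directory.
    # Assumes the calling script sits one directory below the project root (as app/run.py does).
    from pathlib import Path
    import joblib

    model_path = Path(__file__).resolve().parent.parent / "models" / "classifier.pkl"
    model = joblib.load(model_path)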

Acknowledgements

  • Udacity for providing such a complete Data Scientist Nanodegree Program
  • Figure Eight for providing the messages dataset used to train the model

Screenshots

Screenshots of the app: Home Page and Results Page.