This is a training project that analyses disaster data from Figure Eight and builds a model that classifies disaster messages. The project consists of three stages (an ETL pipeline, an ML pipeline, and a web application) that load and clean the raw data, train a classifier on it, and serve the results as a web app.
The code in this repository is written in HTML and Python 3 and requires the following Python packages: json, plotly, pandas, nltk, flask, sklearn, sqlalchemy, sys, numpy, re, pickle, warnings.
-
Run the following commands in the project's root directory to set up your database and model.
- To run the ETL pipeline that cleans the data and stores it in a database:
python data/process_data.py data/disaster_messages.csv data/disaster_categories.csv data/DisasterResponse.db
- To run the ML pipeline that trains the classifier and saves it as a pickle file:
python models/train_classifier.py data/DisasterResponse.db models/classifier.pkl
-
Run the following command in the app's directory to run your web app.
python run.py
-
Go to http://0.0.0.0:3001/
Actions: load and parse the datasets, clean the data, and store it in an SQLite database.
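The ETL stage can be sketched as follows. This is a minimal illustration, not the actual process_data.py: the column names ("id", "message", "categories") and the "related-1;request-0" encoding of the categories column are assumptions based on the Figure Eight dataset layout.

```python
# Hedged sketch of the ETL stage: merge, clean, and store in SQLite.
import pandas as pd
from sqlalchemy import create_engine

def clean_data(messages, categories):
    df = messages.merge(categories, on="id")
    # Split the single "categories" string into one column per category.
    cats = df["categories"].str.split(";", expand=True)
    cats.columns = [c.split("-")[0] for c in cats.iloc[0]]
    for col in cats.columns:
        cats[col] = cats[col].str[-1].astype(int)  # keep the 0/1 flag
    df = pd.concat([df.drop(columns=["categories"]), cats], axis=1)
    return df.drop_duplicates()

# Toy stand-ins for disaster_messages.csv and disaster_categories.csv.
messages = pd.DataFrame({"id": [1], "message": ["we need water"]})
categories = pd.DataFrame({"id": [1], "categories": ["related-1;request-0"]})
df = clean_data(messages, categories)

# Store the cleaned data (the real script writes to DisasterResponse.db).
engine = create_engine("sqlite:///:memory:")
df.to_sql("messages", engine, index=False, if_exists="replace")
```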
Actions: load the data from the SQLite database, split it into training and test sets, build a text-processing and classification pipeline, train and tune a model using GridSearchCV, and export the final model as a pickle file.
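A condensed sketch of that pipeline is shown below. The tokenizer, estimator, and parameter grid in train_classifier.py may differ; the toy data and the RandomForest/TfidfVectorizer choices here are illustrative assumptions.

```python
# Hedged sketch of the ML stage: build a pipeline, tune it with
# GridSearchCV, and pickle the best estimator.
import pickle
from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.multioutput import MultiOutputClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

# Toy corpus with two categories (the real data has 36).
X = ["we need water", "fire downtown", "send food", "flood here"] * 5
Y = [[1, 0], [0, 1], [1, 0], [0, 1]] * 5

pipeline = Pipeline([
    ("tfidf", TfidfVectorizer()),
    ("clf", MultiOutputClassifier(RandomForestClassifier(random_state=0))),
])
params = {"clf__estimator__n_estimators": [10, 20]}
cv = GridSearchCV(pipeline, params, cv=2)

X_train, X_test, y_train, y_test = train_test_split(X, Y, random_state=0)
cv.fit(X_train, y_train)

# Export the tuned model, as the real script does for models/classifier.pkl.
with open("classifier.pkl", "wb") as f:
    pickle.dump(cv.best_estimator_, f)
```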
Actions: display the visualizations (the app accepts messages from users and returns classification results for 36 categories of disaster events).
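The classification endpoint can be sketched with Flask as below. The route name, query parameter, and stand-in classifier are assumptions; the real app in run.py loads the pickled pipeline and renders Plotly visualizations as well.

```python
# Minimal sketch of the web stage: accept a message, return category labels.
from flask import Flask, request

app = Flask(__name__)

CATEGORIES = ["related", "request"]  # the real app covers 36 categories

def classify(message):
    # Stand-in for model.predict(); the real app uses the trained pipeline
    # loaded from classifier.pkl.
    return {c: 0 for c in CATEGORIES}

@app.route("/go")
def go():
    query = request.args.get("query", "")
    return {"query": query, "labels": classify(query)}
```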
The datasets are highly imbalanced, with very few positive examples for some message categories. This results in low recall despite high accuracy.
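A small example makes the accuracy/recall gap concrete (the 5% positive rate is illustrative, not taken from the actual data):

```python
# On an imbalanced category, a model that always predicts "negative"
# scores high accuracy yet catches none of the positives.
from sklearn.metrics import accuracy_score, recall_score

y_true = [1] * 5 + [0] * 95   # 5 positive messages out of 100
y_pred = [0] * 100            # degenerate "always negative" classifier

acc = accuracy_score(y_true, y_pred)   # 0.95 despite being useless
rec = recall_score(y_true, y_pred)     # 0.0: no positives recovered
```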
This app should not be used for real-world prediction unless more data is collected.
Additionally, the model training time can be improved.
This app was developed as part of the Udacity Data Scientist Nanodegree.