Disaster-Tweet-Classification

In this project, I performed binary classification of tweets into the following two categories:

  1. disaster tweets
  2. non-disaster tweets

You can download the dataset here: https://www.kaggle.com/c/nlp-getting-started/data
I used the DistilBERT model to obtain sentence embeddings for the tweets.
For that, I used Hugging Face's transformers library; see https://huggingface.co/transformers/v2.0.0/model_doc/distilbert.html
In the Distill_bert module, I computed the embeddings for the training and test datasets and stored them in the files features_distill.npy and features_test.npy.
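For reference, the embedding step looks roughly like the sketch below. This is a minimal sketch assuming a recent version of transformers, not the exact Distill_bert module: the checkpoint name, the 128-token truncation, and using the [CLS] hidden state as the sentence vector are assumptions.

```python
# Minimal sketch of extracting DistilBERT sentence embeddings (assumptions noted above).
import numpy as np
import pandas as pd
import torch
from transformers import DistilBertModel, DistilBertTokenizer

tokenizer = DistilBertTokenizer.from_pretrained("distilbert-base-uncased")
model = DistilBertModel.from_pretrained("distilbert-base-uncased")
model.eval()

train_df = pd.read_csv("train.csv")  # Kaggle training file with a "text" column

embeddings = []
with torch.no_grad():
    for tweet in train_df["text"]:
        inputs = tokenizer(tweet, return_tensors="pt", truncation=True, max_length=128)
        outputs = model(**inputs)
        # Hidden state of the first ([CLS]) token as a fixed-size sentence vector
        embeddings.append(outputs.last_hidden_state[0, 0].numpy())

np.save("features_distill.npy", np.array(embeddings))
```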
You can apply classification directly to those embeddings using the Predictions module.
In the Predictions module, I ran several classifiers on the embeddings: a neural network, logistic regression, GaussianNB, random forest, SVC, gradient boosting, and decision trees. The best validation accuracy, 0.82, came from logistic regression.
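A minimal sketch of the logistic regression path, assuming features_distill.npy and the Kaggle train.csv (with its "target" column) are in the working directory; the 80/20 split and max_iter are illustrative choices, not necessarily what the Predictions module uses:

```python
# Minimal sketch: train/validate logistic regression on the saved embeddings.
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X = np.load("features_distill.npy")
y = pd.read_csv("train.csv")["target"].values  # 1 = disaster, 0 = non-disaster

X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=42)

clf = LogisticRegression(max_iter=1000)
clf.fit(X_train, y_train)
print("Validation accuracy:", accuracy_score(y_val, clf.predict(X_val)))
```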
Furthermore, you can also achieve state-of-the-art accuracy by using the bert-large-uncased model instead of DistilBERT, adding a single-layer neural network on top, and retraining the last BERT output layer, though this requires more compute power and training time.
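A hedged sketch of that variant is below; exactly which layers to unfreeze and the learning rate are illustrative assumptions, not settings taken from this repo:

```python
# Sketch: bert-large-uncased frozen except its final encoder layer,
# with a single linear classification layer on top.
import torch
from transformers import BertModel

bert = BertModel.from_pretrained("bert-large-uncased")
for p in bert.parameters():
    p.requires_grad = False
for p in bert.encoder.layer[-1].parameters():
    p.requires_grad = True  # retrain only the last encoder layer (assumption)

classifier = torch.nn.Linear(bert.config.hidden_size, 2)  # single-layer head

trainable = [p for p in bert.parameters() if p.requires_grad]
optimizer = torch.optim.Adam(trainable + list(classifier.parameters()), lr=2e-5)
```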
I used Paperspace's free GPU notebooks for this project.
