In this project, I performed binary classification of tweets into the following two categories:
- disaster tweets
- non-disaster tweets
You can download the dataset here: https://www.kaggle.com/c/nlp-getting-started/data
I used the DistilBERT model to obtain embeddings of the tweet sentences.
For that, I used Hugging Face's transformers library; you can check the documentation here: https://huggingface.co/transformers/v2.0.0/model_doc/distilbert.html
In the Distill_bert module, I computed the embeddings of the training and test datasets and stored them in the files features_distill.npy and features_test.npy.
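For reference, here is a minimal sketch of that embedding step. It targets the current transformers API (the v2.0.0 docs linked above use a slightly different calling convention), and the CSV column name, sequence length, and batch size are my assumptions, not values taken from the repository.

```python
# Sketch: extract one DistilBERT vector per tweet and save as .npy.
# Assumes the Kaggle train.csv / test.csv with a "text" column.
import numpy as np
import pandas as pd
import torch
from transformers import DistilBertTokenizer, DistilBertModel

tokenizer = DistilBertTokenizer.from_pretrained("distilbert-base-uncased")
model = DistilBertModel.from_pretrained("distilbert-base-uncased")
model.eval()

def embed(texts, batch_size=32):
    """Return one vector per tweet: the [CLS] token's last hidden state."""
    chunks = []
    for i in range(0, len(texts), batch_size):
        batch = texts[i:i + batch_size]
        enc = tokenizer(batch, padding=True, truncation=True,
                        max_length=64, return_tensors="pt")
        with torch.no_grad():
            out = model(**enc)
        # out[0] is the last hidden state, shape (batch, seq_len, 768)
        chunks.append(out[0][:, 0, :].numpy())
    return np.concatenate(chunks)

train = pd.read_csv("train.csv")
test = pd.read_csv("test.csv")
np.save("features_distill.npy", embed(train["text"].tolist()))
np.save("features_test.npy", embed(test["text"].tolist()))
```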
You can directly run classification on those embeddings using the Predictions module.
In the Predictions module, I performed classification on the embeddings using several algorithms: a neural network, logistic regression, Gaussian Naive Bayes, random forest, SVC, gradient boosting, and decision trees. The best validation accuracy, 0.82, came from logistic regression.
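As an illustration of that step, here is a hedged sketch of the logistic regression path on the saved embeddings; the 80/20 split and solver settings are illustrative choices, while the "target" column comes from the Kaggle dataset.

```python
# Sketch: logistic regression on the precomputed DistilBERT embeddings.
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

X = np.load("features_distill.npy")
y = pd.read_csv("train.csv")["target"].values

# Hold out 20% of the training data for validation.
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2, random_state=42)

clf = LogisticRegression(max_iter=1000)
clf.fit(X_train, y_train)
print("Validation accuracy:", accuracy_score(y_val, clf.predict(X_val)))
```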
Furthermore, you can achieve state-of-the-art accuracy by using the bert-large-uncased model instead of DistilBERT, adding a single-layer neural network on top, and retraining the last BERT output layer, but this requires more compute power and training time.
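A rough sketch of that heavier setup might look like the following; it assumes PyTorch and the layout of Hugging Face's BertModel (encoder.layer), and omits the tokenizer, training loop, and hyperparameters.

```python
# Sketch: bert-large-uncased frozen except its last encoder layer,
# plus a single linear head on the [CLS] vector.
import torch.nn as nn
from transformers import BertModel

class BertClassifier(nn.Module):
    def __init__(self):
        super().__init__()
        self.bert = BertModel.from_pretrained("bert-large-uncased")
        # Freeze all BERT weights, then unfreeze only the last encoder layer.
        for p in self.bert.parameters():
            p.requires_grad = False
        for p in self.bert.encoder.layer[-1].parameters():
            p.requires_grad = True
        # Single-layer head: hidden size 1024 for bert-large, 2 classes.
        self.head = nn.Linear(1024, 2)

    def forward(self, input_ids, attention_mask):
        out = self.bert(input_ids=input_ids, attention_mask=attention_mask)
        cls = out[0][:, 0, :]   # [CLS] token's last hidden state
        return self.head(cls)
```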
I used Paperspace's free GPU notebooks for this project.