CS-433 Project 2, 2019, Text Classification

Authors (team: Definitely not GRUs)

Louis Amaudruz
Andrej Janchevski
Timoté Vaucher

Abstract

In this project, we take a look at the binary classification of tweets. We need to predict whether the original tweet contained a positive or a negative emoji. To this end, we first use state-of-the-art data preprocessing, identify task-specific important features and devise four models: a classic ML baseline, a GRU model using GloVe embeddings and two transfer-learning models based on ULMfit and BERT respectively. The best classifier we found is the BERT model which yields a 0.904 accuracy and F-1 score on the test set in the competition.

For the reviewer

To run our final model for the evaluation, please proceed to the BERT model README to get the setup and information. If you wish to consult other models, please proceed to their corresponding folders.

Results

Link to the Competition leaderboard. Our team finished 2nd out of 37 participating teams / indivduals.

Model	Accuracy	F1-score
Classic ML	0.770	0.783
GloVe + GRUs	0.881	0.883
ULMfit	0.885	0.886
BERT (bert-base-uncased)	0.904	0.904

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

README.md

README.md

CS-433 Project 2, 2019, Text Classification

Authors (team: Definitely not GRUs)

Abstract

For the reviewer

Results

Files

README.md

Latest commit

History

README.md

File metadata and controls

CS-433 Project 2, 2019, Text Classification

Authors (team: Definitely not GRUs)

Abstract

For the reviewer

Results