Text classification of Tweets as a project for CS-433 ML at EPFL, taken in Fall 2019, finished 2nd out of 37 teams

tvaucher/CS433-ML-project2

CS-433 Project 2, 2019, Text Classification

Authors (team: Definitely not GRUs)

  • Louis Amaudruz
  • Andrej Janchevski
  • Timoté Vaucher

Abstract

In this project, we study the binary classification of tweets: we need to predict whether the original tweet contained a positive or a negative emoji. To this end, we first apply state-of-the-art data preprocessing, identify important task-specific features, and devise four models: a classic ML baseline, a GRU model using GloVe embeddings, and two transfer-learning models based on ULMFiT and BERT, respectively. The best classifier we found is the BERT model, which achieves an accuracy and F1 score of 0.904 on the competition test set.

For the reviewer

To run our final model for the evaluation, please see the BERT model README for setup instructions and usage information. If you wish to consult the other models, please see their corresponding folders.

Results

Link to the Competition leaderboard. Our team finished 2nd out of 37 participating teams / individuals.

| Model                    | Accuracy | F1-score |
|--------------------------|----------|----------|
| Classic ML               | 0.770    | 0.783    |
| GloVe + GRUs             | 0.881    | 0.883    |
| ULMFiT                   | 0.885    | 0.886    |
| BERT (bert-base-uncased) | 0.904    | 0.904    |
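
For readers unfamiliar with the two metrics reported above, the following is a minimal sketch of how accuracy and F1-score can be computed for a binary classifier. It is not taken from this repository: the function names, the toy labels, and the convention that `1` marks a positive tweet and `-1` a negative one are all assumptions made for illustration.

```python
from typing import Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def f1_score(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1) -> float:
    """Harmonic mean of precision and recall for the `positive` class."""
    if len(y_true) != len(y_pred):
        raise ValueError("inputs must be of equal length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)  # true positives
    fp = sum(t != positive and p == positive for t, p in pairs)  # false positives
    fn = sum(t == positive and p != positive for t, p in pairs)  # false negatives
    denominator = 2 * tp + fp + fn
    # By convention, return 0.0 when there are no positives at all.
    return 2 * tp / denominator if denominator else 0.0


# Toy example (hypothetical labels, assuming 1 = positive, -1 = negative).
y_true = [1, -1, 1, 1, -1, -1, 1, -1]
y_pred = [1, -1, -1, 1, 1, -1, 1, -1]

print(f"accuracy = {accuracy(y_true, y_pred):.3f}")  # accuracy = 0.750
print(f"F1-score = {f1_score(y_true, y_pred):.3f}")  # F1-score = 0.750
```

In the toy example, six of eight predictions are correct, and there are three true positives, one false positive, and one false negative, so both metrics come out to 0.75. The two numbers generally differ when the errors on the positive class are unevenly split between false positives and false negatives, which is consistent with the small gaps between the two columns in the table.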
