Text classification for SemEval 2014 task 9

The complete problem description is available at: http://alt.qcri.org/semeval2014/task9/
The full dataset was downloaded from the SemEval 2017 Task 4 page: http://alt.qcri.org/semeval2017/task4/?id=download-the-full-training-data-for-semeval-2017-task-4

Description of each notebook:

  1. prepare-data-csv.ipynb: Builds a pandas DataFrame from the raw data txt files
  2. Data cleaning and EDA.ipynb: Cleans the raw text and runs some exploratory data analysis on the data
  3. Modelling.ipynb: CNN model for the text classification task
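
The first two steps can be sketched roughly as follows. This is a minimal illustration, not the notebooks' actual code: it assumes each raw line is tab-separated as `id<TAB>label<TAB>text`, and `parse_rows` and `clean_tweet` are hypothetical helper names. The `<url>` and `<user>` placeholder tokens are also an assumption.

```python
import re

LABELS = {"positive", "negative", "neutral"}


def parse_rows(lines):
    """Parse tab-separated lines (assumed format: id, label, text) into dicts."""
    rows = []
    for line in lines:
        parts = line.rstrip("\n").split("\t")
        if len(parts) < 3 or parts[1] not in LABELS:
            continue  # skip malformed or unlabeled lines
        # Re-join in case the tweet text itself contains a tab.
        rows.append({"id": parts[0], "label": parts[1], "text": "\t".join(parts[2:])})
    return rows


def clean_tweet(text):
    """Illustrative cleaning: lowercase, mask URLs and mentions, squeeze whitespace."""
    text = text.lower()
    text = re.sub(r"https?://\S+", "<url>", text)
    text = re.sub(r"@\w+", "<user>", text)
    return re.sub(r"\s+", " ", text).strip()


# Made-up sample lines standing in for a raw data file.
sample = [
    "1\tpositive\tLoving the new phone! http://t.co/abc @friend\n",
    "2\tneutral\tMeeting   at 5pm\n",
    "broken line\n",
]
rows = parse_rows(sample)
cleaned = [clean_tweet(r["text"]) for r in rows]
```

The list of dicts in `rows` can be passed directly to `pandas.DataFrame(rows)` to obtain a dataframe with `id`, `label`, and `text` columns.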

Dataset details:

        Positive  Negative  Neutral
Total       3640      1458     4586
Train       2919      1166     3662
Test         721       292      924
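
The counts above imply an imbalanced dataset and a roughly 80/20 train/test split within each class. The arithmetic can be checked with a short sketch (variable names are illustrative, and nothing here says how the split was actually produced):

```python
# Counts copied from the dataset details above.
counts = {
    "positive": {"train": 2919, "test": 721},
    "negative": {"train": 1166, "test": 292},
    "neutral": {"train": 3662, "test": 924},
}

totals = {label: c["train"] + c["test"] for label, c in counts.items()}
grand_total = sum(totals.values())

# Share of each class in the full dataset, and fraction of each class held out for testing.
class_share = {label: totals[label] / grand_total for label in totals}
test_share = {label: counts[label]["test"] / totals[label] for label in totals}

for label in totals:
    print(
        f"{label:<9} total={totals[label]:5d} "
        f"share={class_share[label]:.3f} test_fraction={test_share[label]:.3f}"
    )
```

Negative tweets make up about 15% of the data and neutral tweets about 47%, so per-class scores are worth reading alongside any single aggregate number.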

Model architecture for text classification task:

The model is inspired by this paper: https://arxiv.org/abs/1610.08815

[Image: model_01]

We use GloVe embeddings pre-trained on Twitter data, downloaded from: https://nlp.stanford.edu/projects/glove/
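
As a rough sketch of how such embeddings are typically wired into a model, the snippet below reads vectors in the GloVe text format (one word per line followed by its space-separated values) and builds an embedding matrix indexed by token id. The helper names, the `word_index` mapping, and the 3-dimensional toy vectors are assumptions for illustration; the README does not say which embedding dimension the notebooks use.

```python
import io


def load_glove(lines, vocab):
    """Read GloVe text lines ("word v1 v2 ... vd"), keeping only words in vocab."""
    vectors = {}
    for line in lines:
        parts = line.rstrip().split(" ")
        word, values = parts[0], parts[1:]
        if word in vocab:
            vectors[word] = [float(v) for v in values]
    return vectors


def build_embedding_matrix(word_index, vectors, dim):
    """Row i holds the vector for the word with index i.

    Row 0 (reserved for padding) and out-of-vocabulary words stay all-zero.
    """
    matrix = [[0.0] * dim for _ in range(len(word_index) + 1)]
    for word, idx in word_index.items():
        if word in vectors:
            matrix[idx] = vectors[word]
    return matrix


# Tiny in-memory stand-in for a GloVe file.
fake_glove = io.StringIO("good 0.1 0.2 0.3\nbad -0.1 -0.2 -0.3\nphone 0.5 0.0 0.5\n")
word_index = {"good": 1, "phone": 2, "awesome": 3}  # hypothetical tokenizer output

vectors = load_glove(fake_glove, set(word_index))
embedding_matrix = build_embedding_matrix(word_index, vectors, dim=3)
```

With a real file, `load_glove` can be given an open file handle (for example `open(path, encoding="utf-8")`) instead of the `StringIO` object, and the resulting matrix can initialize the embedding layer of the CNN.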

Model Results

Classification report:

[Image: class_report]

Metric Plot:

[Image: class_report]

Loss Plot:

[Image: class_report]

Conclusion

The model achieves an F1-score of 0.6205 on the test set.
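
The README does not state how the F1-score is averaged across the three classes. As one possibility, the sketch below computes per-class F1 and its unweighted (macro) average from scratch; the function names and the toy labels are illustrative and are not taken from the notebooks.

```python
def f1_per_class(y_true, y_pred, labels):
    """Return {label: F1} using F1 = 2*TP / (2*TP + FP + FN)."""
    scores = {}
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denom = 2 * tp + fp + fn
        scores[label] = 2 * tp / denom if denom else 0.0
    return scores


def macro_f1(y_true, y_pred, labels):
    """Unweighted mean of the per-class F1 scores."""
    scores = f1_per_class(y_true, y_pred, labels)
    return sum(scores.values()) / len(labels)


# Toy example with made-up labels and predictions.
labels = ["positive", "negative", "neutral"]
y_true = ["positive", "positive", "negative", "neutral", "neutral", "neutral"]
y_pred = ["positive", "neutral", "negative", "neutral", "neutral", "positive"]

per_class = f1_per_class(y_true, y_pred, labels)
score = macro_f1(y_true, y_pred, labels)
```

Macro averaging gives every class equal weight regardless of its size, which matters here because the negative class is much smaller than the other two.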