The complete problem description can be found here: http://alt.qcri.org/semeval2014/task9/
The complete data was downloaded from here: http://alt.qcri.org/semeval2017/task4/?id=download-the-full-training-data-for-semeval-2017-task-4
- prepare-data-csv.ipynb: Creates a pandas dataframe from the raw data txt files
- Data cleaning and EDA.ipynb: Cleans the raw text and runs some exploratory data analysis (EDA) on the data
- Modelling.ipynb: CNN model for the text classification task
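
As a rough illustration of what the first two notebooks do, the sketch below parses tab-separated lines into records and applies a few common tweet-cleaning steps. The column layout (`id`, `label`, `text`), the function names `read_raw` and `clean_tweet`, and the specific cleaning rules are assumptions made for this example, not taken from the notebooks.

```python
import csv
import io
import re

# Assumed cleaning rules: strip URLs, @mentions, and anything non-alphanumeric.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_RE = re.compile(r"@\w+")
NON_ALNUM_RE = re.compile(r"[^a-z0-9\s]")


def clean_tweet(text):
    """Lowercase a tweet and remove URLs, @mentions, and punctuation."""
    text = text.lower()
    text = URL_RE.sub(" ", text)
    text = MENTION_RE.sub(" ", text)
    text = NON_ALNUM_RE.sub(" ", text)  # drops '#' but keeps the hashtag word
    return " ".join(text.split())       # collapse repeated whitespace


def read_raw(handle):
    """Parse 'id<TAB>label<TAB>text' lines (assumed layout) into a list of dicts."""
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    rows = []
    for fields in reader:
        if len(fields) < 3:
            continue  # skip empty or malformed lines
        rows.append(
            {"id": fields[0], "label": fields[1], "text": clean_tweet(fields[2])}
        )
    return rows


# Made-up sample lines, only to show the shape of the output.
sample = io.StringIO(
    "1\tpositive\tLoving the new phone!! https://example.com @friend #happy\n"
    "2\tnegative\tWorst. Service. Ever.\n"
)
records = read_raw(sample)
```

A list of dicts like `records` can be turned into a dataframe with `pandas.DataFrame(records)`.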

| | Positive | Negative | Neutral |
|---|---|---|---|
| Total | 3640 | 1458 | 4586 |
| Train | 2919 | 1166 | 3662 |
| Test | 721 | 292 | 924 |

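
The counts in the table above imply roughly an 80/20 train/test split and a noticeable class imbalance, with negative tweets making up about 15% of the data. The short calculation below reproduces those figures; the variable names are illustrative only.

```python
# Counts copied from the table above.
counts = {
    "train": {"positive": 2919, "negative": 1166, "neutral": 3662},
    "test": {"positive": 721, "negative": 292, "neutral": 924},
}

train_total = sum(counts["train"].values())
test_total = sum(counts["test"].values())
grand_total = train_total + test_total

# Share of all examples held out for testing.
test_fraction = test_total / grand_total

# Share of each class across train and test combined.
class_share = {
    label: (counts["train"][label] + counts["test"][label]) / grand_total
    for label in counts["train"]
}

print(f"test fraction: {test_fraction:.3f}")
for label, share in class_share.items():
    print(f"{label}: {share:.3f}")
```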
The model is inspired by this paper: https://arxiv.org/abs/1610.08815
We used GloVe embeddings trained on Twitter data, downloaded from here: https://nlp.stanford.edu/projects/glove/
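
GloVe vectors are distributed as plain text, one token per line followed by its space-separated vector values. A minimal sketch of reading that format and building an embedding matrix for a vocabulary might look like the following. The helper names, the tiny in-memory "file", and the `word_index` vocabulary are hypothetical, and the actual notebook may do this differently (for example with NumPy arrays).

```python
import io


def load_glove(handle):
    """Read GloVe's text format into a dict mapping token -> list of floats."""
    vectors = {}
    for line in handle:
        parts = line.rstrip().split(" ")
        if len(parts) < 2:
            continue  # skip blank lines
        vectors[parts[0]] = [float(x) for x in parts[1:]]
    return vectors


def build_embedding_matrix(word_index, vectors, dim):
    """Row i holds the vector for the word with index i.

    Row 0 (padding) and words missing from GloVe stay all-zero.
    """
    matrix = [[0.0] * dim for _ in range(len(word_index) + 1)]
    for word, idx in word_index.items():
        vec = vectors.get(word)
        if vec is not None and len(vec) == dim:
            matrix[idx] = vec
    return matrix


# Tiny made-up 3-dimensional vectors standing in for a real GloVe file.
fake_glove = io.StringIO("good 0.1 0.2 0.3\nbad -0.1 -0.2 -0.3\n")
vectors = load_glove(fake_glove)

# Hypothetical vocabulary; index 0 is reserved for padding.
word_index = {"good": 1, "bad": 2, "meh": 3}
embedding_matrix = build_embedding_matrix(word_index, vectors, dim=3)
```

With a real download, the `io.StringIO` object would be replaced by a file opened with `open(path, encoding="utf-8")`, and `dim` would match the dimension of the chosen GloVe file.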
Classification report:
Metric Plot:
Loss Plot:
We achieved an F1-score of 0.6205 on the test dataset.
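
For readers unfamiliar with the metric, the sketch below shows how a per-class F1-score is computed from precision and recall, and how the per-class scores can be macro-averaged. This README does not state which averaging was used for the reported score, so macro averaging here is an assumption, and the labels in the example are made up.

```python
def per_class_f1(y_true, y_pred, label):
    """F1 for one class: harmonic mean of its precision and recall."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores over labels seen in y_true."""
    labels = sorted(set(y_true))
    return sum(per_class_f1(y_true, y_pred, label) for label in labels) / len(labels)


# Made-up labels, not results from this project.
y_true = ["positive", "negative", "neutral", "neutral", "positive", "negative"]
y_pred = ["positive", "neutral", "neutral", "neutral", "negative", "negative"]
score = macro_f1(y_true, y_pred)
print(f"macro F1: {score:.4f}")
```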