This repository showcases my work on a sarcasm detection task.
Problem description: Given raw comments from Reddit, we have to classify them as sarcastic or not.
Dataset source: https://www.kaggle.com/danofer/sarcasm
Paper referred: https://arxiv.org/abs/1610.08815
- prepare-data-csv.ipynb: Uses the raw data source to create usable data CSVs
- Data cleaning and EDA.ipynb: Cleans the text and performs some exploratory data analysis on the data (see the cleaning sketch after this list)
- Modelling.ipynb: 1D CNN models for the text classification task
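The exact cleaning steps live in Data cleaning and EDA.ipynb; the snippet below is only a minimal sketch of the kind of normalisation typically applied to Reddit comments (the regexes and steps are assumptions, not the notebook's actual pipeline).

```python
import re

def clean_comment(text: str) -> str:
    """Minimal, illustrative text cleaning for a Reddit comment."""
    text = text.lower()                          # normalise case
    text = re.sub(r"https?://\S+", " ", text)    # drop URLs
    text = re.sub(r"[^a-z0-9\s']", " ", text)    # keep letters, digits, apostrophes
    text = re.sub(r"\s+", " ", text).strip()     # collapse whitespace
    return text

print(clean_comment("Oh GREAT, another Monday... https://example.com"))
# -> "oh great another monday"
```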
We have prepared a perfectly balanced dataset for our task; a rough sketch of one way such a split could be produced follows the table below.
Split | Sarcastic (1) | Not sarcastic (0)
---|---|---
Train | 400000 | 400000
CV | 50000 | 50000
Test | 50000 | 50000
Total | 500000 | 500000
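The actual split is produced in prepare-data-csv.ipynb; the pandas snippet below is only a rough sketch of one way to build such a balanced 400k/50k/50k-per-class split (the file and column names are assumptions).

```python
import pandas as pd

# File and column names are assumptions for illustration.
df = pd.read_csv("train-balanced-sarcasm.csv")

# Draw 500k comments per class, then carve out 400k / 50k / 50k per class.
pos = df[df["label"] == 1].sample(500_000, random_state=42)
neg = df[df["label"] == 0].sample(500_000, random_state=42)

def carve(frame):
    # 400k train / 50k CV / 50k test rows per class.
    return frame.iloc[:400_000], frame.iloc[400_000:450_000], frame.iloc[450_000:]

for name, (p, n) in zip(["train", "cv", "test"], zip(carve(pos), carve(neg))):
    split = pd.concat([p, n]).sample(frac=1, random_state=42)  # shuffle the split
    split.to_csv(f"{name}.csv", index=False)
```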
We've used 1D CNN models to extract features from the raw text and classify the comments. We've used combinations of three different kinds of features:
- Content features learned from the raw text
- Sentiment features obtained via transfer learning, using a model trained on a Twitter dataset. More information can be found here: https://github.com/NamanJain2050/semeval-2014-task-9/
- Emotion features obtained via transfer learning, using two models trained on two different datasets. More information can be found here: https://github.com/NamanJain2050/emotion-detection (see the sketch after this list)
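The pretrained sentiment and emotion models come from the repositories linked above; the Keras snippet below is only a hedged sketch of how such a model could be reused as a frozen feature extractor (the file names and the choice of the penultimate layer are assumptions).

```python
from tensorflow.keras.models import Model, load_model

def frozen_extractor(path):
    # Reuse a pretrained model's penultimate layer as a fixed, frozen feature extractor.
    net = load_model(path)
    extractor = Model(net.input, net.layers[-2].output)
    extractor.trainable = False
    return extractor

# Hypothetical file names; the actual pretrained models come from the repositories above.
sentiment_features = frozen_extractor("sentiment_model.h5")
emotion_features = frozen_extractor("emotion_model.h5")
```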
Predictions made using only the content features extracted by the 1D CNN. The model architecture is as follows:
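The full architecture is defined in Modelling.ipynb; the snippet below is only a minimal Keras sketch of a content-only 1D CNN of this kind. The vocabulary size, sequence length, filter counts, and layer sizes are assumptions, not the notebook's exact values.

```python
from tensorflow.keras import layers, models

VOCAB_SIZE, MAX_LEN, EMBED_DIM = 20_000, 100, 128  # assumed hyperparameters

model = models.Sequential([
    layers.Input(shape=(MAX_LEN,)),            # padded token-id sequences
    layers.Embedding(VOCAB_SIZE, EMBED_DIM),
    layers.Conv1D(128, 5, activation="relu"),  # 1D convolution over word embeddings
    layers.GlobalMaxPooling1D(),               # content feature vector
    layers.Dense(64, activation="relu"),
    layers.Dropout(0.5),
    layers.Dense(1, activation="sigmoid"),     # sarcastic (1) vs. not sarcastic (0)
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.summary()
```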
Results of this model are as follows: we've achieved an F1-score of 0.7234 and we were able to classify 73.58% of sarcastic comments correctly.

Predictions made using content features extracted by the 1D CNN together with sentiment features from the pre-trained sentiment model. The model architecture is as follows:
Results of this model are as follows: we've achieved an F1-score of 0.7179 and we were able to classify 71.78% of sarcastic comments correctly.

Predictions made using content features extracted by the 1D CNN together with emotion features from a pre-trained emotion model. The model architecture is as follows:
Results of this model are as follows: we've achieved an F1-score of 0.7242 and we were able to classify 71.75% of sarcastic comments correctly.

Predictions made using content features extracted by the 1D CNN together with emotion features from the second pre-trained emotion model, which was trained on a different dataset. The model architecture is as follows:
Results of this model are as follows: we've achieved an F1-score of 0.7235 and we were able to classify 72.07% of sarcastic comments correctly.

Predictions made using content features extracted by the 1D CNN together with sentiment and emotion features from the pre-trained models. The model architecture is as follows:
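The real combined architecture is defined in Modelling.ipynb; the functional-API sketch below only illustrates one way the three feature streams could be concatenated. Layer sizes and the shared tokenisation across branches are assumptions, and `sentiment_features` / `emotion_features` are the frozen extractors from the transfer-learning sketch earlier in this README.

```python
from tensorflow.keras import Input, layers, models

VOCAB_SIZE, MAX_LEN, EMBED_DIM = 20_000, 100, 128  # assumed hyperparameters

tokens = Input(shape=(MAX_LEN,), name="tokens")

# Content branch: the same kind of 1D CNN as the content-only sketch above.
x = layers.Embedding(VOCAB_SIZE, EMBED_DIM)(tokens)
x = layers.Conv1D(128, 5, activation="relu")(x)
content = layers.GlobalMaxPooling1D()(x)

# Fuse content features with the frozen sentiment / emotion features.
merged = layers.concatenate([content, sentiment_features(tokens), emotion_features(tokens)])
merged = layers.Dense(64, activation="relu")(merged)
output = layers.Dense(1, activation="sigmoid")(merged)

fusion = models.Model(tokens, output)
fusion.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
```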
Results of this model are as follows: we've achieved an F1-score of 0.7222 and we were able to classify 70.71% of sarcastic comments correctly.

Predictions made using content features extracted by the 1D CNN together with sentiment and emotion features from the pre-trained models, using a different combined architecture. The model architecture is as follows:
Results of this model are as follows: we've achieved an F1-score of 0.7215 and we were able to classify 72.1% of sarcastic comments correctly.

We've seen that adding emotion and sentiment features from the pre-trained models degraded our results.
Possible reason(s):
- The sentiment and emotion models were trained on much smaller datasets compared to our SARC dataset