Sarcasam-Detection-for-Tweets

The main purpose of this project is to detect sarcasm in tweets streamed from Twitter.

The first step is data collection. In this case I was lucky: the data was already readily available, so the main task was to do good pre-processing.

Preprocessing: The usual preprocessing removes all hashtags, retweets, words starting with @, and so on. However, a preliminary look at the data gave me some insights: a tweet containing #not is sarcastic with a probability of more than 99.6%, and the same holds for some other hashtags such as #sarcasm. So one feature for classification is whether the tweet contains a hashtag from a list I already have: if it does, the feature value is one, otherwise it is zero.

Note: Frequency analysis of the hashtags showed that, out of 51300 sarcastic tweets, 9885 contain #not, 4251 contain #sarcasm, and so on.

I remove duplicates, tweets whose length is less than 3, and tweets that are not in English. I also strip all @ mentions, punctuation, Unicode characters, and so on.

Feature Engineering: The first feature is n-grams, mainly unigrams and bigrams. For this, each tweet is tokenized and stemmed.

Sentiment score by TextBlob: The hypothesis considered is that a sarcastic tweet often starts very positive and then turns negative, so the tweet is split into 2 parts and into 3 parts and the sentiment of the parts is scored.

Topics: First we need to learn the topics, so that the classifier can learn which topics are associated with sarcastic tweets. I use the Python library gensim, which implements topic modeling using Latent Dirichlet Allocation (LDA). All tweets are first fed to the topic modeler; each tweet can then be decomposed as a sum of topics, which we use as features.

SVM is the classifier used.
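The preprocessing and the first features described above can be sketched roughly as follows. This is a minimal illustration and not the code in this repository: the names `SARCASM_HASHTAGS`, `hashtag_feature`, `clean_tweet`, `ngrams`, and `preprocess` are hypothetical, the marker list contains only the two hashtags named above, and "length less than 3" is assumed here to mean fewer than 3 tokens. Language filtering and stemming are left out to keep the sketch dependency-free.

```python
import re
import string

# Hypothetical marker list; only #not and #sarcasm are named in the text above.
SARCASM_HASHTAGS = {"#not", "#sarcasm"}

HASHTAG_RE = re.compile(r"#\w+")
MENTION_RE = re.compile(r"@\w+")


def hashtag_feature(tweet, markers=SARCASM_HASHTAGS):
    """Return 1 if the tweet contains any hashtag from the marker list, else 0."""
    tags = {tag.lower() for tag in HASHTAG_RE.findall(tweet)}
    return 1 if tags & markers else 0


def clean_tweet(tweet):
    """Drop @mentions, hashtags, non-ASCII characters and punctuation; lowercase."""
    text = MENTION_RE.sub(" ", tweet)
    text = HASHTAG_RE.sub(" ", text)
    text = text.encode("ascii", "ignore").decode("ascii")
    text = text.translate(str.maketrans("", "", string.punctuation))
    return " ".join(text.lower().split())


def ngrams(tokens, n):
    """Return the list of n-grams (as tuples) over a list of tokens."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def preprocess(tweets, min_tokens=3):
    """Clean tweets, drop duplicates and very short tweets, and build features."""
    seen = set()
    rows = []
    for tweet in tweets:
        flag = hashtag_feature(tweet)  # computed before hashtags are stripped
        text = clean_tweet(tweet)
        tokens = text.split()
        # Assumption: "length" is measured in tokens, not characters.
        if len(tokens) < min_tokens or text in seen:
            continue
        seen.add(text)
        rows.append({
            "text": text,
            "hashtag_flag": flag,
            "unigrams": ngrams(tokens, 1),
            "bigrams": ngrams(tokens, 2),
        })
    return rows
```

The hashtag flag is read from the raw tweet because the cleaning step removes the hashtags themselves, so the marker would otherwise be lost before it can be used as a feature.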

I followed http://www.thesarcasmdetector.com/ for this project, and I have to say that it cleared up some basic concepts in text analytics for me.

Repository files:
Camra-ready paper for book contribustion.pdf
README.md
Sarcastic_Detector.pdf
SentiWordNet_3.0.0_20130122.txt
classif.p
exp_replace.py
exp_replace.pyc
feature_extract.py
feature_extract.pyc
file
hash_tag_counts.py
load_sent.py
load_sent.pyc
non_sarcastic_tweets.csv
non_sarcastic_tweets_clean.csv
output.py
preprocessing.py
sarcastic_detector.py
sarcastic_hash_tag
sarcastic_tweets.csv
sarcastic_tweets_clean.csv
sarcastictweets_marked_with_hashtags.csv
separate_sarcastic_non_sarcastic_tweets.py
test
test.py
test_MLWARE1.csv
topic.py
topic.pyc
topics_dict.tp
train_MLWARE1.csv
traintest.py
vecdict.p


vamsilnm/Sarcasam-Detection-for-Tweets
