
Mrezvan94/Harassment-Corpus


Harassment-Corpus

Publishing a Quality Context-aware Annotated Corpus and Lexicon for Harassment Research.

Identifying profane or offensive words is a standard way to start investigating a cyberbullying incident. For this reason, we first created a lexicon from profane words and divided it into six contexts: 1) Sexual, 2) Appearance-related, 3) Intellectual, 4) Political, 5) Racial, and 6) Combined.

We used the first five categories of our lexicon as seed terms for collecting tweets from Twitter. For each contextual type we collected 10,000 tweets, each containing at least one offensive word, for a total of 50,000.

The presence of an offensive word does not guarantee that a tweet is harassing, because people may use offensive words in a friendly manner or inside quotes. We therefore rely on human-judged annotations to distinguish harassing tweets from non-harassing tweets.

We acknowledge support from the National Science Foundation (NSF) award CNS 1513721: Context-Aware Harassment Detection on Social Media.

Wiki page of this project: http://wiki.knoesis.org/index.php/Context-Aware_Harassment_Detection_on_Social_Media

To obtain our annotated tweets in the five contexts, please contact the authors by email:

Mohammadreza Rezvan: mohammadrezarezvan94@gmail.com
Saeedeh Shekarpour: sshekarpour1@udayton.edu
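
The idea of using lexicon categories as seed terms can be illustrated with a minimal Python sketch. It is not the authors' collection code. The lexicon contents (`term_a` and so on), the names `LEXICON`, `matching_contexts`, and `group_by_context`, and the simple single-word tokenization are all assumptions made for illustration; the real lexicon and its file format are not described here.

```python
import re
from collections import defaultdict

# Hypothetical placeholder lexicon: the real seed terms are not reproduced here.
# Keys are the five contexts used for collection; values are sets of seed terms.
LEXICON = {
    "sexual": {"term_a"},
    "appearance": {"term_b", "term_c"},
    "intellectual": {"term_d"},
    "political": {"term_e"},
    "racial": {"term_f"},
}

# Assumed tokenization: lowercase runs of letters, digits, underscores, apostrophes.
TOKEN_RE = re.compile(r"[a-z0-9_']+")


def matching_contexts(tweet, lexicon=LEXICON):
    """Return the set of contexts whose seed terms appear in the tweet."""
    tokens = set(TOKEN_RE.findall(tweet.lower()))
    return {context for context, terms in lexicon.items() if tokens & terms}


def group_by_context(tweets, lexicon=LEXICON):
    """Group candidate tweets under every context they match."""
    grouped = defaultdict(list)
    for tweet in tweets:
        for context in matching_contexts(tweet, lexicon):
            grouped[context].append(tweet)
    return dict(grouped)
```

A match only marks a tweet as a candidate for a context. As noted above, deciding whether it is actually harassing still requires human annotation. This sketch also matches single-word terms only; multi-word lexicon entries would need phrase matching.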

About

Harassment Lexicon and Corpus
