Code for analyzing hate speech using knowledge embeddings and BERT representation

This is code I wrote during my internship at Spoken Language Systems, Saarland University. I worked on hate speech detection using the OffensEval hate speech corpus released for SemEval 2019. I adapted some references and code from these repos/websites:

OffensEval Code

Doc2Vec training

BERTweet

If you wish to reproduce the results, the steps are roughly as follows:

Place the dataset (in CSV/TSV format) in the /Data folder.
Run GenerateBERT.py to generate BERT embeddings and save them in the /pickles folder.
Run any one of the three files in /Entity Extraction to extract entities/noun phrases from the tweets. (The code in Stanford.py is incomplete and needs more work.)
Run TrainDoc2vec.py to train two Doc2Vec models on the Wiki corpus for the Wikipedia embeddings. (Contact me for pretrained model files.)
Run EntityEmbeddings.py to generate embeddings for the extracted entities using the trained Doc2Vec models.
Finally, run either TrainSVM.py or TrainRNN.py to train a model and see the outcome.
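
The steps above could be chained with a small driver script. The following is a minimal sketch and is not part of the repository: the function names `build_commands` and `run_pipeline` are hypothetical, the entity extraction script is passed in by the caller because the README leaves that choice open, and it assumes each script runs without command-line arguments.

```python
import subprocess
import sys
from pathlib import Path

# Allowed final steps, taken from the last step of the README.
TRAINERS = ("TrainSVM.py", "TrainRNN.py")


def build_commands(repo_root, entity_script, trainer="TrainSVM.py"):
    """Return the pipeline commands in the order the README describes."""
    if trainer not in TRAINERS:
        raise ValueError(f"trainer must be one of {TRAINERS}")
    scripts = [
        "GenerateBERT.py",      # BERT embeddings, saved to /pickles
        entity_script,          # entity / noun-phrase extraction (caller's choice)
        "TrainDoc2vec.py",      # Doc2Vec models trained on the Wiki corpus
        "EntityEmbeddings.py",  # embeddings for the extracted entities
        trainer,                # final classifier
    ]
    # Resolve to an absolute root so the paths stay valid when cwd changes.
    root = Path(repo_root).resolve()
    return [[sys.executable, str(root / script)] for script in scripts]


def run_pipeline(repo_root, entity_script, trainer="TrainSVM.py", dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    root = Path(repo_root).resolve()
    for cmd in build_commands(root, entity_script, trainer):
        print(" ".join(cmd))
        if not dry_run:
            # check=True stops the pipeline at the first failing script.
            subprocess.run(cmd, check=True, cwd=root)
```

Calling `run_pipeline(".", "Entity Extraction/Stanford.py")` only prints the five commands; pass `dry_run=False` to execute them once the paths inside the scripts have been adjusted.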

Some tips:

Adjust the paths for saving and loading files throughout the code.
I have made significant edits to the code I took from the links above. Feel free to remove or extend them to investigate the system further.
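
One way to make the path adjustment less error-prone is to resolve every folder from a single root. This is a rough sketch under stated assumptions, not how the repository currently handles paths: `project_paths`, `ensure_dirs`, and the `HATE_SPEECH_ROOT` environment variable are hypothetical names, and only the Data and pickles folders mentioned above are covered.

```python
import os
from pathlib import Path


def project_paths(root=None):
    """Resolve the folders named in the README relative to one root."""
    # HATE_SPEECH_ROOT is a hypothetical variable name, not used by the repo.
    base = Path(root or os.environ.get("HATE_SPEECH_ROOT", ".")).resolve()
    return {
        "data": base / "Data",        # input csv/tsv files
        "pickles": base / "pickles",  # saved BERT embeddings
    }


def ensure_dirs(paths):
    """Create any missing folders so that later save calls do not fail."""
    for folder in paths.values():
        folder.mkdir(parents=True, exist_ok=True)
```

Each script could then build its file names from `project_paths()` instead of hard-coding absolute locations.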

Folders and files:

.vscode
Data
Entity Extraction
Miscellaneous
DataReader.py
EntityEmbeddings.py
FeedForward.py
GenerateBERT.py
README.md
TrainDoc2vec.py
TrainRNN.py
TrainSVM.py
TweetNormalizer.py
check.sh

anjalibhavan/Hate-Speech-Analysis-Using-World-Knowledge
