We have built several working hate speech detection models. The project is currently in hibernation: there is no active use case at the moment, but the models are ready to go. We tweak the models occasionally, but there is no active development. If you know of a possible application, please reach out to us. We are also always looking for help, whether that is more data, more models, or any other interesting component of hate speech detection. If you're interested in using or working on these models, reach out in the Slack channel (#p-hate-speech) or contact Julius Simonelli (jss367 in Slack).
We are currently working with the data collected by Davidson et al. for their research on hate speech detection. The dataset contains tweets that are labeled as hate speech, offensive language, or neither. The tweets were found by searching Twitter for terms from a hate speech lexicon compiled by Hatebase.org. They were then labeled through CrowdFlower, which uses non-specialists to clean and label data. Each tweet was reviewed by three or more people, and the majority label was used when they disagreed.
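The majority-wins labeling described above can be sketched in a few lines. This is only an illustration: the function name `majority_label` is hypothetical, and the tie-break (the label seen first wins) is an assumption of this sketch, not something the dataset documentation specifies.

```python
from collections import Counter


def majority_label(votes):
    """Return the most common label among the annotator votes.

    Assumption: ties go to the label that appears first in `votes`.
    """
    if not votes:
        raise ValueError("at least one vote is required")
    counts = Counter(votes)
    # most_common keeps first-seen order among equal counts (Python 3.7+)
    label, _ = counts.most_common(1)[0]
    return label


# Three annotators: two say "offensive" (1), one says "neither" (2)
print(majority_label([1, 1, 2]))
```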
All data used in this analysis is stored in the data folder of this repository. The original source of the data is: https://github.com/t-davidson/hate-speech-and-offensive-language
The paper by Davidson et al. is: Thomas Davidson, Dana Warmsley, Michael Macy, Ingmar Weber. 2017. "Automated Hate Speech Detection and the Problem of Offensive Language". Proceedings of the 11th International AAAI Conference on Web and Social Media (ICWSM).
A REST API has been designed to demo the functionality of a basic model. The code trains a basic model as defined in data_science.py and uses this model to make predictions. To get the current IP address, ask in the Data for Democracy #p-hate-speech Slack channel.
Navigate to the hate_speech_detector/ directory and build the container:
docker build -t [container-name] ./app
When you run the container, you must also publish port 8000. For example:
docker run -p 8000:8000 -t [container-name]
Labels are Hate = 0, Offensive = 1, Not Offensive = 2.
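If you consume the API from Python, a small lookup table keeps the numeric codes readable. This is a sketch; `LABEL_NAMES` and `label_name` are hypothetical names that do not come from the repository.

```python
# Numeric label codes as listed in this README
LABEL_NAMES = {0: "Hate", 1: "Offensive", 2: "Not Offensive"}


def label_name(code):
    """Translate a numeric label code into its human-readable name."""
    return LABEL_NAMES[int(code)]


print(label_name(2))
```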
- /label
Then, the API can be called to predict the label on new text data via:
curl -H "Content-Type: application/json" -X POST -d '{"text":"Text that might be offensive or hateful... or not."}' http://0.0.0.0:8000/label
Example output:
{
"label": 2,
"text": "Text that might be offensive or hateful... or not."
}
In this case, "text" is the input text and "label" is the predicted label from the model.
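The same call can be made from Python with only the standard library. This is a minimal sketch, assuming the container is reachable at the address used in the curl example; the helper names `build_label_request` and `predict_label` are hypothetical.

```python
import json
import urllib.request


def build_label_request(text, base_url="http://0.0.0.0:8000"):
    """Build the POST request for the /label endpoint (nothing is sent yet)."""
    payload = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        base_url + "/label",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict_label(text, base_url="http://0.0.0.0:8000"):
    """Send the text to the running API and return the decoded JSON response."""
    request = build_label_request(text, base_url)
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the container running, `predict_label("Text that might be offensive or hateful... or not.")` should return a dictionary with the "label" and "text" keys shown above.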
- /demo
You can also see the model's predictions on held-out test set values via:
curl http://0.0.0.0:8000/demo
Example output:
{
"label": 2,
"text": "#stateoftheunion would last 15mins if they let the President talk all that standing up clapping is for the birds",
"true": 2
}
Here, "text" is the text input, "label" is the predicted label from the model, and "true" is the actual label given by a human.
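Because each /demo response carries both the predicted and the human label, a handful of responses is enough for a rough accuracy check. The sketch below works on already-decoded responses; `is_correct` and `demo_accuracy` are hypothetical helper names.

```python
def is_correct(demo_response):
    """True when the predicted label matches the human-assigned label."""
    return demo_response["label"] == demo_response["true"]


def demo_accuracy(demo_responses):
    """Fraction of /demo responses where the model agreed with the human label."""
    if not demo_responses:
        raise ValueError("at least one response is required")
    return sum(is_correct(r) for r in demo_responses) / len(demo_responses)


# The example response from above, decoded into a dictionary
sample = {
    "label": 2,
    "text": "#stateoftheunion would last 15mins if they let the President talk "
            "all that standing up clapping is for the birds",
    "true": 2,
}
print(is_correct(sample))
```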
docker-compose -f docker-compose-prebuilt.yml up
If you set this up on AWS, you can then navigate to
[your aws ip]/demo
in your browser to see the demo.
There are currently two Jupyter Notebooks containing models to classify the data, but both could be greatly improved. Please feel free to take a look and let us know if you make any improvements!
There is currently very little preprocessing done on this data. Would someone be interested in creating some useful categories for machine learning and plugging them back into the models? My guess is feature engineering has the most potential to improve the model.
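As a starting point, here is a minimal sketch of a few hand-crafted tweet features. The feature set and the name `tweet_features` are assumptions for illustration only; they are not what the notebooks currently do.

```python
import re

MENTION_RE = re.compile(r"@\w+")
HASHTAG_RE = re.compile(r"#\w+")
URL_RE = re.compile(r"https?://\S+")


def tweet_features(text):
    """Count a few simple surface features of a tweet."""
    return {
        "n_mentions": len(MENTION_RE.findall(text)),
        "n_hashtags": len(HASHTAG_RE.findall(text)),
        "n_urls": len(URL_RE.findall(text)),
        "n_words": len(text.split()),
    }


print(tweet_features("@user check https://example.com #a #b"))
```

The resulting dictionaries could be turned into extra columns alongside whatever text features a model already uses.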
A front end for the demo app that can show random elements of the test set or let the user input their own text.
The Davidson et al. paper remarked on some possible mislabelings in the dataset. How common is mislabeling, and how big a problem is it? Fixing incorrect labels would improve our ability to build a classifier. Does someone want to look at some of the misclassifications and see whether any are incorrectly labeled?
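One way to start is to collect every example where a model disagrees with the human label and review those by hand. The sketch below assumes you already have parallel lists of texts, human labels, and predictions; the function name `disagreements` is hypothetical.

```python
def disagreements(texts, true_labels, predicted_labels):
    """Return the examples where the prediction differs from the human label."""
    if not (len(texts) == len(true_labels) == len(predicted_labels)):
        raise ValueError("all inputs must have the same length")
    return [
        {"index": i, "text": text, "true": true, "predicted": predicted}
        for i, (text, true, predicted)
        in enumerate(zip(texts, true_labels, predicted_labels))
        if true != predicted
    ]


# Placeholder data for illustration only
rows = disagreements(["first", "second", "third"], [2, 1, 0], [2, 2, 0])
print(rows)
```

A disagreement does not mean the human label is wrong; it only narrows down which tweets are worth a second look.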
There is also a second data source containing labeled hate speech from Twitter, but we have yet to explore it. You can find the data here: https://github.com/zeerakw/hatespeech. If anyone wants to look into this dataset and assess its value, it would be very useful. Something else to consider: can these datasets be combined?
What else do we want to do with this?