In this project, we compare the performance of machine learning and deep learning models, including Naive Bayes, Logistic Regression, Decision Tree, LSTM, CNN, and BERT, on a toxic comment detection challenge from Kaggle. The main goal of this project is to support content moderation. After training the models, we analyze popular online and social media platforms, including Reddit and Twitter, by detecting toxic content with the trained models. We further show the vulnerability of the models using adversarial examples.
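As a minimal sketch of the kind of non-neural baseline compared here (illustrative only, not the project's actual training code; the toy comments and labels below are made up, not the Kaggle data), TF-IDF features can be combined with Logistic Regression:

```python
# Toxic-comment baseline sketch: TF-IDF features + Logistic Regression.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Illustrative toy data (1 = toxic, 0 = non-toxic), not the Kaggle dataset.
comments = [
    "you are a complete idiot",
    "thanks for the helpful explanation",
    "shut up, nobody wants you here",
    "great point, I agree with you",
]
labels = [1, 0, 1, 0]

model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
model.fit(comments, labels)

# predict() returns 0/1 toxicity labels for new comments.
predictions = model.predict(comments)
```

The same pipeline shape works for the other non-neural models by swapping in `MultinomialNB` or `DecisionTreeClassifier`.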
NOTE: if the scripts fail to load, click reload after the first failure.
- script: non_nn_models.ipynb for training non-neural-network models, including Naive Bayes, Logistic Regression, and Decision Tree
- script: BERT_Toxic.ipynb for training BERT
- script: kaggle_benchmark.py for training CNN
- script: BERT_CNN.ipynb for training BERT-CNN
- script: LSTM.ipynb for training LSTM
- script: BERT_tSNE.ipynb for t-SNE visualization
- script: covid19-tweet-eda.ipynb for EDA of COVID-19-related tweets
- script for the adversarial attack with the Perspective API
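To illustrate the adversarial-attack idea, here is a hedged sketch (not the actual attack script above): character-level obfuscation with lookalike symbols often evades word-based toxicity classifiers while staying readable to humans. The substitution table is an assumption for illustration.

```python
# Sketch of a character-substitution adversarial perturbation.
# SUBS is an illustrative lookalike mapping, not the project's actual attack.
SUBS = {"a": "@", "e": "3", "i": "1", "o": "0", "s": "$"}

def perturb(text: str) -> str:
    """Replace characters with visually similar symbols to evade word-level filters."""
    return "".join(SUBS.get(ch, ch) for ch in text.lower())

print(perturb("idiot"))  # -> 1d10t
```

A classifier trained on clean text typically no longer recognizes the perturbed token, which is the vulnerability the adversarial experiments probe.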
Team Members:
- Danni Chen
- Fuzail Khan
- Don Le