- General info
- Project 1: SMS Spam detection
- Project 2: Fake news detection
- Project 3: Toxic comments detection
In this repository you'll learn how to classify text with several models: Naive Bayes, LSTM, Transformers (BERT), and One-vs-All.
For the SMS spam detection project, using Naive Bayes we have 98.56% accuracy! (This is why companies often use this algorithm for spam classification.)
Using LSTM we have 97.72% accuracy, close to Naive Bayes.
And using BERT we have 97.61% accuracy.
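A minimal sketch of such a Naive Bayes pipeline with scikit-learn is shown below; the CSV path and column names are illustrative assumptions, not the repository's actual files.

```python
# A minimal Naive Bayes spam classifier sketch (scikit-learn).
# "spam.csv" and its "text"/"label" columns are hypothetical placeholders.
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import accuracy_score

df = pd.read_csv("spam.csv")
X_train, X_test, y_train, y_test = train_test_split(
    df["text"], df["label"], test_size=0.2, random_state=42
)

vectorizer = TfidfVectorizer(stop_words="english")
X_train_vec = vectorizer.fit_transform(X_train)  # learn vocabulary on train data only
X_test_vec = vectorizer.transform(X_test)

clf = MultinomialNB()
clf.fit(X_train_vec, y_train)
print(f"Accuracy: {accuracy_score(y_test, clf.predict(X_test_vec)):.4f}")
```

MultinomialNB trains in seconds even on large corpora, which is part of why it remains a popular spam-filtering baseline.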
For the fake news detection project, using Naive Bayes we have only 62.92% accuracy, and 60.00% of true news is misclassified as fake news...
Using LSTM we have 67.16% accuracy, but 80.90% of true news is misclassified as fake news...
Using BERT we have 65.20% accuracy; the model doesn't learn much.
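For reference, here is a minimal Keras sketch of an LSTM classifier for this kind of binary task; the placeholder texts, label encoding, and hyperparameters are illustrative assumptions, not the project's actual settings.

```python
# A minimal LSTM text classifier sketch (Keras).
# texts/labels are placeholders; vocabulary size and lengths are illustrative.
import numpy as np
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, LSTM, Dense

texts = ["breaking news ...", "official report ..."]  # placeholder documents
labels = np.array([1, 0])                             # 1 = fake, 0 = true (assumed encoding)

tokenizer = Tokenizer(num_words=20000, oov_token="<OOV>")
tokenizer.fit_on_texts(texts)
padded = pad_sequences(tokenizer.texts_to_sequences(texts), maxlen=200)

model = Sequential([
    Embedding(input_dim=20000, output_dim=64),  # learned word embeddings
    LSTM(64),                                   # sequence encoder
    Dense(1, activation="sigmoid"),             # binary fake/true output
])
model.compile(loss="binary_crossentropy", optimizer="adam", metrics=["accuracy"])
model.fit(padded, labels, epochs=3, batch_size=32)
```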
For this last project I am using the Kaggle dataset: Jigsaw Unintended Bias in Toxicity Classification.
Using Naive Bayes we have:
- 86.95% accuracy on normal (non-toxic) comments
- 33.39% accuracy on severely toxic comments
- 19.23% accuracy on obscene comments
- 35.06% accuracy on insults
For a total accuracy of 72.74%.
This shows the limits of the Naive Bayes algorithm on the rarer toxic classes.
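Since the toxicity tags form a multi-label problem, one natural way to apply Naive Bayes here is the One-vs-All scheme mentioned above: one binary classifier per tag. A minimal scikit-learn sketch, with column names assumed from the Jigsaw data:

```python
# A One-vs-All (One-vs-Rest) Naive Bayes sketch for multi-label toxicity tags.
# "train.csv" and the column names are assumptions based on the Jigsaw dataset.
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.multiclass import OneVsRestClassifier
from sklearn.naive_bayes import MultinomialNB

df = pd.read_csv("train.csv")
tags = ["severe_toxicity", "obscene", "insult"]      # per-tag binary targets (assumed)
X = TfidfVectorizer(max_features=50000).fit_transform(df["comment_text"])
y = (df[tags] >= 0.5).astype(int)                    # threshold fractional annotator scores

clf = OneVsRestClassifier(MultinomialNB())           # one binary NB classifier per tag
clf.fit(X, y)
per_tag_pred = clf.predict(X)                        # shape: (n_samples, n_tags)
```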
Using LSTM we have:
- 91.18% accuracy on normal (non-toxic) comments
- 34.59% accuracy on severely toxic comments
- 48.65% accuracy on obscene comments
- 46.97% accuracy on insults
For a total accuracy of 76.78%.
This is a great improvement over Naive Bayes.
Using BERT we have a total accuracy of 74%.
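A minimal fine-tuning sketch with Hugging Face transformers (PyTorch); the model name, label encoding, and single training step are illustrative assumptions, not the project's exact setup.

```python
# A minimal BERT fine-tuning sketch (Hugging Face transformers, PyTorch).
# The model name, placeholder comments, and labels are illustrative.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2
)

texts = ["you are great", "you are an idiot"]        # placeholder comments
labels = torch.tensor([0, 1])                        # 0 = normal, 1 = toxic (assumed)
batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")

optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
model.train()
outputs = model(**batch, labels=labels)              # loss is computed internally
outputs.loss.backward()                              # one illustrative training step
optimizer.step()
```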