This repository contains a Python notebook that performs cyberbullying detection on a Kaggle dataset of comments and messages shared over social media applications.
The dataset has the following features:
- 2000 data points
- Data points contain unfiltered comments from social platforms, including punctuation, emoticons, user tags, etc.
- Cross-platform dataset
The machine learning techniques applied for the binary classification of comments as "bullying" or "non-bullying" are:
- K-Nearest Neighbors
- Naive Bayes Classifier
- Logistic Regression
- Support Vector Classifier
- Dataset collection: the dataset was collected from Kaggle:
https://www.kaggle.com/datasets/syedabbasraza/suspicious-communication-on-social-platforms
- Conversion of the CSV file to a DataFrame using Pandas
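As a minimal sketch of this step, the snippet below reads CSV text into a DataFrame with `pandas.read_csv`. The column names `comments` and `tagging`, the sample rows, and the helper `load_comments` are assumptions made for illustration. In the notebook, the path of the downloaded Kaggle CSV would be passed instead of the in-memory buffer.

```python
import io

import pandas as pd

# Hypothetical two-column sample standing in for the downloaded Kaggle CSV.
SAMPLE_CSV = """comments,tagging
"You are such a loser, nobody likes you",1
"Great game last night!",0
"@user stop posting, idiot :(",1
"""


def load_comments(source) -> pd.DataFrame:
    """Read a CSV (path or file-like object) and drop rows with missing values."""
    df = pd.read_csv(source)
    return df.dropna().reset_index(drop=True)


df = load_comments(io.StringIO(SAMPLE_CSV))
print(df.shape)
print(df.head())
```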
- Preprocessing of comments using:
  1. Conversion of text to lower case
  2. Removal of punctuation
  3. Removal of non-alphabetic words (words containing digits or punctuation)
  4. Removal of stopwords (English)
  5. POS tagging
  6. Lemmatization
  7. Removal of words with length less than or equal to 1
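The preprocessing steps above can be sketched with the standard library alone. This is an illustration under stated assumptions, not the notebook's code: the stopword set is a tiny hand-picked sample, and POS tagging and lemmatization (steps 5 and 6) are delegated to a caller-supplied function, here a hypothetical `identity_lemmatizer` placeholder, since those steps would normally rely on an NLP library.

```python
import string

# Tiny illustrative stopword set (assumption); a full English list would
# normally come from an NLP library.
STOPWORDS = {"a", "an", "the", "is", "are", "you", "so", "and", "to", "of", "at"}


def identity_lemmatizer(word: str) -> str:
    """Placeholder for steps 5 and 6 (POS tagging and lemmatization)."""
    return word


def preprocess(comment: str, lemmatize=identity_lemmatizer) -> list[str]:
    # 1. Lower-case the text.
    text = comment.lower()
    # 2. Remove ASCII punctuation characters.
    text = text.translate(str.maketrans("", "", string.punctuation))
    tokens = text.split()
    # 3. Keep only purely alphabetic words (drops tokens containing digits).
    tokens = [t for t in tokens if t.isalpha()]
    # 4. Remove stopwords.
    tokens = [t for t in tokens if t not in STOPWORDS]
    # 5 and 6. POS tagging and lemmatization, delegated to the supplied callable.
    tokens = [lemmatize(t) for t in tokens]
    # 7. Drop words of length less than or equal to 1.
    return [t for t in tokens if len(t) > 1]


print(preprocess("@User123 you are SO annoying!!! :( go away at 5pm, k?"))
# prints ['annoying', 'go', 'away']
```

Passing the lemmatizer as a parameter keeps the sketch runnable without external corpora while leaving a clear place to plug in a real one.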
- Application of algorithms:
  - K-Nearest Neighbors: https://medium.com/@draj0718/k-nearest-neighbor-knn-using-python-d0a6bb295e7d
  - Naive Bayes Classifier: https://medium.com/@piyumipremathilake/na%C3%AFve-bayes-algorithm-3f5b78f32b1c
  - Logistic Regression: https://www.ibm.com/topics/logistic-regression
  - Support Vector Classifier: https://www.geeksforgeeks.org/classifying-data-using-support-vector-machinessvms-in-python/
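One way the four classifiers could be trained on text is sketched below with scikit-learn. The TF-IDF features, the hyperparameters, and the toy sentences are all assumptions for illustration; the notebook may use different features and settings.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import MultinomialNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

# Toy, hand-written examples (assumption); 1 = bullying, 0 = non-bullying.
texts = [
    "you are a stupid loser",
    "nobody likes you idiot",
    "shut up you ugly freak",
    "you are worthless and dumb",
    "great match last night",
    "thanks for sharing this recipe",
    "happy birthday my friend",
    "see you at the library tomorrow",
]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

# Hyperparameters here are illustrative choices, not the notebook's settings.
models = {
    "K-Nearest Neighbors": KNeighborsClassifier(n_neighbors=3),
    "Naive Bayes": MultinomialNB(),
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Support Vector Classifier": SVC(kernel="linear"),
}

predictions = {}
for name, model in models.items():
    # Each pipeline vectorizes the raw text, then fits the classifier.
    clf = make_pipeline(TfidfVectorizer(), model)
    clf.fit(texts, labels)
    predictions[name] = clf.predict(["you are such an idiot"])
    print(f"{name}: {predictions[name][0]}")
```

On real data, the models would be fitted on a training split and evaluated on a held-out test split rather than on hand-written sentences.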
- Generation of results in the form of a metric table, as given in the file below:
Detection of Cyberbullying using Bullying Features and Machine Learning.pdf
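For reference, a row of such a metric table could be computed as sketched below. The choice of accuracy, precision, recall, and F1 is an assumption, as are the helper `binary_metrics` and the example labels; the actual metrics and values are those reported in the file above.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall and F1 for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)

    accuracy = (tp + tn) / len(pairs)
    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Made-up labels for illustration only; 1 = bullying, 0 = non-bullying.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 1, 0, 1, 0]

metrics = binary_metrics(y_true, y_pred)
print(f"{'Metric':<10}{'Value':>8}")
for name, value in metrics.items():
    print(f"{name:<10}{value:>8.2f}")
```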
For project collaboration or any discussion, email me at yashasvi488@live.com.