Twitter Bot Detection

A deep learning project that detects spambots on Twitter with five NLP models: Bi-LSTM, Bi-GRU, DistilBERT, DistilRoBERTa, and XLNet. It was developed for my end-semester Deep Learning Lab course at Manipal Institute of Technology. Using the Cresci-2017 dataset, it classifies accounts as spambots or humans from tweet content alone, without relying on user profiles or network structure.

Problem Statement

The goal is to detect Twitter spambots by analyzing tweet content alone. Because the approach needs no handcrafted features or user-profile data, it is intended as an efficient, scalable way to counter malicious automated activity.

Dataset

The project uses the Cresci-2017 dataset, which contains:

3,474 human accounts (~8M tweets)
1,455 spambot accounts (~3M tweets)

Exploratory Data Analysis

EDA revealed linguistic patterns distinguishing spambots from humans:

Spambots frequently use exaggerated language and external links.
Human tweets focus on personal interactions.

These patterns were identified using word clouds and statistical summaries.
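
As a rough illustration of the kind of summary behind these observations, the sketch below computes the share of tweets containing a link and the most frequent words per class (word frequencies are what a word cloud visualizes). The function name `summarize`, the regular expressions, and the sample tweets are assumptions made for this example; they are not taken from the project notebooks or from Cresci-2017.

```python
import re
from collections import Counter

URL_RE = re.compile(r"https?://\S+")
WORD_RE = re.compile(r"[a-z']+")


def summarize(tweets, top_n=3):
    """Return the share of tweets containing a link and the most common words."""
    link_share = sum(bool(URL_RE.search(t)) for t in tweets) / len(tweets)
    words = Counter()
    for t in tweets:
        # Strip links before counting so URL fragments are not treated as words.
        words.update(WORD_RE.findall(URL_RE.sub(" ", t.lower())))
    return link_share, words.most_common(top_n)


# Made-up tweets for illustration only; not taken from Cresci-2017.
samples = {
    "spambot": [
        "AMAZING deal, click now http://a.example",
        "Amazing prize!!! http://b.example",
    ],
    "human": [
        "coffee with mom this morning",
        "thanks for the ride home",
    ],
}

for label, tweets in samples.items():
    link_share, top_words = summarize(tweets)
    print(label, f"link share={link_share:.2f}", top_words)
```

On real data the same summary would be run over each class's full tweet set, where differences in link share and vocabulary become statistically meaningful.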

Preprocessing Pipeline

Steps include:

Tokenization: Using NLTK and model-specific tokenizers.
Embedding: Pre-trained GloVe embeddings for word vectors.
Cleaning: Removal of special characters and URLs, plus text standardization.
Padding: Fixed-length sequences for model compatibility.
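
The steps above can be sketched in plain Python as follows. This is a minimal sketch under stated assumptions, not the project's actual pipeline: a whitespace split stands in for NLTK and the model-specific tokenizers, lowercasing is assumed as the "standardization" step, and the helper names (`clean`, `tokenize`, `build_vocab`, `encode`, `parse_glove`) and the sequence length of 8 are hypothetical.

```python
import re

URL_RE = re.compile(r"https?://\S+|www\.\S+")
NON_ALNUM_RE = re.compile(r"[^a-z0-9\s]")
PAD, UNK = "<pad>", "<unk>"


def clean(text):
    """Lowercase, drop URLs and special characters, collapse whitespace."""
    text = URL_RE.sub(" ", text.lower())
    text = NON_ALNUM_RE.sub(" ", text)
    return " ".join(text.split())


def tokenize(text):
    """Whitespace tokenizer standing in for NLTK or a model-specific tokenizer."""
    return clean(text).split()


def build_vocab(token_lists):
    """Map each distinct token to an integer id; 0 is padding, 1 is unknown."""
    vocab = {PAD: 0, UNK: 1}
    for tokens in token_lists:
        for tok in tokens:
            vocab.setdefault(tok, len(vocab))
    return vocab


def encode(tokens, vocab, max_len):
    """Convert tokens to ids, then truncate or right-pad to exactly max_len."""
    ids = [vocab.get(tok, vocab[UNK]) for tok in tokens][:max_len]
    return ids + [vocab[PAD]] * (max_len - len(ids))


def parse_glove(lines, vocab):
    """Read GloVe text-format lines ("word v1 v2 ...") for words in vocab."""
    vectors = {}
    for line in lines:
        word, *values = line.rstrip().split(" ")
        if word in vocab:
            vectors[word] = [float(v) for v in values]
    return vectors


# Made-up tweets for illustration only.
tweets = [
    "WIN a FREE iPhone!!! http://spam.example/x",
    "lunch with @sam was great :)",
]
token_lists = [tokenize(t) for t in tweets]
vocab = build_vocab(token_lists)
padded = [encode(toks, vocab, max_len=8) for toks in token_lists]
print(padded)
```

In practice `parse_glove` would be fed the lines of a downloaded GloVe text file, and the resulting vectors would initialize the embedding layer of the recurrent models; the transformer models would use their own tokenizers and learned embeddings instead.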

Model Architectures

The following models were implemented:

Bi-LSTM: Captures long-term dependencies in sequences.
Bi-GRU: Lightweight alternative to Bi-LSTM.
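DistilBERT: Distilled version of BERT that retains about 97% of BERT's language-understanding performance while being smaller and faster.
DistilRoBERTa: Distilled version of RoBERTa, offering strong contextual understanding with faster inference.
XLNet: Captures bidirectional context through permutation-based autoregressive pretraining.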

Performance Metrics

Evaluation metrics include:

Precision, Recall, F1 Score
Accuracy
Matthews Correlation Coefficient (MCC)
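
For reference, the metrics listed above can all be derived from the four counts of a binary confusion matrix. The sketch below shows the standard definitions; the function name `classification_metrics` and the example counts are illustrative assumptions and do not come from the project's results.

```python
import math


def classification_metrics(tp, fp, fn, tn):
    """Compute the listed metrics from binary confusion-matrix counts."""
    # Undefined ratios (zero denominators) are reported as 0.0 by convention.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "accuracy": accuracy,
        "mcc": mcc,
    }


# Made-up counts for illustration only (spambot = positive class).
m = classification_metrics(tp=90, fp=5, fn=10, tn=95)
for name, value in m.items():
    print(f"{name}: {value:.4f}")
```

MCC is useful alongside accuracy because it uses all four counts and stays near zero for a classifier that predicts a single class, even when that class dominates the data.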

Deployment

The final model was deployed using Streamlit, enabling real-time bot detection via a user-friendly web interface.

Results

Model	Training Accuracy	Testing Accuracy	Precision	Recall	F1 Score
Bi-LSTM	92.02%	92.22%	94.72%	89.02%	91.78%
Bi-GRU	91.52%	93.05%	94.97%	90.84%	92.86%
DistilBERT	98.18%	96.36%	98.57%	94.40%	96.44%
DistilRoBERTa	97.80%	96.34%	96.97%	95.74%	96.35%
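XLNet	49.96%	50.00%	N/A	N/A	N/A

XLNet's accuracy of about 50% is at chance level for a two-class problem, which suggests the model did not learn to separate spambots from humans in this run; precision, recall, and F1 are not available for it.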
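
As a quick consistency check on the table, F1 is the harmonic mean of precision and recall, so it can be recomputed from the other two columns. The sketch below does this for the four models with reported values; the variable and function names are chosen for this example.

```python
# Precision, recall, and F1 (in percent) copied from the results table.
reported = {
    "Bi-LSTM": (94.72, 89.02, 91.78),
    "Bi-GRU": (94.97, 90.84, 92.86),
    "DistilBERT": (98.57, 94.40, 96.44),
    "DistilRoBERTa": (96.97, 95.74, 96.35),
}


def f1_from(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


for model, (p, r, f1) in reported.items():
    recomputed = f1_from(p, r)
    print(f"{model}: reported F1 {f1:.2f}, recomputed {recomputed:.2f}")
```

The recomputed values agree with the reported F1 scores to within rounding.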
