Welcome to the repository for Unraveling Viral News, a data-driven exploration of the factors that make news articles go viral across social media platforms. This project was developed as my final school assignment at Saint Louis University under the guidance of Professor Ravindranth Arunasalam.
With social media shaping the way we consume information, understanding why some articles attract massive engagement while others do not has become critical. This project aims to:
- Analyze platform-specific trends (Facebook, LinkedIn, GooglePlus).
- Explore the impact of sentiment, topics, hashtags, mentions, and sources on article engagement.
- Build predictive models to identify characteristics that make content viral.
- Sentiment Analysis: BERT-based sentiment scoring for titles and headlines.
- Engagement Metrics: Analysis of likes, shares, comments, hashtags, and mentions.
- Platform Trends: Comparison of engagement across Facebook, LinkedIn, and GooglePlus.
- Leveraged Latent Dirichlet Allocation (LDA) and clustering techniques to identify trending topics.
- Explored word frequency, bigrams, and trigrams to understand headline patterns.
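
As a rough illustration of the bigram and trigram counting mentioned above, here is a minimal sketch using only the Python standard library. The headlines are made up for the example and the helper names (`tokenize`, `ngram_counts`) are hypothetical; the project notebooks may do this differently, for instance with NLTK.

```python
import re
from collections import Counter


def tokenize(headline):
    """Lowercase a headline and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9']+", headline.lower())


def ngram_counts(headlines, n):
    """Count every run of n consecutive tokens across all headlines."""
    counts = Counter()
    for headline in headlines:
        tokens = tokenize(headline)
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return counts


# Made-up headlines for illustration only (not taken from the project dataset).
sample_headlines = [
    "Obama speaks on the economy",
    "Economy grows as Obama speaks on jobs",
    "Top 10 facts about the economy",
]

bigrams = ngram_counts(sample_headlines, 2)
trigrams = ngram_counts(sample_headlines, 3)

print(bigrams.most_common(3))
print(trigrams.most_common(3))
```

Keeping n-grams as tuples of tokens makes them easy to filter later, for example to drop those made up entirely of stop words.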
- Built Random Forest and Polynomial Regression models to predict engagement.
- Evaluated feature importance to identify the most impactful article characteristics.
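
To make the feature-importance step concrete, the sketch below fits a scikit-learn `RandomForestRegressor` on synthetic data and reads its `feature_importances_`. The feature names, the data, and the hyperparameters are all assumptions chosen for illustration; they are not the project's actual features or settings.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n_articles = 500

# Synthetic stand-ins for article features (hypothetical names).
feature_names = ["sentiment_title", "headline_length", "hashtag_count"]
X = np.column_stack([
    rng.uniform(-1, 1, n_articles),   # sentiment score in [-1, 1)
    rng.integers(3, 20, n_articles),  # words per headline
    rng.integers(0, 5, n_articles),   # hashtags per article
])

# Synthetic target: engagement driven mostly by hashtag_count, plus noise.
y = 50 * X[:, 2] + 5 * X[:, 0] + rng.normal(0, 1, n_articles)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(X_train, y_train)

# R^2 on held-out data, then impurity-based importances per feature.
r2 = model.score(X_test, y_test)
importances = dict(zip(feature_names, model.feature_importances_))
top_feature = max(importances, key=importances.get)

print(f"R^2 on held-out data: {r2:.3f}")
for name, value in sorted(importances.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {value:.3f}")
```

Impurity-based importances can favor features with many distinct values, so a cross-check with `sklearn.inspection.permutation_importance` on the test split is worth considering.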
- Studied how headline characteristics and engagement trends evolved over time.
- Focused on key events like the 2016 elections.
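
One way to look at how engagement and headline characteristics change over time is to aggregate articles by publication month with pandas. The sketch below assumes hypothetical column names (`publish_date`, `headline`, `shares`) and uses made-up rows; the real dataset's columns and values may differ.

```python
import pandas as pd

# Made-up rows for illustration only; column names are assumptions.
articles = pd.DataFrame({
    "publish_date": pd.to_datetime([
        "2016-10-03", "2016-10-20", "2016-11-08", "2016-11-09", "2016-12-01",
    ]),
    "headline": [
        "Economy shows steady growth",
        "Candidates prepare for final debate",
        "Voters head to the polls across the country",
        "Election results surprise analysts",
        "Markets settle after a busy month",
    ],
    "shares": [120, 80, 950, 1050, 60],
})

# A simple headline characteristic: number of words.
articles["headline_length"] = articles["headline"].str.split().str.len()

# Group by calendar month and summarize engagement and headline length.
monthly = articles.groupby(articles["publish_date"].dt.to_period("M")).agg(
    mean_shares=("shares", "mean"),
    mean_headline_length=("headline_length", "mean"),
    n_articles=("shares", "count"),
)

peak_month = monthly["mean_shares"].idxmax()
print(monthly)
print("Month with highest mean shares:", peak_month)
```

The same grouping can be narrowed to a window around a specific event by filtering on `publish_date` before aggregating.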
- Which characteristics make news articles go viral across different social media platforms?
- Do certain topics make news more viral than others across platforms, and does this change over time?
- How well does user sentiment, as expressed through likes, shares, and comments, predict the popularity of news articles?
- Articles with positive or neutral sentiment tend to perform better across platforms.
- Certain topics (e.g., "Obama," "Economy") consistently attract higher engagement.
- Hashtags and mentions significantly influence virality, with trending keywords amplifying reach.
- Headline patterns, such as the use of numbers ("Top 10") and emotional words, play a critical role in driving engagement.
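
Headline patterns like the ones above can be turned into simple numeric features. This is a minimal sketch under stated assumptions: the tiny `EMOTIONAL_WORDS` set and the `headline_features` helper are hypothetical, and a real analysis would likely rely on a proper emotion lexicon.

```python
import re

# Hypothetical, deliberately tiny lexicon for illustration only.
EMOTIONAL_WORDS = {"shocking", "amazing", "heartbreaking", "stunning", "outrage"}


def headline_features(headline):
    """Extract a few simple pattern features from a single headline."""
    tokens = re.findall(r"[a-z0-9]+", headline.lower())
    return {
        "word_count": len(tokens),
        # True if any token is purely numeric, as in "Top 10".
        "has_number": any(tok.isdigit() for tok in tokens),
        "emotional_word_count": sum(tok in EMOTIONAL_WORDS for tok in tokens),
    }


features = headline_features("Top 10 Shocking Facts About the Economy")
print(features)
```

Features like these can be added as columns next to sentiment and topic before fitting a model.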
- /data: Contains the dataset used for analysis.
- /notebooks: Jupyter notebooks for data exploration, analysis, and modeling.
- /models: Trained models and scripts for predictive analysis.
- /visualizations: Charts and plots generated during the project.
- Python: Core programming language.
- Pandas, NumPy: Data manipulation and analysis.
- Matplotlib, Seaborn: Data visualization.
- NLTK, Scikit-learn: NLP and machine learning.
- BERT: Sentiment analysis.
This project was guided and supervised by Professor Ravindranth Arunasalam, whose expertise and insights were invaluable throughout this journey.
- LinkedIn: https://www.linkedin.com/in/lasya-priya-k/
- Email: konduru.lasya@gmail.com
- GitHub: lasyakonduru
Contributions are welcome! If you’d like to enhance this project, feel free to fork the repository, submit issues, or create pull requests.
This README.md file provides a comprehensive overview of my project, instructions for running it, and an invitation for collaboration.