Repo and blog post are works in progress.
Inspired by Tim Martin's "Promoting Positive Climate Change Conversations via Twitter."
Question: Given a large sample of tweets referencing "@realDonaldTrump," can we build a network graph in which distinct communities, moderators, and influencers are identifiable?
Data Collection: Queried the Twitter streaming API for tweets mentioning @realDonaldTrump and stored them in MongoDB.
Data Analysis: PySpark, Spark SQL, NetworkX, the python-louvain (community) package, LDA topic modeling.
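The notes above do not say how graph edges were defined, so the following is only a minimal sketch under one assumption: an edge runs from a tweet's author to each account the tweet mentions, weighted by how often that happens. The field names (`user.screen_name`, `entities.user_mentions`) follow the Twitter v1.1 tweet JSON layout; the function name `mention_edges` is hypothetical and not taken from the repo.

```python
from collections import Counter


def mention_edges(tweets):
    """Count (author, mentioned_account) pairs across tweet documents.

    `tweets` is any iterable of dicts shaped like Twitter v1.1 tweet JSON,
    for example documents read back from a MongoDB collection.
    """
    edges = Counter()
    for tweet in tweets:
        author = (tweet.get("user") or {}).get("screen_name")
        if not author:
            continue  # skip malformed documents and non-tweet stream messages
        mentions = (tweet.get("entities") or {}).get("user_mentions") or []
        for mention in mentions:
            target = mention.get("screen_name")
            # Ignore self-mentions; they add no information to the graph.
            if target and target != author:
                edges[(author, target)] += 1
    return edges


sample = [
    {"user": {"screen_name": "alice"},
     "entities": {"user_mentions": [{"screen_name": "realDonaldTrump"},
                                    {"screen_name": "bob"}]}},
    {"user": {"screen_name": "alice"},
     "entities": {"user_mentions": [{"screen_name": "realDonaldTrump"}]}},
    {"delete": {"status": {"id": 1}}},  # stream control message, no author
]
weighted = mention_edges(sample)
```

The resulting counter maps `(source, target)` pairs to weights, which converts directly into a weighted edge list for a graph library. On a large sample the same counting could be expressed as a Spark SQL group-by instead.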
Conclusions:
- Communities are fairly clean: clear pro-Donald Trump and anti-Donald Trump communities, especially when Trump and his connections to other users are removed from the graph.
- Influencers: news organizations
- Moderators: also news organizations... and Ted Lieu?
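As a rough illustration of the conclusions above, the sketch below removes the hub account, runs Louvain community detection, and ranks accounts two ways. Several things here are assumptions rather than the project's actual method: it uses NetworkX's built-in `louvain_communities` (available in recent NetworkX releases) in place of the python-louvain package, treats "influencers" as accounts with high weighted in-degree, and treats "moderators" as accounts with high betweenness centrality, meaning they sit on paths between communities. The function name `analyze` and the toy edge list are hypothetical.

```python
import networkx as nx
from networkx.algorithms.community import louvain_communities


def analyze(edges, hub="realDonaldTrump", top_k=3, seed=0):
    """edges: iterable of (source, target, weight) mention tuples."""
    graph = nx.DiGraph()
    graph.add_weighted_edges_from(edges)

    # Drop the hub and anything left isolated once its edges are gone.
    if hub in graph:
        graph.remove_node(hub)
    graph.remove_nodes_from(list(nx.isolates(graph)))

    # Louvain works on the undirected view here; for reciprocal edges
    # to_undirected() keeps one of the two weights rather than summing them.
    undirected = graph.to_undirected()
    communities = louvain_communities(undirected, weight="weight", seed=seed)

    # Assumed proxy for influencers: most-mentioned accounts.
    in_degree = dict(graph.in_degree(weight="weight"))
    influencers = sorted(in_degree, key=in_degree.get, reverse=True)[:top_k]

    # Assumed proxy for moderators: accounts bridging otherwise separate groups.
    betweenness = nx.betweenness_centrality(undirected)
    moderators = sorted(betweenness, key=betweenness.get, reverse=True)[:top_k]

    return communities, influencers, moderators


# Toy data: two tight groups that both mention one news account and the hub.
group_a, group_b = ["a1", "a2", "a3"], ["b1", "b2", "b3"]
toy_edges = [("a1", "a2", 5), ("a2", "a3", 5), ("a3", "a1", 5),
             ("b1", "b2", 5), ("b2", "b3", 5), ("b3", "b1", 5)]
toy_edges += [(user, "news", 1) for user in group_a + group_b]
toy_edges += [(user, "realDonaldTrump", 1) for user in group_a + group_b + ["news"]]

communities, influencers, moderators = analyze(toy_edges)
```

In this toy graph the news account is both the most-mentioned node and the only bridge between the two groups, which mirrors the pattern reported above. Real data would need a larger `top_k` and some inspection of which community each bridging account falls into.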
Visualizations
- Check out the package manager / code galaxies visualizations by Andrei Kashcha.
- Web-hosted demo with current project data is in the works.
Machines used:
- MacBook Pro 13" (2015)
- Amazon c5.9xlarge EC2 instance (36 cores, 68 GB RAM) with Spark (single node)
March - April 2018