Topic modeling architectures on COVID-19 tweets.
Steps:
- Tweet extraction from MongoDB via pymongo
- Tweet cleaning
- Run topic models
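
The first two steps above can be sketched roughly as follows. This is a minimal illustration, not the project's actual pipeline: the connection URI, the database name `covid`, the collection name `tweets`, and the `text` field are hypothetical, and the cleaning rules and the small stopword list are assumptions chosen for the example.

```python
import re

# Illustrative patterns and stopwords (assumptions, not the project's actual rules).
URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_RE = re.compile(r"@\w+")
NON_ALNUM_RE = re.compile(r"[^a-z0-9\s]")
STOPWORDS = {"the", "a", "an", "is", "are", "to", "of", "and", "in", "for", "on", "rt"}


def fetch_tweets(uri="mongodb://localhost:27017", db="covid", coll="tweets", limit=1000):
    """Yield raw tweet texts from MongoDB. All default names here are hypothetical."""
    from pymongo import MongoClient  # imported lazily so cleaning works without pymongo

    client = MongoClient(uri)
    cursor = client[db][coll].find({}, {"text": 1, "_id": 0}).limit(limit)
    for doc in cursor:
        if "text" in doc:
            yield doc["text"]


def clean_tweet(text):
    """Lowercase a tweet, strip URLs, mentions and punctuation, and return its tokens."""
    text = text.lower()
    text = URL_RE.sub(" ", text)       # drop links
    text = MENTION_RE.sub(" ", text)   # drop @user mentions
    text = text.replace("#", " ")      # keep the hashtag word, drop the symbol
    text = NON_ALNUM_RE.sub(" ", text) # drop remaining punctuation and emoji
    return [tok for tok in text.split() if tok not in STOPWORDS and len(tok) > 1]


if __name__ == "__main__":
    print(clean_tweet("RT @who: Wash your hands! #COVID19 https://t.co/abc123"))
```

The token lists returned by `clean_tweet` are the kind of input that most of the topic models listed in this document expect.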
We experiment with the following topic modeling architectures:
- LDA
- GSDMM
- BTM
- LDA2Vec
- BERT
- Twitter-LDA (Java implementation)
- Twitter-LDA (Python implementation)
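
As a small illustration of one model in the list above: BTM (Biterm Topic Model) works on unordered word pairs ("biterms") that co-occur in the same short document, rather than on per-document word counts. The sketch below shows only the biterm extraction step, not inference; the function names are hypothetical and not taken from any BTM library.

```python
from collections import Counter
from itertools import combinations


def biterms(tokens):
    """Return every unordered word pair in one tokenized tweet."""
    return [tuple(sorted(pair)) for pair in combinations(tokens, 2)]


def corpus_biterms(docs):
    """Count biterms over a whole corpus of tokenized tweets."""
    counts = Counter()
    for tokens in docs:
        counts.update(biterms(tokens))
    return counts


if __name__ == "__main__":
    docs = [["wash", "hands", "covid"], ["covid", "hands"]]
    print(corpus_biterms(docs).most_common(3))
```

Pooling biterms across the corpus is what lets the model gather co-occurrence evidence even when each individual tweet is only a few words long.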
BTM and GSDMM perform quite well on short-text corpora such as tweets. We intend to experiment with the other models and report topic coherence from their output.
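
Topic coherence can be measured in several ways, and this document does not fix one. As a hedged example, the sketch below computes the UMass coherence of a single topic from document co-occurrence counts; the function name and the toy corpus are assumptions made for illustration.

```python
import math


def umass_coherence(top_words, docs):
    """UMass coherence of one topic.

    top_words: the topic's top words, ordered from most to least probable.
    docs: the tokenized corpus (a list of token lists).
    """
    doc_sets = [set(d) for d in docs]

    def doc_freq(*words):
        # Number of documents that contain all of the given words.
        return sum(1 for d in doc_sets if all(w in d for w in words))

    score = 0.0
    for m in range(1, len(top_words)):
        for l in range(m):
            d_l = doc_freq(top_words[l])
            if d_l == 0:
                continue  # skip words that never occur in the corpus
            d_ml = doc_freq(top_words[m], top_words[l])
            score += math.log((d_ml + 1) / d_l)
    return score


if __name__ == "__main__":
    docs = [["covid", "vaccine"], ["covid", "vaccine", "dose"], ["lockdown", "school"]]
    print(umass_coherence(["covid", "vaccine"], docs))
    print(umass_coherence(["covid", "school"], docs))
```

Higher (less negative) scores indicate that a topic's top words tend to appear in the same tweets, which makes the score usable for comparing the output of different models on the same corpus.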