ThirdspaceUofT/topic-modeling

Topic-Modeling

Topic modeling architectures on COVID-19 tweets.

Steps:

  • Tweet extraction from MongoDB via pymongo
  • Tweet cleaning
  • Run topic models
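
The tweet cleaning step can be sketched roughly as follows. This is a minimal illustration using only the standard library, not the repository's actual cleaning code: the function name `clean_tweet`, the regular expressions, and the tiny stopword list are all assumptions made for the example.

```python
import re

# Hypothetical patterns; the repository's own cleaning rules may differ.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_RE = re.compile(r"@\w+")
NON_ALNUM_RE = re.compile(r"[^a-z0-9\s]")

# Tiny illustrative stopword list; a real run would use a fuller one.
STOPWORDS = {"a", "an", "the", "is", "are", "to", "of", "in", "and", "for", "on", "rt"}


def clean_tweet(text):
    """Lowercase a tweet, strip URLs, mentions and punctuation, drop stopwords."""
    text = text.lower()
    text = URL_RE.sub(" ", text)        # remove links
    text = MENTION_RE.sub(" ", text)    # remove @user mentions
    text = text.replace("#", " ")       # keep the hashtag word, drop the symbol
    text = NON_ALNUM_RE.sub(" ", text)  # remove remaining punctuation and emoji
    # Keep tokens that are not stopwords and are longer than one character.
    return [tok for tok in text.split() if tok not in STOPWORDS and len(tok) > 1]


tokens = clean_tweet("RT @who: Wash your hands! #COVID19 https://t.co/abc")
print(tokens)
```

The token lists produced this way are the kind of input that bag-of-words topic models such as LDA, GSDMM, and BTM typically expect.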

We experiment on different topic modeling architectures:

  • LDA
  • GSDMM
  • BTM
  • LDA2Vec
  • BERT
  • Twitter-LDA (Java implementation)
  • Twitter-LDA (Python implementation)

BTM and GSDMM perform quite well on short-text corpora such as tweets. We intend to experiment with the other models and report topic coherence from the output.
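
Topic coherence can be measured in several ways, and this README does not say which measure is used. As one possible sketch, the UMass coherence score below rates a topic by how often its top words co-occur in the same documents. The function name `umass_coherence` and the toy corpus are assumptions made for the example.

```python
import math


def umass_coherence(top_words, documents):
    """UMass coherence: sum of log((D(w_i, w_j) + 1) / D(w_j)) over word pairs.

    top_words: topic words ordered from most to least probable.
    documents: list of token lists (for example, cleaned tweets).
    """
    doc_sets = [set(doc) for doc in documents]

    def doc_freq(*words):
        # Number of documents containing all of the given words.
        return sum(1 for d in doc_sets if all(w in d for w in words))

    score = 0.0
    for i in range(1, len(top_words)):
        for j in range(i):
            w_i, w_j = top_words[i], top_words[j]
            d_j = doc_freq(w_j)
            if d_j == 0:
                continue  # skip words that never appear in the corpus
            score += math.log((doc_freq(w_i, w_j) + 1) / d_j)
    return score


# Toy corpus, invented for illustration only.
docs = [
    ["covid", "vaccine"],
    ["covid", "vaccine", "dose"],
    ["covid", "lockdown"],
    ["football", "match"],
]
print(umass_coherence(["covid", "vaccine"], docs))
print(umass_coherence(["covid", "football"], docs))
```

Scores closer to zero indicate top words that tend to appear together, so a topic pairing "covid" with "vaccine" should score higher on this corpus than one pairing "covid" with "football".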

Right now, Twitter-LDA is the winner. We have implemented both Java and Python versions of the algorithm. We are working on performance metrics for evaluating the quality of our topics.
