A diagnostic analysis of Machine Learning topics from NIPS papers data using sklearn and gensim.

gargimaheshwari/NLP-topic-modelling

Topic Modelling with NIPS Papers

The NIPS conference (Neural Information Processing Systems) is one of the most prestigious yearly events in the machine learning community. At each NIPS conference, a large number of research papers are published. Over 50,000 PDF files were automatically downloaded and processed to obtain a dataset on various machine learning techniques. These papers discuss a wide variety of topics in machine learning, from neural networks to optimization methods and many more.

This is a diagnostic analysis of machine learning topics in the NIPS papers data using LDA and GridSearchCV from sklearn, and CoherenceModel from gensim.

In the first notebook, sklearn, I use Latent Dirichlet Allocation (LDA), a Natural Language Processing (NLP) technique, together with GridSearchCV from Python's sklearn library to analyze machine learning topics in the NIPS papers dataset. I use perplexity and log likelihood to find the optimal number of topics for the data: a lower perplexity and a higher log likelihood indicate a better fit. I then plot the models produced by the grid search fit against their perplexity to see which perform best.
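The grid search over the number of topics can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the toy `docs` list stands in for the paper texts, and the candidate topic counts, `cv=2`, and `max_iter=10` are assumptions chosen to keep the example fast. With no explicit `scoring`, GridSearchCV falls back to the estimator's own `score` method, which for LDA is an approximate log likelihood.

```python
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import GridSearchCV

# Toy stand-in corpus (assumption); the notebook works on the NIPS paper texts.
docs = [
    "neural network training with gradient descent",
    "deep neural network layers and backpropagation",
    "convex optimization and gradient methods",
    "stochastic gradient descent convergence rates",
    "bayesian inference with gaussian process priors",
    "variational inference for bayesian models",
    "kernel methods and support vector machines",
    "gaussian process regression with kernel functions",
]

# LDA expects raw term counts, so use a bag-of-words matrix.
X = CountVectorizer(stop_words="english").fit_transform(docs)

# Candidate topic counts are illustrative only.
search = GridSearchCV(
    LatentDirichletAllocation(max_iter=10, random_state=0),
    param_grid={"n_components": [2, 3, 4]},
    cv=2,
)
search.fit(X)

best_lda = search.best_estimator_
print("Best number of topics:", search.best_params_["n_components"])
print("Mean held-out log likelihood:", search.best_score_)
print("Perplexity on the full corpus:", best_lda.perplexity(X))
```

The per-candidate scores in `search.cv_results_["mean_test_score"]` are what one would plot against the number of topics to compare models.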

[Figure: Most common words]

In the second notebook, gensim, I again use LDA, this time with CoherenceModel from Python's gensim library, to analyze machine learning topics in the NIPS papers dataset. A topic coherence measure is used to find the optimal number of topics for the data. Finally, I use Python's LDA visualization tools to display the results of a model with that number of topics.

[Figure: LDA Visualization]

Data taken from kaggle.com. The paper texts are in the dataset's papers.csv file.
