I used an LSTM-based deep learning model to predict which of the 16 most popular programming languages of 2019 a question is about. The model takes the title and body of a question as input. I used a three-layer LSTM network because LSTMs are effective on sequential data and are widely used in NLP.
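To make the architecture concrete, here is a minimal Keras sketch of a three-layer stacked LSTM classifier with 16 output classes. Only the number of LSTM layers and the number of classes come from the description above. The vocabulary size, embedding dimension, hidden units, sequence length, optimizer, and loss are illustrative assumptions, not the values used in this project.

```python
import numpy as np
from keras.models import Sequential
from keras.layers import Embedding, LSTM, Dense

NUM_LANGUAGES = 16   # number of target languages (from the description)
VOCAB_SIZE = 20000   # assumption: size of the token vocabulary
EMBED_DIM = 128      # assumption: embedding dimension
LSTM_UNITS = 64      # assumption: hidden units per LSTM layer
MAX_LEN = 100        # assumption: tokens kept per question


def build_model():
    """Build a three-layer stacked LSTM classifier (hypothetical sizes)."""
    model = Sequential([
        # Maps integer token ids to dense vectors.
        Embedding(VOCAB_SIZE, EMBED_DIM, input_length=MAX_LEN),
        # The first two LSTM layers return full sequences so that the
        # next LSTM layer receives one vector per time step.
        LSTM(LSTM_UNITS, return_sequences=True),
        LSTM(LSTM_UNITS, return_sequences=True),
        # The last LSTM layer returns only its final state.
        LSTM(LSTM_UNITS),
        # One probability per candidate language.
        Dense(NUM_LANGUAGES, activation="softmax"),
    ])
    model.compile(
        loss="categorical_crossentropy",  # assumption: one-hot labels
        optimizer="adam",                 # assumption
        metrics=["accuracy"],
    )
    return model


model = build_model()
# A dummy batch of two padded token-id sequences, only to check shapes.
dummy_batch = np.zeros((2, MAX_LEN), dtype="int32")
probs = model.predict(dummy_batch)
print(probs.shape)
```

Setting `return_sequences=True` on every LSTM layer except the last is what allows the layers to be stacked. The final softmax layer turns the last hidden state into a probability distribution over the 16 languages.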
I used the StackSample dataset, a collection of more than 1 million StackOverflow questions, answers, and tags.
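Question bodies in this dataset are stored as HTML, so the title and body need some cleaning before they can be fed to a model. The sketch below shows one plausible way to do that with the standard library only. The function name `clean_question` and the exact cleaning rules are assumptions for illustration, not the preprocessing actually used here.

```python
import re
from html import unescape

TAG_RE = re.compile(r"<[^>]+>")              # HTML tags such as <p> or <code>
NON_TOKEN_RE = re.compile(r"[^a-z0-9+#]+")   # keep '+' and '#' for c++ and c#


def clean_question(title, body):
    """Merge title and body, strip HTML, and return lowercase tokens.

    Hypothetical helper: the cleaning rules are illustrative assumptions.
    """
    text = f"{title} {body}"
    # Strip tags before unescaping, so escaped code like List&lt;int&gt;
    # is not mistaken for a tag and removed.
    text = TAG_RE.sub(" ", text)
    text = unescape(text).lower()
    # Replace every run of other characters with a space, then split.
    return NON_TOKEN_RE.sub(" ", text).split()


tokens = clean_question(
    "Sort in C#?",
    "<p>Use <code>List&lt;int&gt;</code></p>",
)
print(tokens)
```

Keeping `+` and `#` inside tokens is a deliberate choice in this sketch: language names such as `c++` and `c#` would otherwise collapse to `c`, which would discard a strong signal for this task.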
Python 3.7
Keras 2.3
Numpy 1.18
Pandas 1.0
Matplotlib 3.1
NLTK 3.4.5
Re 2.2.1
On an unseen test set, the model achieved an accuracy of 82.34%.
Predicting the Programming Language of Questions and Snippets of StackOverflow Using Natural Language Processing, Kamel Alrashedy